Understanding Random Perturbation in Economic Models
Understanding the Random Perturbation Term
The term u, known as the random perturbation (or disturbance) term, captures the randomness present in the relationships between economic variables. Rather than using purely deterministic equations, introducing a stochastic term, ui, better represents economic reality. This term accounts for the many small factors that jointly affect the dependent variable, Y, but are not explicitly included in the equation: in essence, it is the sum of all explanatory factors not included as regressors, xji. It also absorbs measurement errors in the variables and the aggregation of economic-policy effects, all of which make the relationships between variables imperfect rather than exact.
Distribution of Probabilities
Regarding its probability distribution, the perturbation has a zero mean: E(ui) = 0, for i = 1, …, n. This means that while ui can take both positive and negative values, its average is zero: it represents a multitude of small factors that, on average, cancel each other out. Failure to meet this assumption affects the estimation process. In matrix form, this hypothesis is written E(u) = 0.
ui is homoscedastic, meaning it has a constant variance. The variance of the perturbation term is the same for all observations: var(ui) = E(ui²) = σ², for i = 1, …, n. The dispersion of the perturbation term around its mean does not depend on the values of the explanatory variables; intuitively, the unobservable factors captured by ui have the same variability across all observations. Violation of this assumption is called heteroscedasticity: a situation in which the variance of the perturbation depends on the values of the explanatory variables, xji.
ui is not autocorrelated: the perturbation terms corresponding to different observations are unrelated. Formally, cov(ui, us) = E(ui·us) = 0, for i ≠ s and i, s = 1, …, n. This means that the unobservable factors captured by the perturbation term for different observations are not related. Violation of this assumption is called autocorrelation: cov(ui, us) = E(ui·us) ≠ 0 for some i ≠ s.
Together, the assumptions of homoscedasticity and no autocorrelation imply that the variance-covariance matrix of u is scalar: var(u) = E(uu') = σ²I. Finally, ui follows a normal probability distribution. Given its zero mean and constant variance, this hypothesis is written ui ~ N(0, σ²); in matrix form, combined with no autocorrelation, u ~ N(0, σ²I). Since ui is the sum of many small factors, the central limit theorem makes normality a reasonable assumption.
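The assumptions above can be checked empirically on simulated data. The following sketch draws many perturbation vectors u ~ N(0, σ²I) and verifies that the sample mean is close to zero and the sample variance-covariance matrix is close to σ²I (the values of σ, n, and the number of draws are illustrative choices, not from the text):

```python
import numpy as np

# Simulate perturbation vectors u ~ N(0, sigma^2 * I) and check that
# E(u) ~ 0 and var(u) ~ sigma^2 * I (a scalar covariance matrix).
rng = np.random.default_rng(0)
sigma, n, draws = 2.0, 5, 100_000

U = rng.normal(loc=0.0, scale=sigma, size=(draws, n))  # each row is one draw of u

mean_u = U.mean(axis=0)          # should be close to the zero vector
cov_u = np.cov(U, rowvar=False)  # should be close to sigma^2 * I

print(np.round(mean_u, 2))
print(np.round(cov_u, 1))
```

The diagonal entries of `cov_u` approximate σ² = 4 (homoscedasticity), while the off-diagonal entries approximate 0 (no autocorrelation).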
Hypotheses for the OLS Estimator β̂
To obtain the estimator β̂ by Ordinary Least Squares (OLS), certain hypotheses must be met. Looking at the expression for the OLS estimator, β̂ = (X'X)⁻¹X'y, its existence depends on the existence of the inverse matrix (X'X)⁻¹. A necessary hypothesis is therefore that X'X is a non-singular matrix, |X'X| ≠ 0, which holds if and only if the rank of X equals the number of parameters in the model, ρ(X) = k. Under this condition, (X'X)⁻¹ exists.
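The estimator and the rank condition can be sketched directly in matrix form. The data below are simulated for illustration, and the true coefficient values are assumptions chosen for the example:

```python
import numpy as np

# A minimal sketch of the OLS estimator beta_hat = (X'X)^{-1} X'y,
# checking first that rank(X) = k, so that X'X is non-singular.
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])                          # illustrative values
y = X @ beta_true + rng.normal(scale=1.0, size=n)               # u ~ N(0, sigma^2 I)

assert np.linalg.matrix_rank(X) == k   # |X'X| != 0, so (X'X)^{-1} exists

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.round(beta_hat, 2))
```

If a regressor were an exact linear combination of the others, `matrix_rank(X)` would fall below k and the inversion would fail, which is exactly the situation the non-singularity hypothesis rules out.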
Definition of R2
The coefficient of determination, R², is defined from the decomposition of the sum of squares: R² = SCE / SCT = 1 – (SCR / SCT), where:
- SCE: sum of squares explained by the model, Σ(ŷi – ȳ)²
- SCT: total sum of squares, Σ(yi – ȳ)²
- SCR: sum of squares of the residuals, Σei²
R² can also be written in terms of variances: R² = s²ŷ / s²y = 1 – (s²e / s²y). The coefficient is dimensionless and takes values between 0 and 1: R² = 0 when the model explains none of the variability in Y, and R² = 1 when all the variability in Y is explained by X, in which case the residual variance is zero and there are no residuals. The coefficient of determination therefore has a dual interpretation: as the proportion of the variance of Y explained by X, and as a measure of the goodness of fit. R² × 100 is the percentage of the variance of Y explained by the model, i.e., by the independent variables.
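The two expressions for R² can be verified numerically. This sketch fits a simple regression on simulated data (the coefficients are illustrative assumptions) and computes R² both as SCE/SCT and as 1 – SCR/SCT; with an intercept in the model, SCT = SCE + SCR, so the two coincide:

```python
import numpy as np

# Compute R^2 both ways: SCE/SCT and 1 - SCR/SCT.
rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)   # illustrative data-generating process

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
e = y - y_hat

SCT = np.sum((y - y.mean()) ** 2)        # total sum of squares
SCE = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
SCR = np.sum(e ** 2)                     # residual sum of squares

r2_a = SCE / SCT
r2_b = 1 - SCR / SCT
print(round(r2_a, 4), round(r2_b, 4))    # the two values agree
```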
Adjusted R2
The adjusted R² is calculated as: 1 – (SCR / (n – k)) / (SCT / (n – 1)) = 1 – ((n – 1) / (n – k)) × (1 – R²). This coefficient is used to compare models with the same dependent variable and functional form: it penalizes the loss of degrees of freedom from including additional independent variables. The adjusted R² is never greater than R², and it can take values in (–∞, 1], with values closer to 1 indicating a better fit.
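The equivalence of the two expressions for the adjusted R² can be checked on the same kind of simulated regression (coefficient values are again illustrative assumptions):

```python
import numpy as np

# Adjusted R^2 computed from both equivalent expressions in the text.
rng = np.random.default_rng(3)
n, k = 50, 3   # n observations, k parameters (intercept + 2 slopes)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat
SCR = np.sum(e ** 2)
SCT = np.sum((y - y.mean()) ** 2)
R2 = 1 - SCR / SCT

adj_a = 1 - (SCR / (n - k)) / (SCT / (n - 1))        # degrees-of-freedom form
adj_b = 1 - ((n - 1) / (n - k)) * (1 - R2)           # form based on R^2
print(round(adj_a, 4), round(adj_b, 4))              # identical values
```

Because (n – 1)/(n – k) ≥ 1, the adjusted value is always less than or equal to R², which is the penalty for extra regressors described above.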