Econometrics: Regression Analysis and the Classical Assumptions

Econometrics, literally “economic measurement,” is a branch of economics that attempts to quantify theoretical relationships. Regression analysis is only one of the techniques used in econometrics, but it is by far the most frequently used.

Major Uses of Econometrics

  • Description
  • Hypothesis testing
  • Forecasting

The specific econometric techniques employed may vary depending on which of these uses the research is intended to serve.

Regression Analysis and Causality

Although regression analysis specifies a dependent variable as a function of one or more independent variables, regression analysis alone cannot prove or even imply causality.

Stochastic Error Term

A stochastic error term must be added to every regression equation to account for variation in the dependent variable that is not completely explained by the independent variables (a worked single-variable equation follows this list). The components of this error term include:

  • Omitted or left-out variables
  • Measurement errors in the data
  • An underlying theoretical equation that has a different functional form (shape) than the regression equation
  • Purely random and unpredictable events
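
As a minimal sketch in standard textbook notation (the coefficients and subscripts here are generic, not specific to any equation in this summary), a single-variable "true" equation takes the form

    Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i

where the stochastic error term \varepsilon_i absorbs all four of the components listed above.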

Estimated Regression Equation

An estimated regression equation is an approximation of the true equation that is obtained by using data from a sample of actual Ys and Xs. Since we can never know the true equation, econometric analysis focuses on this estimated regression equation and the estimates of the regression coefficients. The difference between a particular observation of the dependent variable and the value estimated from the regression equation is called the residual.
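
As a hedged sketch of this definition, the snippet below computes residuals from hypothetical data and assumed coefficient estimates (every number is made up for illustration):

    import numpy as np

    # Hypothetical sample of actual Ys and Xs (illustrative values only)
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    # Suppose the estimated regression equation is Y-hat = b0 + b1 * X,
    # with assumed (not estimated-here) coefficient values
    b0, b1 = 0.1, 2.0
    Y_hat = b0 + b1 * X          # values estimated from the regression equation
    residuals = Y - Y_hat        # residual = observed Y minus estimated Y
    print(residuals)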

Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is the most frequently used method of obtaining estimates of the regression coefficients from a set of data. OLS chooses the coefficient estimates that minimize the summed squared residuals for a particular sample.
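
A minimal sketch of this criterion, reusing the hypothetical data above: NumPy's least-squares solver returns exactly the coefficients that minimize the summed squared residuals.

    import numpy as np

    # Hypothetical data; the column of ones estimates the intercept
    X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0]])
    Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    # OLS minimizes sum((Y - X @ b)**2) over b; lstsq computes that minimizer
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(b)   # b[0] is the intercept estimate, b[1] the slope estimate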

Adjusted R-squared (R̄²)

Adjusted R-squared (R̄²) measures the percentage of the variation of Y around its mean that has been explained by a particular regression equation, adjusted for degrees of freedom. R̄² increases when a variable is added to an equation only if the improvement in fit caused by the new variable more than offsets the loss of the degree of freedom used up in estimating its coefficient. As a result, most researchers automatically report R̄² when evaluating the fit of their estimated regression equations.
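
A hedged sketch of the usual degrees-of-freedom adjustment, R̄² = 1 − [RSS/(n − K − 1)] / [TSS/(n − 1)], where K is the number of independent variables (the function name and arguments are illustrative choices, not from this summary):

    import numpy as np

    def adjusted_r_squared(Y, Y_hat, K):
        """R-bar-squared for a regression with K independent variables."""
        n = len(Y)
        rss = np.sum((Y - Y_hat) ** 2)        # residual sum of squares
        tss = np.sum((Y - Y.mean()) ** 2)     # variation of Y around its mean
        # Dividing each sum by its degrees of freedom imposes the penalty
        # that a new variable must overcome to raise R-bar-squared
        return 1.0 - (rss / (n - K - 1)) / (tss / (n - 1))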

The Seven Classical Assumptions

The seven Classical Assumptions are listed below; a simulated example follows the list.

  • The regression model is linear and has an additive error term.
  • The error term has a zero population mean.
  • All explanatory variables are uncorrelated with the error term.
  • Observations of the error term are uncorrelated with one another.
  • The error term has a constant variance.
  • No explanatory variable is a perfect linear function of any other explanatory variable(s).
  • The error term is normally distributed (optional).
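
As a hedged sketch, the snippet below generates data that satisfy these assumptions by construction (all coefficient values, variances, and the seed are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200

    # Explanatory variables that are not perfect linear functions of each other
    X1 = rng.uniform(0, 10, n)
    X2 = rng.uniform(0, 10, n)

    # Error term: zero mean, constant variance, normally distributed,
    # drawn independently across observations and independently of X1 and X2
    eps = rng.normal(loc=0.0, scale=2.0, size=n)

    # Linear model with an additive error term (true coefficients are made up)
    Y = 1.0 + 0.5 * X1 - 0.3 * X2 + eps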

Properties of an Estimator

The two most important properties of an estimator are unbiasedness and minimum variance. An estimator is unbiased when the expected value of the estimated coefficient equals the true value. An estimator is minimum variance when its sampling distribution has the smallest variance of all the estimators in a given class (for example, all unbiased estimators).
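
Unbiasedness can be illustrated with a small Monte Carlo experiment; in the sketch below (true coefficients, sample size, and replication count are all arbitrary), the OLS slope estimates average out to the true slope:

    import numpy as np

    rng = np.random.default_rng(1)
    true_b0, true_b1 = 1.0, 0.5
    n, reps = 50, 5000
    X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])

    slopes = np.empty(reps)
    for r in range(reps):
        eps = rng.normal(0.0, 2.0, n)              # fresh error draw per sample
        Y = true_b0 + true_b1 * X[:, 1] + eps
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        slopes[r] = b[1]

    # Unbiasedness: the mean estimate should be very close to true_b1 = 0.5
    print(slopes.mean())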

Gauss-Markov Theorem and BLUE

Given the Classical Assumptions, OLS can be shown to be the minimum variance, linear, unbiased estimator (or BLUE, for Best Linear Unbiased Estimator) of the regression coefficients. This is the Gauss–Markov Theorem. When one or more of the Classical Assumptions (other than normality) do not hold, OLS is no longer BLUE, although it still may provide better estimates in some cases than the alternative estimation techniques discussed in subsequent chapters.
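
To make "best" concrete, the sketch below compares OLS against a rival estimator that is also linear and unbiased: the slope of the line through the two endpoint observations (a deliberately crude estimator invented here for illustration). Both average near the true slope, but OLS has the smaller sampling variance:

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 50, 5000
    X = np.sort(rng.uniform(0, 10, n))       # regressor values held fixed
    Xmat = np.column_stack([np.ones(n), X])

    ols = np.empty(reps)
    endpoint = np.empty(reps)
    for r in range(reps):
        Y = 1.0 + 0.5 * X + rng.normal(0.0, 2.0, n)
        b, *_ = np.linalg.lstsq(Xmat, Y, rcond=None)
        ols[r] = b[1]
        # Rival linear, unbiased slope estimator: line through the endpoints
        endpoint[r] = (Y[-1] - Y[0]) / (X[-1] - X[0])

    print(ols.mean(), endpoint.mean())   # both close to the true slope 0.5
    print(ols.var(), endpoint.var())     # OLS variance is much smaller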

Sampling Distribution of the OLS Estimator

Because OLS is BLUE (given the Classical Assumptions), its sampling distribution has desirable properties. Moreover, the variance of the sampling distribution, a measure of its dispersion, decreases as the number of observations increases.
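
This shrinking dispersion is easy to see by simulation; in the hedged sketch below (sample sizes and replication counts are arbitrary), the Monte Carlo variance of the OLS slope falls as n grows:

    import numpy as np

    rng = np.random.default_rng(3)

    def slope_variance(n, reps=2000):
        """Monte Carlo variance of the OLS slope estimate at sample size n."""
        X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
        est = np.empty(reps)
        for r in range(reps):
            Y = 1.0 + 0.5 * X[:, 1] + rng.normal(0.0, 2.0, n)
            b, *_ = np.linalg.lstsq(X, Y, rcond=None)
            est[r] = b[1]
        return est.var()

    # The dispersion of the sampling distribution shrinks as n increases
    for n in (25, 100, 400):
        print(n, slope_variance(n))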