Econometric Data Analysis: Types, Models, and STATA Commands

Econometric Data Types

  • Cross-sectional: A sample of units (individuals, households, firms, cities) observed at a single point in time.
  • Time series: Observations on one or more variables over time; the ordering of observations matters.
  • Pooled cross-sectional: Two or more independent cross-sections, sampled at different points in time, are combined, e.g., comparing variables across two different years. Often used to evaluate policy changes.
  • Panel/Longitudinal data: The same cross-sectional units are followed over time. Can be used to account for time-invariant unobservables. E.g., each city has two observations, one in each of two different years.

Panel data consists of observations on the same cross-sectional units over several time periods, while pooled cross-sectional data consists of independent samples, usually of different units, drawn at different points in time.
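A minimal Python sketch of why following the same units over time helps. Python stands in for STATA here. It assumes a made-up true model y = 2x + a, where a is a hypothetical, unobserved city effect that does not change over time and is larger in cities with larger x. Pooling all city-years ignores a and overstates the slope. First differencing, one standard panel technique, subtracts year 1 from year 2 for each city and removes a.

```python
# Hypothetical two-period panel: each city is observed in year 1 and year 2.
# Assumed true model: y = 2.0 * x + a, where a is a time-invariant,
# unobserved city effect that happens to be correlated with x.
cities = {
    # city: (a, x in year 1, x in year 2)  -- all values made up
    "A": (10.0, 1.0, 2.0),
    "B": (20.0, 3.0, 5.0),
    "C": (30.0, 6.0, 7.0),
}
BETA = 2.0

def slope(xs, ys):
    """OLS slope of ys on xs (regression with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Pooled OLS on all six city-year observations ignores a.
xs, ys = [], []
for a, x1, x2 in cities.values():
    for x in (x1, x2):
        xs.append(x)
        ys.append(BETA * x + a)
pooled = slope(xs, ys)

# First differencing (year 2 minus year 1) cancels the fixed effect a.
dx = [x2 - x1 for a, x1, x2 in cities.values()]
dy = [(BETA * x2 + a) - (BETA * x1 + a) for a, x1, x2 in cities.values()]
fd = slope(dx, dy)

print(round(pooled, 3), round(fd, 3))
```

The pooled slope is pulled upward because a rises with x. The differenced slope recovers the assumed value of 2.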

Econometric vs. Economic Models

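An economic model is a theoretical description of how economic variables are related. An econometric model turns that description into an estimable equation, specifying the variables, the functional form, and an error term for unobserved factors. It then applies statistical methods to economic data to estimate the relationships and to test the economic theory.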

Residuals and Error Terms

A residual is the difference between the actual observed value and the fitted value from the sample regression function: û_i = y_i − ŷ_i = y_i − β̂0 − β̂1·x_i. It can be computed from the data. The error term, in contrast, is the difference between the actual observed value and the value given by the population regression function: u_i = y_i − β0 − β1·x_i. Because β0 and β1 are unknown, the error term is never observed, and the residuals serve as its sample counterpart.
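The distinction can be checked numerically. The Python sketch below assumes a hypothetical population model y = 1 + 2x + u with made-up error values. In real data u is never observed, which is why residuals are used instead. With an intercept in the regression, the OLS residuals always sum to zero. The true errors need not.

```python
# Hypothetical population model (assumed): y = 1 + 2x + u
B0, B1 = 1.0, 2.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.5, -1.0, 0.8, 0.3, -0.2]       # made-up errors (unobservable in practice)
y = [B0 + B1 * xi + ui for xi, ui in zip(x, u)]

# Sample regression function: OLS estimates from the data alone
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1_hat = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
b0_hat = my - b1_hat * mx

errors = [yi - (B0 + B1 * xi) for xi, yi in zip(x, y)]             # u_i: needs the true betas
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]   # u_hat_i = y_i - y_hat_i

print([round(e, 2) for e in errors])
print([round(r, 2) for r in residuals])
```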

Correlation vs. Coefficient

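The slope coefficient and the correlation are related, but not inversely. In simple regression, β̂1 = Cov(x,y)/Var(x) = Corr(x,y)·(s_y/s_x). The slope therefore always has the same sign as the correlation. Unlike the correlation, however, it depends on the units of x and y.

The ratio Cov(x,u)/Var(x) is a different quantity. It is the bias in the slope when x is correlated with the error term u: in large samples, β̂1 approaches β1 + Cov(x,u)/Var(x). This bias is zero when x and u are uncorrelated.

Precision is a separate issue. The more strongly x and y are related, the smaller the error variance is relative to the variation in x, and the more precisely the coefficient is estimated. High error variance reduces the precision of coefficient estimates.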
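A short Python check of the slope/correlation relationship, using made-up education and wage numbers (x and y are hypothetical). It confirms that the OLS slope equals Corr(x,y)·s_y/s_x. It also shows that rescaling y changes the slope but not the correlation.

```python
import math

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

x = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0]   # hypothetical years of education
y = [9.0, 11.0, 14.0, 12.0, 17.0, 19.0]   # hypothetical hourly wage, dollars

sx, sy = math.sqrt(cov(x, x)), math.sqrt(cov(y, y))
slope = cov(x, y) / cov(x, x)              # OLS slope = Cov(x, y) / Var(x)
corr = cov(x, y) / (sx * sy)               # unit-free correlation
assert math.isclose(slope, corr * sy / sx)

# Measuring wage in cents instead of dollars multiplies the slope by 100
# but leaves the correlation unchanged.
y_cents = [100 * v for v in y]
slope_cents = cov(x, y_cents) / cov(x, x)
corr_cents = cov(x, y_cents) / (sx * math.sqrt(cov(y_cents, y_cents)))
print(round(slope, 3), round(slope_cents, 3), round(corr, 3), round(corr_cents, 3))
```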

Regression Analysis Tables

Three key tables from regression analysis:

  1. ANOVA
  2. Overall model fit
  3. Parameter estimates

ANOVA

ANOVA provides:

  • Explained variation: the variation in y accounted for by the model's predictors (STATA's "Model" sum of squares).
  • Unexplained (residual) variation: the variation the model cannot explain (STATA's "Residual" sum of squares).
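The explained and unexplained parts add up to the total variation. The Python sketch below uses made-up data and a regression with an intercept. The Model/Residual/Total names mirror the rows of STATA's regress ANOVA table. Everything else is only an illustration.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]       # made-up data

# Fit y = b0 + b1 * x by OLS
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx
yhat = [b0 + b1 * a for a in x]

ss_model = sum((f - my) ** 2 for f in yhat)              # explained variation ("Model")
ss_resid = sum((b - f) ** 2 for b, f in zip(y, yhat))    # unexplained variation ("Residual")
ss_total = sum((b - my) ** 2 for b in y)                 # total variation ("Total")
print(round(ss_model, 4), round(ss_resid, 4), round(ss_total, 4))
```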

R-squared (R²)

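R² is the share of the total variation in y explained by the model. It can be computed from the sums of squares as R² = SSE/SST, where SSE is the explained (Model) sum of squares and SST is the total sum of squares. Equivalently, R² = 1 − SSR/SST, where SSR is the residual sum of squares.

A small R² does not mean the model is useless. The estimated relationship can still be statistically significant, and a low R² does not by itself bias the coefficient estimates. It only means that factors outside the model account for much of the variation in y.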
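To see that a small R² can coexist with a statistically significant slope, here is a Python sketch with constructed data. The settings (n = 400, a true slope of 0.5, and a deliberately large ±1 noise pattern uncorrelated with x) are assumptions chosen for illustration. The sketch computes R² both ways, along with the usual t statistic for the slope.

```python
import math

n = 400
x = [i % 2 for i in range(n)]                               # alternates 0, 1
u = [1.0 if i % 4 in (0, 1) else -1.0 for i in range(n)]   # noise, uncorrelated with x
y = [0.5 * xi + ui for xi, ui in zip(x, u)]                 # assumed true slope 0.5

mx, my = sum(x) / n, sum(y) / n
sst_x = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sst_x
b0 = my - b1 * mx
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                  # total SS
sse = sum((yh - my) ** 2 for yh in yhat)               # explained (Model) SS
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual SS

r2 = sse / sst                                         # same as 1 - ssr / sst
se_b1 = math.sqrt(ssr / (n - 2) / sst_x)               # usual OLS standard error
t_b1 = b1 / se_b1
print(f"R^2 = {r2:.3f}, t = {t_b1:.2f}")
```

Here R² stays well under 0.1, yet the slope's t statistic is far above conventional critical values.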

F Statistic

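The overall F statistic tests whether the model explains anything at all. Its null hypothesis is that all slope coefficients are zero (H0: β1 = β2 = ... = βk = 0). A large F statistic, with a p-value (STATA's "Prob > F") below the chosen significance level, means we reject H0. The regressors are then jointly significant, and the model fits better than one with only an intercept.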
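With k regressors and n observations, F = (SSE/k) / (SSR/(n − k − 1)). Here SSE is the explained (Model) sum of squares and SSR is the residual sum of squares. The same statistic can be written with R² as (R²/k) / ((1 − R²)/(n − k − 1)). The Python sketch below uses made-up data with one regressor (k = 1). It checks both forms and also the simple-regression fact that F equals the squared t statistic of the slope.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.8, 4.1, 5.9, 8.3, 9.7, 12.4, 13.6, 16.2]   # made-up data
n, k = len(x), 1                                   # k = number of slope coefficients

mx, my = sum(x) / n, sum(y) / n
sst_x = sum((a - mx) ** 2 for a in x)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sst_x
b0 = my - b1 * mx
yhat = [b0 + b1 * a for a in x]

sse = sum((f - my) ** 2 for f in yhat)             # explained ("Model") SS
ssr = sum((b - f) ** 2 for b, f in zip(y, yhat))   # residual SS
sst = sum((b - my) ** 2 for b in y)                # total SS

f_stat = (sse / k) / (ssr / (n - k - 1))
r2 = sse / sst
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
t_b1 = b1 / math.sqrt(ssr / (n - k - 1) / sst_x)   # t statistic for H0: beta1 = 0
print(round(f_stat, 2), round(f_from_r2, 2), round(t_b1 ** 2, 2))
```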

Overall Model Fit

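Overall model fit summarizes how well the model fits the data. In STATA's regress output it appears next to the ANOVA table: the number of observations, the F statistic and its p-value, R-squared, adjusted R-squared, and Root MSE.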

Coefficient

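The coefficient gives the direction (sign) and magnitude of the estimated relationship. β̂j is the predicted change in y for a one-unit increase in xj, holding the other regressors fixed.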

Standard Error

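A lower standard error means a more precise, and therefore more reliable, estimate. The standard error estimates the standard deviation of the coefficient's sampling distribution. A higher standard error therefore means the estimate varies more from sample to sample.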
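In simple regression, se(β̂1) = σ̂ / sqrt(SSTx). Here σ̂² = SSR/(n − 2) estimates the error variance, and SSTx is the total variation in x. The Python sketch below uses made-up data and a hypothetical helper, slope_and_se. It shows that tripling the noise around the same line triples the standard error.

```python
import math

def slope_and_se(x, y):
    """OLS slope of y on x (with intercept) and its usual standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sst_x = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sst_x
    b0 = my - b1 * mx
    ssr = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    sigma2_hat = ssr / (n - 2)                 # estimated error variance
    return b1, math.sqrt(sigma2_hat / sst_x)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u = [0.3, -0.5, 0.2, 0.4, -0.6, 0.1, 0.5, -0.4]        # made-up errors
y_quiet = [1 + 2 * a + e for a, e in zip(x, u)]
y_noisy = [1 + 2 * a + 3 * e for a, e in zip(x, u)]    # same errors, three times larger

b_quiet, se_quiet = slope_and_se(x, y_quiet)
b_noisy, se_noisy = slope_and_se(x, y_noisy)
print(round(se_quiet, 4), round(se_noisy, 4))
```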

Omitted Variable Bias

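Omitted variable bias occurs when a variable that affects the dependent variable and is correlated with an included independent variable is left out of the model. The coefficient on the included variable then picks up part of the omitted variable's effect.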
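A minimal numerical illustration in Python. It assumes a hypothetical true model y = 1 + 2·x1 + 3·x2 with no error term, which keeps the arithmetic exact, and made-up values of x2 that are positively correlated with x1. Regressing y on x1 alone gives a slope of 2 + 3·δ1, where δ1 is the slope from regressing x2 on x1. The bias is therefore 3·δ1: the omitted variable's coefficient times the slope linking it to x1.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

B0, B1, B2 = 1.0, 2.0, 3.0                  # hypothetical true coefficients
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]         # made-up; positively correlated with x1
y = [B0 + B1 * a + B2 * b for a, b in zip(x1, x2)]

short_slope = ols_slope(x1, y)              # x2 omitted from the regression
delta1 = ols_slope(x1, x2)                  # how x2 moves with x1
bias = short_slope - B1
print(round(short_slope, 3), round(B2 * delta1, 3))
```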

Assumptions of OLS Estimators

Four assumptions for unbiasedness of OLS estimators:

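  1. Linear in parameters: y = β0 + β1x + u. Linearity refers to the parameters, not to x, so a model that uses x² or log(x) as the regressor is still linear in parameters.
  2. Random sampling: the observations (x_i, y_i) are a random sample from the population.
  3. Sample variation in x: the x values are not all the same. Otherwise the slope cannot be computed, because the sample variance of x is zero.
  4. Zero conditional mean: E(u | x) = 0. This fails if some factor in the error term is correlated with x, for example when a relevant variable that should be included as an additional independent variable is omitted.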
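Unbiasedness is a statement about the average of β̂1 over repeated random samples. The Monte Carlo sketch below is written in Python. The true values β0 = 1 and β1 = 2, the sample size, the number of replications, and the seed are all arbitrary assumptions. The sketch draws many samples that satisfy assumptions 1 to 4 and averages the OLS slopes. It then repeats the exercise with an error term that rises with x, which violates assumption 4.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def average_slope(endogenous=False, beta0=1.0, beta1=2.0, n=50, reps=2000, seed=1):
    """Average OLS slope over many simulated random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]   # random sample, x varies (assumptions 2, 3)
        u = [rng.gauss(0, 1) for _ in range(n)]      # independent of x, so E(u | x) = 0
        if endogenous:
            # Violates assumption 4: the error now contains a term that rises with x.
            u = [0.5 * (a - 5) + e for a, e in zip(x, u)]
        y = [beta0 + beta1 * a + e for a, e in zip(x, u)]   # linear in parameters (assumption 1)
        total += ols_slope(x, y)
    return total / reps

print(average_slope(), average_slope(endogenous=True))
```

The first average lands very close to 2. The second settles near 2.5, because the extra 0.5·x in the error is attributed to x.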

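Assumption 5 is homoskedasticity: the variance of the error term is constant given x, Var(u | x) = σ². It is not needed to establish unbiasedness of the OLS estimators, but it is used to derive the usual formulas for their variances and standard errors.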

Assumptions of Multiple Regression

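The assumptions are the same as above, with two adjustments. Linearity and zero conditional mean now involve all the regressors, and "x values vary" becomes no perfect collinearity: no independent variable is constant or an exact linear combination of the others. MLR.6 adds the normality assumption: u is independent of the regressors and u ~ N(0, σ²).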

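As the number of observations increases, the sampling variance of the OLS estimators decreases, so the estimates become more precise. For example, Var(β̂1) = σ²/SSTx under homoskedasticity, and SSTx grows with n.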
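Under assumptions 1 to 5, Var(β̂1) = σ²/SSTx, where SSTx = Σ(x_i − x̄)². The Python sketch below assumes σ² = 1 and builds larger samples by repeating the x values 1 to 10. Both are hypothetical choices. In this setup, quadrupling n quadruples SSTx and cuts the variance to a quarter.

```python
def slope_variance(x, sigma2=1.0):
    """Var(beta1_hat | x) = sigma^2 / SST_x under assumptions 1-5 (homoskedasticity)."""
    mx = sum(x) / len(x)
    sst_x = sum((xi - mx) ** 2 for xi in x)
    return sigma2 / sst_x

base = [float(v) for v in range(1, 11)]    # x = 1, 2, ..., 10
for copies in (1, 4, 16):
    x = base * copies                      # n = 10, 40, 160
    print(len(x), slope_variance(x))
```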

Proving that β̂1 is unbiased, E(β̂1) = β1, does not require knowing the true value of β1. The argument holds for whatever value the population parameter takes.

STATA Commands

Basic STATA commands:

  • cd: Change/check working directory.
  • dir: List the names of files in the directory.
  • doedit: Open do-file editor.
  • use: Load dataset.
  • save: Save dataset.
  • clear: Clear dataset from memory.
  • br: Open spreadsheet of data.
  • list: Print data to the STATA console.
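  • in: Restrict a command to a range of observation numbers.
  • if: Restrict a command to observations that meet a condition.
  • Example: li wage in -3/L lists wage for the last three observations.
  • Example: li wage educ if female == 1 lists wage and educ for observations with female equal to 1.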
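For readers who think in Python, the two qualifier examples map roughly onto list slicing and filtering. This is only an analogy on a made-up dataset, stored as a list of dicts with hypothetical values. It is not how STATA stores data.

```python
# Made-up dataset: one dict per observation, in row order.
data = [
    {"wage": 4.50, "educ": 12, "female": 1},
    {"wage": 7.25, "educ": 16, "female": 1},
    {"wage": 5.00, "educ": 12, "female": 0},
    {"wage": 9.80, "educ": 18, "female": 0},
    {"wage": 6.10, "educ": 12, "female": 1},
]

# Analogue of: li wage in -3/L   (the last three observations)
last_three = [obs["wage"] for obs in data[-3:]]

# Analogue of: li wage educ if female == 1   (rows where the condition holds)
female_rows = [(obs["wage"], obs["educ"]) for obs in data if obs["female"] == 1]

print(last_three)
print(female_rows)
```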

Operators:

  • ==: Equal to
  • <: Less than
  • >: Greater than
  • !: Not
  • !=: Not equal
  • &: And
  • |: Or

Summary Commands:

  • summarize: Summarize distribution (mean, standard deviation, min, and max).
  • codebook: Inspect variable values (number of unique and missing values, range, and, for numeric variables, summary statistics such as the mean and percentiles).
  • tabulate: Tabulate frequencies (displays the count, percent, and cumulative percent for each value of a variable).
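As a rough Python analogue of what summarize and tabulate report, the sketch below works on made-up wage and education values. It is an illustration only, not STATA's implementation. Like STATA's summarize, it uses the sample standard deviation (n − 1 denominator).

```python
import statistics
from collections import Counter

wage = [4.50, 7.25, 5.00, 9.80, 6.10]   # made-up values
educ = [12, 16, 12, 18, 12]

# Rough analogue of `summarize wage`: Obs, Mean, Std. dev., Min, Max
summary = {
    "obs": len(wage),
    "mean": statistics.mean(wage),
    "sd": statistics.stdev(wage),        # sample s.d. (n - 1 denominator)
    "min": min(wage),
    "max": max(wage),
}
print(summary)

# Rough analogue of `tabulate educ`: frequency, percent, cumulative percent
counts = Counter(educ)
cum = 0.0
for value in sorted(counts):
    pct = 100 * counts[value] / len(educ)
    cum += pct
    print(value, counts[value], round(pct, 2), round(cum, 2))
```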