Econometric Modeling and Statistical Analysis in Economics

Properties of the Error Term in OLS

In Ordinary Least Squares (OLS) regression, two key properties of the error term are:

  • Zero Conditional Mean: The expected value of the error term (u) is zero given any value of the explanatory variable(s), i.e., E(u | x) = 0. While we cannot predict the specific value of the error for a single observation, we assume its probability distribution averages to zero.
  • Homoscedasticity: The variance of the error term is constant for all values of the explanatory variable(s), i.e., Var(u | x) = sigma^2. This means the spread of the distribution of the dependent variable (Y) around its conditional expectation is the same across all values of the independent variable (x); both assumptions are illustrated in the sketch below.
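
A minimal simulation sketch of these two assumptions (all data and numbers here are hypothetical): the error u is drawn with mean zero and the same standard deviation everywhere, so its average and spread look alike in every slice of x.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100_000)             # explanatory variable
u = rng.normal(loc=0.0, scale=2.0, size=x.size)  # mean 0, constant sd

# Check zero conditional mean and constant variance in slices of x
for lo, hi in [(0, 2), (4, 6), (8, 10)]:
    mask = (x >= lo) & (x < hi)
    print(lo, hi, round(float(u[mask].mean()), 3), round(float(u[mask].std()), 3))
```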

Stages of Econometric Modeling

Econometric modeling typically involves these stages:

  1. Determination of a research goal.
  2. Specification of explanatory variables.
  3. Choice of functional relationships.
  4. Estimation of structural parameters and interpretation of results.
  5. Model quality verification.
  6. Practical application of the model.

Regarding the third stage, there are three main approaches:

  • Source Approach: Derives the functional form from the dynamic features of the phenomenon being analyzed, typically by solving differential equations.
  • Resultative Approach: Chooses the form that best fits the data; the result is valid only within the sample range and typically concerns a single two-variable equation.
  • Empirical Data Approach: Plots the raw data on two-dimensional scatter diagrams to identify linear or non-linear relationships.

Spatial vs. Time Series Data

  • Spatial Data: Involves data collected from multiple entities (e.g., companies) at a single point in time, for example a cross-section of several companies observed in the same year. (Observing several companies over three years combines the spatial and time dimensions.)
  • Time Series Data: Involves data collected from a single entity over a specific period. For example, analyzing data from one company from 2020 to 2022.

Econometric Modeling vs. Building an Econometric Model

Yes, there is a difference:

  • Econometric Modeling: The broader process, covering all six stages listed above: determination of the research goal, specification of explanatory variables, choice of a functional relationship, estimation of structural parameters and interpretation of results, model quality verification, and practical application of the model.
  • Building an Econometric Model: Focuses on the model’s construction only, i.e., the first five stages, and excludes the practical application of the model.

Selection vs. Choice of Explanatory Variables

  • Selection: Based on economic theory, expert opinions, similar research findings, or personal experience.
  • Choice: Involves using statistical procedures, such as Hellwig’s method, to select the ‘k’ most relevant explanatory variables. Variables included in the model should exhibit a strong correlation with the dependent variable and a weak correlation among themselves.

Hellwig’s Method

Hellwig’s method assumes a linear relationship between variables. It prioritizes variables highly correlated with the dependent variable and weakly correlated with each other. The steps are (a sketch implementation follows the list):

  1. Determine the vector of correlation coefficients between the potential explanatory variables and the dependent variable, and the matrix of correlation coefficients between the explanatory variables.
  2. Determine all L = 2^m - 1 non-empty subsets of the m potential explanatory variables.
  3. For each subset, calculate the individual information capacity of each variable in it: the square of its correlation with the dependent variable, divided by one plus the sum of the absolute values of its correlations with the other variables in the subset.
  4. Calculate the integral information capacity of each of the L subsets as the sum of the individual capacities of its variables.
  5. Choose the subset with the highest integral information capacity; the indicator takes values between 0 and 1, and values closer to 1 are better.
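
A minimal sketch of the procedure in Python (the function name and the example correlations are hypothetical):

```python
import numpy as np
from itertools import combinations

def hellwig(r0, R):
    """Return the subset of candidate variables with the highest
    integral information capacity, together with that capacity.
    r0: correlations of each candidate with the dependent variable
    R:  correlation matrix of the candidates"""
    m = len(r0)
    best_subset, best_H = None, -1.0
    # All L = 2^m - 1 non-empty subsets of candidates
    for k in range(1, m + 1):
        for subset in combinations(range(m), k):
            H = 0.0
            for j in subset:
                # Individual capacity: squared correlation with Y, penalized
                # by correlation with the other variables in the subset
                denom = 1.0 + sum(abs(R[j, i]) for i in subset if i != j)
                H += r0[j] ** 2 / denom
            if H > best_H:
                best_subset, best_H = subset, H
    return best_subset, best_H

# Hypothetical example: X1 and X2 are strongly correlated with each other
r0 = np.array([0.85, 0.80, 0.30])
R = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(hellwig(r0, R))   # picks {X1, X3} rather than the correlated pair
```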

Methods of Functional Relationship Choice

There are three primary methods:

  1. Source Approach: Based on the dynamic features of the phenomenon. Functional forms are derived by solving differential equations, often leading to exponential functions (e.g., Y = b*a^x, where a, b > 0); see the fitting sketch after this list.
  2. Resultative Approach: Based on the empirical data, e.g., scatter diagrams; valid only within the sample range.
  3. Mixed Approach: Combines elements of the source and resultative approaches.
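
Since Y = b*a^x is linear in logarithms (ln Y = ln b + x*ln a), its parameters can be estimated with OLS on the logged data. A minimal sketch with simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.arange(20, dtype=float)   # time index
y = 3.0 * 1.08**x * np.exp(rng.normal(scale=0.02, size=x.size))

# Fit ln(y) = ln(b) + x*ln(a) by least squares
X = np.column_stack([np.ones_like(x), x])
c = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
b, a = np.exp(c[0]), np.exp(c[1])
print(b, a)   # should be close to (3.0, 1.08)
```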

The Idea of OLS

Ordinary Least Squares (OLS) is a method for estimating unknown parameters in a linear regression model. Its goal is to minimize the sum of the squared differences between the observed values and the values predicted by the linear model.
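
As an illustration of this idea, the sketch below (with simulated, purely illustrative data) finds the intercept and slope that minimize the sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)     # true line plus noise

# Minimize sum((y - X @ beta)^2) over beta
X = np.column_stack([np.ones_like(x), x])   # intercept column plus x
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)                             # estimates of (beta0, beta1)
```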

Gauss-Markov Theorem

The Gauss-Markov Theorem states that in a linear regression model where errors have an expectation of zero, are uncorrelated, and have equal variances, the Best Linear Unbiased Estimator (BLUE) is given by the OLS estimator.

Formula Elements of the OLS Estimator (Matrix Form)

The model in matrix form is Y = X*Beta + U, and the OLS estimator is Beta Hat = (X'X)^(-1) * X'Y, where:

  • Y: Vector of values of the dependent variable.
  • X: Matrix of values of the independent variables (including a column of ones for the intercept).
  • Beta: Vector of structural parameters.
  • U: Vector of error terms.
  • Beta Hat: Estimator of the vector of structural parameters.
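
A minimal numerical sketch of the matrix formula with simulated data (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])            # "true" parameters
y = X @ beta + rng.normal(scale=0.3, size=n)

# Beta Hat = (X'X)^(-1) X'Y, solved as a linear system for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                              # close to (1.0, 2.0, -0.5)
```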

Model Verification: Measures and Tests

  • Goodness of fit (e.g., R-squared).
  • Residual variance estimation.
  • Inference about the structural parameters (T-tests).
  • Testing for serial correlation (e.g., the Durbin-Watson test).
  • Testing for heteroskedasticity and its consequences.
  • Testing multiple linear restrictions (the F-test).

Residual Variance Estimation

This involves calculating the standard error of estimation, which estimates the standard deviation of the unobservable factors affecting the dependent variable (Y). It equals the square root of the sum of squared residuals divided by (n - k - 1), where n is the number of observations and k the number of explanatory variables, and it is expressed in the same units as Y.
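
A short sketch of the calculation on simulated data (the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=n)
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
k = X.shape[1] - 1                          # number of explanatory variables
s = np.sqrt(resid @ resid / (n - k - 1))    # standard error of estimation
print(s)                                    # in the units of Y; near 1.5 here
```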

Coefficient of Determination (R-squared)

R-squared (R²) indicates the proportion of the sample variation in the dependent variable (Y) that is explained by the independent variable(s) (X). For forecasting, R² should ideally be above 90%. For descriptive purposes, an acceptable level is lower but generally not less than 60-70%.
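
R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares, R² = 1 - RSS/TSS. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=80)
y = 2.0 + 3.0 * x + rng.normal(size=80)
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]

rss = np.sum((y - X @ b) ** 2)      # unexplained variation
tss = np.sum((y - y.mean()) ** 2)   # total variation in y
print(1 - rss / tss)                # R-squared
```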

Interpretation of Linear Model Parameter Estimates

  • Intercept (Beta Zero): The predicted value of Y when X is zero. This often has no practical interpretation, e.g., when X = 0 lies outside the range of the observed data.
  • Slope (Beta One): Represents the change in the estimated value of Y for a one-unit increase in X.
  • Standard Errors: Indicate the variability in the coefficient estimates.
  • T and P-values: Used for hypothesis testing of each coefficient.
  • R-squared: Measures the proportion of variability in the dependent variable explained by the model.

Statistical Significance of Structural Parameters

If the absolute value of the T-statistic is greater than or equal to the critical value from the T-distribution, we reject the null hypothesis in favor of the alternative hypothesis at the significance level (alpha). Alpha represents the probability of rejecting the null hypothesis when it is in fact true. Rejection indicates that the parameter, and hence its explanatory variable, is statistically significant at the alpha level.
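
A sketch of the test for a slope coefficient on simulated data (the significance level and all numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - 2)                      # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])   # standard error of the slope

t_stat = beta_hat[1] / se                         # H0: beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)      # two-sided, alpha = 0.05
print(t_stat, t_crit, abs(t_stat) >= t_crit)      # True => reject H0
```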

Autocorrelation of Residuals

Autocorrelation occurs when the residual in period ‘t’ is correlated with the residual in period ‘t-1’. To test for positive first-order autocorrelation, we compare the Durbin-Watson (DW) statistic with the critical values dL and dU (the DW statistic itself is computed in the sketch after this list):

  • If DW < dL, reject the null hypothesis (H0) of no autocorrelation in favor of positive autocorrelation.
  • If DW > dU, fail to reject H0.
  • If dL < DW < dU, the test is inconclusive.
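
The statistic is DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), with values near 2 suggesting no first-order autocorrelation. A minimal sketch (the residuals are simulated; real critical values dL and dU must be taken from DW tables):

```python
import numpy as np

def durbin_watson(e):
    """Sum of squared first differences of residuals / sum of squared residuals."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(3)
e = rng.normal(size=30)    # hypothetical residuals
print(durbin_watson(e))    # near 2 for uncorrelated residuals
```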

Reasons for Autocorrelation of Residuals

  • Omitted Variables: Important variables not included in the model.
  • Misspecification: Incorrect functional form (e.g., assuming a linear relationship when it’s quadratic).
  • Systematic Errors in Measurement: Consistent errors in data collection or recording.

Effects of Autocorrelation of Residuals

  • OLS estimators remain unbiased and consistent.
  • OLS estimators become inefficient.
  • Estimated variances of regression coefficients are biased and inconsistent, invalidating hypothesis testing.
  • R-squared may be overestimated, and t-statistics may be inflated.

Heterogeneity of the Disturbance Term

A larger standard error indicates a wider distribution of the unobservable factors affecting Y. When the variance of the error term depends on X, the error is heteroskedastic (non-constant variance). If the values of ‘u’ are independently distributed with mean zero and constant variance, the error is homoskedastic; normality of its distribution is a separate, additional assumption.
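
A simulation sketch of heteroskedasticity (all numbers are illustrative): the standard deviation of u grows with x, so the spread differs across slices of x.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=100_000)
u = rng.normal(scale=0.5 + 0.4 * x)   # sd of u depends on x

for lo, hi in [(0, 2), (4, 6), (8, 10)]:
    mask = (x >= lo) & (x < hi)
    print(lo, hi, round(float(u[mask].std()), 2))   # rising spread
```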

Multicollinearity of Independent Variables

Imperfect multicollinearity refers to a strong (but not exact) linear relationship between two or more independent variables. It inflates the variances of the coefficient estimates, making them imprecise, as the simulation sketch below illustrates.
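
A minimal Monte Carlo sketch (all parameters are hypothetical): the same model is estimated repeatedly with uncorrelated and with highly correlated regressors, and the spread of the slope estimates is compared.

```python
import numpy as np

rng = np.random.default_rng(7)

def slope_sd(rho, reps=500, n=100):
    """Standard deviation of the OLS estimate of the first slope
    when corr(x1, x2) is approximately rho."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return float(np.std(estimates))

print(slope_sd(0.0), slope_sd(0.95))   # the second is noticeably larger
```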

Transforming Cobb-Douglas into a Group Productivity Function

The Cobb-Douglas production function relates total output to ‘N’ factors of production (e.g., employment, assets). For example:

Q = alpha0 * E^alpha1 * A^alpha2

Where:

  • Q = Output
  • E = Number of employees
  • A = Assets
  • alpha0 = Total factor productivity
  • alpha1, alpha2 = Output elasticities of labor and assets

Dividing both sides by E (the number of employees) gives the productivity function, i.e., output per employee:

Q/E = alpha0 * E^(alpha1 - 1) * A^alpha2

or, equivalently, in terms of assets per employee (A/E):

Q/E = alpha0 * E^(alpha1 + alpha2 - 1) * (A/E)^alpha2
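
Because the function is linear in logarithms, its parameters can be estimated by OLS on logged data. A sketch with simulated data (all values, including the “true” elasticities, are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
E = rng.uniform(10, 500, size=n)     # employees
A = rng.uniform(100, 5000, size=n)   # assets
Q = 1.5 * E**0.7 * A**0.3 * np.exp(rng.normal(scale=0.1, size=n))

# ln Q = ln(alpha0) + alpha1*ln(E) + alpha2*ln(A)
X = np.column_stack([np.ones(n), np.log(E), np.log(A)])
coef = np.linalg.lstsq(X, np.log(Q), rcond=None)[0]
alpha0, alpha1, alpha2 = np.exp(coef[0]), coef[1], coef[2]
print(alpha0, alpha1, alpha2)        # near (1.5, 0.7, 0.3)
```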