Econometrics Fundamentals: OLS, Modeling, and Key Concepts

Properties of the Error Term in OLS

The Ordinary Least Squares (OLS) method relies on several assumptions about the model and the error term (u):

  • SLR.1: Linearity in Parameters. The model is linear in its parameters (coefficients).
  • SLR.2: Random Sampling. The data is obtained from a random sample of size n.
  • SLR.3: Sample Variation in the Explanatory Variable. The sample outcomes for the explanatory variable (X) are not all the same value.
  • SLR.4: Zero Conditional Mean. The error term u has an expected value of zero given any value of the explanatory variable (E(u|X) = 0).
  • SLR.5: Homoscedasticity. The error term u has the same variance given any value of the explanatory variable (Var(u|X) = σ²).

Key properties often summarized for the Gauss-Markov theorem: Errors have an expectation of zero, are uncorrelated, and have equal variances (homoscedasticity).

Stages of Econometric Modeling

  1. Determination of a research goal.
  2. Specification of explanatory variables (especially crucial for models of time or spatial dependency).
  3. Choice of the functional relationship (mathematical form of the model).
  4. Estimation of structural parameters and interpretation of the results obtained.
  5. Model quality verification.
  6. Practical application of the model.

Spatial vs. Time Series Data Explained

Time series data involves observations for a single entity (e.g., a specific company) collected over multiple time periods (e.g., examining data from 2010 to 2015).

Spatial data (often called cross-sectional data in this context) involves observations for multiple entities (e.g., more than one company) collected at a single point in time (e.g., examining data for the year 2015).

Econometric Modeling vs. Building: Is There a Difference?

Yes, according to this breakdown, econometric modeling includes one additional step compared to just building the model: the “Practical application of the model.”

Selection vs. Choice of Explanatory Variables

Selection of potential explanatory variables is based on:

  • Economic theory
  • Opinions of experts
  • Results of similar experiences

Potential variables should represent different causative factors for the dependent variable (Y).

Choice of explanatory variables involves using a statistical procedure (e.g., Hellwig’s method) to finalize the variables included. Variables incorporated into the model should demonstrate a strong correlation with the dependent variable and weak correlation among themselves.

The Idea Behind Hellwig’s Method

Hellwig’s method is a statistical procedure used for the selection (or choice) of the optimal subset of explanatory variables to include in a linear regression model.
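A minimal sketch of the calculation (with hypothetical correlation values): for every non-empty subset K of candidate variables, each member's individual capacity is r_j² divided by the sum of |r_ij| over the subset (including r_jj = 1), and the subset with the largest integral capacity H (the sum of individual capacities) is chosen.

```python
from itertools import combinations

def hellwig(r0, R):
    """Return the subset of candidate indices maximizing Hellwig's
    integral information capacity H.
    r0 : correlations of each candidate X_j with Y
    R  : correlation matrix among the candidate X variables
    """
    m = len(r0)
    best_H, best_subset = float("-inf"), None
    for k in range(1, m + 1):
        for subset in combinations(range(m), k):
            # Individual capacity of X_j within this subset,
            # summed into the integral capacity H.
            H = sum(
                r0[j] ** 2 / sum(abs(R[j][i]) for i in subset)
                for j in subset
            )
            if H > best_H:
                best_H, best_subset = H, subset
    return best_subset, best_H

# Hypothetical correlations for three candidate variables:
# X0 and X1 correlate strongly with Y, X2 only weakly.
r0 = [0.9, 0.8, 0.1]
R = [[1.0, 0.7, 0.2],
     [0.7, 1.0, 0.1],
     [0.2, 0.1, 1.0]]
subset, H = hellwig(r0, R)
print(subset, H)
```

Note how the denominator penalizes candidates that are strongly correlated with other variables already in the subset, which is exactly the "strong with Y, weak among themselves" criterion described above.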

Methods for Functional Relationship Choice

There are several approaches to choosing the mathematical form of a model:

  1. Source Approach: Based on the dynamic features of the analyzed phenomenon, for instance deriving the functional form by solving a differential equation that describes the process.
  2. Resultative Approach: Focuses on the relationship observed in the data.
  3. Mixed Approach: Combines elements of the source and resultative approaches.

The Core Idea of OLS

Ordinary Least Squares (OLS) is one of the simplest and most common methods for estimating the parameters in a linear regression model. The goal of OLS is to minimize the sum of the squared differences between the observed values and the values predicted by the linear function, thus fitting the function as closely as possible to the data. Under certain assumptions (Gauss-Markov assumptions), OLS is BLUE (Best Linear Unbiased Estimator).
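For simple regression, the minimization described above has a closed-form solution via the normal equations; a short sketch on hypothetical data:

```python
import numpy as np

# Tiny illustrative dataset (hypothetical numbers).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS minimizes sum((y - b0 - b1*x)**2); for simple regression the
# normal equations give the slope and intercept directly.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(b0, b1)
# A defining OLS property: the residuals sum to (numerically) zero.
print(residuals.sum())
```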

The Gauss-Markov Theorem Explained

The Gauss-Markov theorem states that if a linear regression model satisfies the classical assumptions (specifically, Assumptions SLR.1 through SLR.5, which imply errors have an expectation of zero, are uncorrelated, and have equal variances), then the OLS estimator for the coefficients is the Best Linear Unbiased Estimator (BLUE). This means that among all linear and unbiased estimators, OLS has the minimum variance.

Model Verification: Measures and Tests

Common measures and tests used for model verification include:

  • Goodness of fit (e.g., R², Adjusted R²)
  • Residual variance estimation (e.g., Standard Error of the Regression (Se), Variance of Residuals (Vs))
  • Durbin-Watson test (for autocorrelation)
  • F-test (for overall significance)
  • t-test (for individual coefficient significance)
  • Inference based on confidence intervals and hypothesis tests.
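The Durbin-Watson statistic from the list above can be computed directly from the residual series; its range is 0 to 4, with values near 2 suggesting no first-order autocorrelation:

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals. Values near 0
    suggest positive autocorrelation, near 4 negative autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Alternating residuals (strong negative autocorrelation) push DW toward 4.
print(durbin_watson([1, -1, 1, -1, 1, -1]))
# Runs of same-signed residuals (positive autocorrelation) push DW toward 0.
print(durbin_watson([1, 1, 1, -1, -1, -1]))
```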

Discussion of a chosen one: The Coefficient of Determination (R²) measures the proportion of the total variance in the dependent variable that is explained by the independent variables in the model. A higher R² indicates a better fit of the model to the data.

Understanding the Coefficient of Determination (R²)

The Coefficient of Determination is denoted as R² and represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It indicates how well the regression model fits the observed data. For forecasting purposes, its value should ideally be higher than 90%, while for descriptive purposes, it should generally be not less than 60-70%, although acceptable values can vary significantly by field.
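The definition translates directly into code as R² = 1 − SSR/SST; a small sketch with hypothetical fitted values:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSR/SST: the share of the variance of y explained by the fit."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ssr = np.sum((y - y_hat) ** 2)       # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1 - ssr / sst

# A near-perfect hypothetical fit gives R^2 close to 1.
print(r_squared([2, 4, 6, 8], [2.1, 3.9, 6.1, 7.9]))
```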

Interpreting Linear Model Parameter Estimates

In a simple linear model (ŷ = β̂₀ + β̂₁x):

  • The intercept estimate (β̂₀) is the predicted value of the dependent variable (y) when the independent variable (x) is equal to 0. In many economic models, this value may not have a practical or meaningful interpretation.
  • The slope estimate (β̂₁) tells us the predicted change in the dependent variable (ŷ) for a one-unit increase in the independent variable (x).

The Problem of Residual Autocorrelation

Autocorrelation of residuals (also known as serial correlation) occurs when the residual terms from different observations are correlated. Specifically, it often refers to the situation where the residual in period ‘t’ is dependent on the residual in period ‘t-1’ (or other previous periods).

Reasons for Residual Autocorrelation

Common reasons include:

  • Omitted relevant variables
  • Misspecification of the model’s functional form
  • Systematic errors in measurement
  • Inertia in economic variables

Effects of Residual Autocorrelation

If autocorrelation is present but ignored:

  • OLS estimators are still linear and unbiased, but they no longer have the minimum variance (i.e., the minimum variance property is not satisfied).
  • The estimated variances of the OLS estimators (and thus standard errors) are biased (typically underestimated when the autocorrelation is positive).
  • Confidence intervals and hypothesis tests (like the t-test and F-test) become unreliable because the test statistics do not follow their assumed distributions (e.g., the t-statistic doesn’t follow a t-distribution).
  • OLS estimators are not BLUE.

Occurrence of Heteroscedasticity

Heteroscedasticity refers to the situation where the variance of the error term (u) is not constant across all observations (i.e., Var(u|X) is dependent on X). A larger standard deviation (σ) of the error term for certain ranges of X means that the distribution of the unobserved factors affecting Y is more spread out for those observations.
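A quick simulation makes the "more spread out" point concrete; here the error standard deviation is assumed (for illustration) to grow proportionally with X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.linspace(1, 10, n)
# Heteroscedastic errors: sd(u|x) = 0.5 * x (an assumed form, chosen
# purely to make the changing spread visible).
u = rng.normal(0, 0.5 * x)

# The sample spread of u is clearly larger where x is large.
low_spread = u[: n // 2].std()
high_spread = u[n // 2 :].std()
print(low_spread, high_spread)
```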

The Problem of Multicollinearity

Multicollinearity occurs when there is a high correlation between two or more independent variables in a regression model.

  • Perfect multicollinearity: An exact linear relationship exists between independent variables. OLS estimation is impossible.
  • Imperfect multicollinearity: A strong, but not exact, linear relationship exists between two or more independent variables. This can significantly affect the estimation of the coefficients, leading to large standard errors and unstable coefficient estimates.
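One common diagnostic for imperfect multicollinearity (not named in the text above, but standard) is the Variance Inflation Factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing X_j on the remaining regressors; a sketch on a hypothetical design matrix:

```python
import numpy as np

def vif(X, j):
    """Variance Inflation Factor of column j of design matrix X."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# Hypothetical data: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # almost collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 2))  # large for x1, close to 1 for x3
```

A rule of thumb often quoted is that VIF values above about 10 signal problematic multicollinearity; the nearly collinear pair here lands far above that, while the independent column stays near 1.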

Defining Econometric Forecasting

Forecasting is the process of calculating the future value(s) of a dependent variable (Y) based on an estimated econometric model. It involves making inferences about the future using the relationships identified in the model.

The Rule of Unbiased Prediction Explained

A forecast is considered unbiased if its expected value is equal to the true expected value of the dependent variable (Y) in the prediction horizon (T). This implies that the expected value of the forecast error (the difference between the actual future value and the forecasted value) is assumed to be zero.

Assumptions for Econometric Forecasting

  1. We know (or have accurately estimated) the relationship between the dependent and independent variables.
  2. There is stability in the identified relationships concerning the set of explanatory variables, the functional form, and the parameter values over the forecast period.
  3. We are able to obtain or predict the future values of the explanatory variables (XT) in the prediction horizon (T).
  4. There is validity in extrapolating the model beyond the sample range used for estimation.

Methods for Determining Explanatory Variable Values

Methods for determining the future values of explanatory variables in the prediction horizon include:

  1. Using information from established strategies and plans.
  2. Extrapolating past trends of the explanatory variables.
  3. Extending the model to forecast the explanatory variables themselves (e.g., using time series models like ARIMA).
  4. Based on expert judgment or scenario analysis for the prediction area.

Main Goal of Econometric Model Verification

The main goal of verifying an econometric model is to analyze its quality, assess how well it represents the underlying economic phenomenon, and confirm the fundamental stochastic relationships based on the available statistical data before using it for interpretation or forecasting.

Regression of Type I vs. Type II

The distinction often relates to whether the relationship is assumed to be exact or stochastic:

  • Regression of Type I (Deterministic Relationship): Assumes an exact functional relationship between variables, implying all data points lie perfectly on the regression line. This is rarely possible with empirical economic data.
  • Regression of Type II (Stochastic Relationship): Acknowledges that the relationship is not exact by adding a stochastic error term (u). This term accounts for the differences (residuals) that arise between the actual observed values of Y and the values predicted by the functional relationship.

Model Stability as a Prediction Assumption

Stability implies that the relationship captured by the econometric model during the estimation period remains unchanged during the forecast horizon. This stability refers to:

  • The set of relevant explanatory variables.
  • The functional form of the relationship.
  • The values of the coefficient estimates.
  • The parameters of the error term distribution (e.g., its variance).

Significant changes in any of these aspects would invalidate forecasts based on the original model.

Linearization of Econometric Models Explained

Linearization is the process of transforming a non-linear econometric model into a linear one, often through mathematical operations like taking logarithms or using variable substitutions. This allows the application of linear estimation techniques like OLS to models that are not initially linear in parameters or variables.
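A classic example is the Cobb-Douglas form Y = A·K^α·L^β: taking logarithms gives ln Y = ln A + α·ln K + β·ln L, which is linear in the parameters, so OLS applies. A simulation sketch (the values A = 2.0, α = 0.3, β = 0.7 are assumed for illustration, not estimates from real data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
K = rng.uniform(1, 100, n)
L = rng.uniform(1, 100, n)
A, alpha, beta = 2.0, 0.3, 0.7
# Multiplicative error exp(u) becomes an additive error after taking logs.
Y = A * K**alpha * L**beta * np.exp(rng.normal(0, 0.05, n))

# OLS on the log-linearized model: ln Y = ln A + alpha*ln K + beta*ln L + u.
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
print(np.exp(coef[0]), coef[1], coef[2])  # close to 2.0, 0.3, 0.7
```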

Types of Statistical Regularities

Statistical regularities observed in economic data can include:

  • Structure (Distribution): Patterns in the distribution of a variable.
  • Dynamics and Fluctuations: Trends, cycles, and seasonality over time.
  • Dependence in Time: Autocorrelation within a single time series.
  • Dependence in Space: Correlation between different entities at a point in time (cross-sectional dependence) or spatial autocorrelation.

Structure of an Econometric Model

An econometric model typically consists of:

  • Dependent Variable (Y): The variable being explained or predicted.
  • Independent (Explanatory) Variables (X): Variables used to explain the dependent variable.
  • Structural Parameters (Coefficients, e.g., β): Quantify the relationship between the independent variables and the dependent variable.
  • Analytical (Functional) Form: The mathematical equation specifying the relationship.
  • Error Term (u): Represents unobserved factors, measurement errors, and inherent randomness.

Reasons for the Error Term in Econometric Models

The error term exists due to several reasons, broadly categorized as:

  1. Deterministic Reasons:
    • Contains unobserved influential factors (omitted variables).
    • Results from an imprecise or simplified functional relationship.
    • Measurement errors in the variables.
  2. Indeterministic (Stochastic) Reasons:
    • Represents the inherent stochastic nature of economic phenomena, partly because human behavior plays a central role in the economy.

Applications of the Cobb-Douglas Production Function

The Cobb-Douglas production function is widely used in economics for various applications, including:

  • Analyzing the scale of production (returns to scale).
  • Time-cost analysis.
  • Assessing economic viability or efficiency.
  • Estimating the relationship between inputs (like labor and capital) and output (production).
  • Measuring marginal productivity of inputs.

Criteria for Explanatory Variable Specification

The process typically involves these steps:

  1. Selection of potential explanatory variables: Based on economic theory, expert opinions, or previous experience.
  2. Checking variable properties: Assessing factors like data availability and volatility (e.g., ensuring sufficient variation, sometimes suggested as a coefficient of variation not less than 10-15%).
  3. Choice of final explanatory variables: Using statistical procedures (like stepwise regression, information criteria, or methods like Hellwig’s) to select the best subset based on statistical significance, correlation patterns, and model fit.

Effect of Omitting a Relevant Explanatory Variable

If a relevant explanatory variable (one that actually belongs in the true model and is correlated with included variables) is omitted, the main effects are:

  • The OLS estimators of the included variables will be biased (omitted variable bias).
  • The variance of the error term will likely be overestimated.
  • Statistical inference procedures (t-tests, F-tests, confidence intervals) based on the misspecified model become invalid.
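A Monte Carlo sketch of the bias (with an assumed true model, chosen for illustration): when y = 1 + 2·x1 + 3·x2 + u and x2 = 0.8·x1 + noise, the short regression of y on x1 alone converges to 2 + 3·0.8 = 4.4 rather than the true slope of 2, because x1 partly absorbs the effect of the omitted x2:

```python
import numpy as np

rng = np.random.default_rng(3)
slopes = []
for _ in range(500):
    x1 = rng.normal(size=200)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=200)  # correlated with x1
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=200)
    # Short regression: y on x1 only, with x2 omitted.
    b1 = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
    slopes.append(b1)

print(np.mean(slopes))  # near 4.4, not the true slope of 2
```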

Observed vs. Fitted Values: Understanding Residuals

The difference between the actual observed value of the dependent variable (Y) and the fitted value (ŷ) predicted by the regression model for that observation is called the residual (e = Y – ŷ). Residuals represent the portion of the dependent variable not explained by the model. In OLS, the goal is to minimize the sum of squared residuals, so smaller residuals generally indicate a better model fit.

Unbiasedness of Structural Parameter Estimators

An estimator of a structural parameter (like a regression coefficient β) is considered unbiased if its expected value (the average value over many hypothetical samples) is equal to the true, unknown value of the parameter. E(β̂) = β.

Consistency of Structural Parameter Estimators

Consistency is an asymptotic property of an estimator. An estimator is consistent if, as the sample size (n) increases towards infinity, the estimate (β̂) converges in probability to the true value of the parameter (β). Essentially, with a large enough sample, a consistent estimator is highly likely to be very close to the true parameter value.
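Consistency can be illustrated by simulation; here the true slope is assumed to be 2.0, and the OLS slope estimate tightens around it as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(4)

def slope_estimate(n):
    """OLS slope from a fresh sample of size n with true slope 2.0."""
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

for n in (10, 1_000, 100_000):
    print(n, slope_estimate(n))  # estimates cluster ever closer to 2.0
```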

The Concept of a Restricted Model

In hypothesis testing, the restricted model is the version of the econometric model that incorporates the constraints specified under the null hypothesis (H₀). Often, this involves assuming that the coefficients of certain independent variables are equal to zero (effectively removing them from the model) or that some other linear restrictions on the parameters hold true.

Point Forecast vs. Interval Forecast

A point forecast provides a single value as the prediction for a future outcome of the dependent variable at a specific future time point.

An interval forecast provides a range (an upper and a lower limit) within which the future value is expected to lie with a certain prescribed probability (confidence level). It acknowledges the uncertainty inherent in forecasting.
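For simple regression, a standard construction of the interval forecast is ŷ₀ ± t·s·sqrt(1 + 1/n + (x₀ − x̄)²/Sxx); a sketch on hypothetical data (2.101 is roughly the two-sided 95% t critical value for 18 degrees of freedom):

```python
import numpy as np

def interval_forecast(x, y, x0, t_crit):
    """Point forecast and 95% interval forecast for a new observation
    at x0 from a simple linear regression fitted by OLS."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s = np.sqrt(resid @ resid / (n - 2))   # standard error of the regression
    sxx = np.sum((x - x.mean()) ** 2)
    # Forecast standard error: widens as x0 moves away from the sample mean.
    se_f = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    point = b0 + b1 * x0
    return point, (point - t_crit * se_f, point + t_crit * se_f)

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 20)
y = 1 + 0.5 * x + rng.normal(0, 0.3, 20)  # true line assumed: y = 1 + 0.5x
point, (lo, hi) = interval_forecast(x, y, x0=12.0, t_crit=2.101)
print(point, lo, hi)
```

The interval widens when x₀ lies outside the sample range, which quantifies the extrapolation risk mentioned under the forecasting assumptions.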

Example: Törnquist Functions for Demand Analysis

Törnquist functions are specific non-linear functional forms used to model Engel curves, describing the relationship between the demand for goods (or services) and consumer income, where income acts as the explanatory variable. Different types capture different demand patterns as income changes:

  • Type I (Necessities): Demand increases with income but approaches a saturation level.
  • Type II (Relative Luxuries): Demand starts significantly only after a certain income level and then increases, potentially saturating later.
  • Type III (Luxuries): Demand increases more than proportionally with income.
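The three patterns above can be sketched with one common parameterization (a, b, c > 0; exact forms vary by textbook, so treat these as one convention rather than the definitive definitions):

```python
import numpy as np

def tornquist_1(x, a, b):
    """Type I (necessities): rises with income x, saturates at level a."""
    return a * x / (x + b)

def tornquist_2(x, a, b, c):
    """Type II (relative luxuries): demand appears only above income c."""
    return np.where(x > c, a * (x - c) / (x + b), 0.0)

def tornquist_3(x, a, b, c):
    """Type III (luxuries): grows without bound, asymptotically linear in x."""
    return np.where(x > c, a * x * (x - c) / (x + b), 0.0)

income = np.array([1.0, 10.0, 100.0, 1000.0])
print(tornquist_1(income, a=5.0, b=2.0))  # approaches the saturation level 5
```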
