Endogeneity and Instrumental Variable Estimation
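- Explanatory variables that are uncorrelated with the error term are called exogenous variables. The OLS assumption that the error term is uncorrelated with the explanatory variables (more strictly, that its conditional mean given them is zero) is known as the exogeneity assumption.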
- Violation of the exogeneity assumption leads to a condition known as endogeneity.
- Sources of endogeneity that can bias OLS results include:
  - Selection Bias and Omitted Variable Bias
  - Attenuation Bias (from measurement error in an explanatory variable)
  - Endogeneity Bias and Simultaneity Bias
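The omitted-variable case can be illustrated with a small simulation. This is a sketch under stated assumptions, not a general tool: the data-generating process, the coefficients (true slope 2.0, confounder loading 3.0) and the sample size are arbitrary choices. Because the confounder c is left out of the regression, it sits in the error term and is correlated with x, so the OLS slope converges to 2 + 3·cov(x, c)/var(x) = 2 + 3·(1/2) = 3.5 instead of 2.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n = 100_000
beta = 2.0  # assumed true effect of x on y

# Unobserved confounder c: drives both x and y but is omitted from the regression.
c = [rng.gauss(0, 1) for _ in range(n)]
x = [ci + rng.gauss(0, 1) for ci in c]
y = [beta * xi + 3.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

# The short regression's error term is 3*c + noise, which is correlated
# with x, so x is endogenous and the OLS slope is biased upward.
biased_slope = ols_slope(x, y)
print(f"true slope: {beta}, OLS slope: {biased_slope:.3f}")
```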
- Bias caused by endogeneity can be circumvented through Instrumental Variable Estimation.
- An instrumental variable (IV) is a variable that is not part of the outcome's data-generating process (DGP) but is correlated with the endogenous variable, so it can be used in its place to produce a coefficient estimate on the endogenous variable.
- However, to serve as an instrument, the variable must be correlated with the endogenous variable (the relevance condition) but unrelated to the error term; that is, it must itself be exogenous.
- The latter condition is known as the exclusion restriction. Intuitively, the instrument must affect the outcome only through the endogenous variable.
- Example: cigarette tax as an IV for smoking. The tax plausibly affects how many cigarettes people smoke, and through smoking their health outcomes, but it does not affect health directly.
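A minimal sketch of IV estimation with one instrument and no covariates, where the IV estimator reduces to cov(z, y)/cov(z, x). The variable names (tax, smoking, health, confounder) and every coefficient are hypothetical, chosen only to mimic the cigarette-tax story; no real data is involved. In a simulation the exclusion restriction holds by construction; with real data it cannot be tested directly and has to be argued.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n


rng = random.Random(1)
n = 100_000
beta = -0.5  # assumed true effect of smoking on a health index

# Hypothetical data-generating process.
confounder = [rng.gauss(0, 1) for _ in range(n)]  # unobserved, affects smoking and health
tax = [rng.gauss(0, 1) for _ in range(n)]         # instrument, independent of the confounder
smoking = [-0.8 * t + c + rng.gauss(0, 1) for t, c in zip(tax, confounder)]
# Tax enters health only through smoking (exclusion restriction holds by construction).
health = [beta * s - c + rng.gauss(0, 1) for s, c in zip(smoking, confounder)]

relevance = cov(tax, smoking)  # relevance: should be clearly non-zero
ols_estimate = cov(smoking, health) / cov(smoking, smoking)
iv_estimate = cov(tax, health) / cov(tax, smoking)  # simple IV estimator
print(f"first-stage covariance: {relevance:.3f}")
print(f"OLS: {ols_estimate:.3f}, IV: {iv_estimate:.3f}, true: {beta}")
```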
- Two-stage least squares (2SLS) is a method of implementing IV estimation; it can be conceptualized as two successive stages of OLS:
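  - Stage 1: the endogenous variable x_i is regressed on the IV z_i and the exogenous covariates w_i, and the fitted values $\hat{x}_i$ are kept:
    $x_i = \pi_0 + \pi_1 z_i + \pi_2 w_i + n_i$
    where n_i is the first-stage residual.
  - Stage 2: the coefficients are estimated by regressing the outcome y_i on the fitted endogenous variable $\hat{x}_i$ and the exogenous covariates:
    $y_i = \beta_0 + \beta_1 \hat{x}_i + \beta_2 w_i + e_i$
    where e_i is the second-stage residual.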
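The two stages can be carried out by hand with two ordinary regressions. The sketch below assumes a simulated data-generating process with one endogenous regressor x, one instrument z, one exogenous covariate w and an unobserved confounder c; the names, the coefficients (true slope 2.0 on x, 1.5 on w) and the small normal-equations solver `ols` are illustrative assumptions, not a library API. The manual second stage gives the 2SLS point estimates, but not valid standard errors.

```python
import random


def ols(columns, y):
    """OLS coefficients of y on the regressor columns (include a column of ones
    for the intercept). Solves (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(col, y)) for col in columns]
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


rng = random.Random(2)
n = 50_000
ones = [1.0] * n
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
w = [rng.gauss(0, 1) for _ in range(n)]  # exogenous covariate
c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder (ends up in the error term)
x = [0.5 + zi + 0.5 * wi + ci + rng.gauss(0, 1) for zi, wi, ci in zip(z, w, c)]
y = [1.0 + 2.0 * xi + 1.5 * wi + 2.0 * ci + rng.gauss(0, 1)
     for xi, wi, ci in zip(x, w, c)]

# Stage 1: regress the endogenous x on the instrument and the exogenous covariate.
pi = ols([ones, z, w], x)
x_hat = [pi[0] + pi[1] * zi + pi[2] * wi for zi, wi in zip(z, w)]

# Stage 2: regress y on the fitted x_hat and the exogenous covariate.
beta_2sls = ols([ones, x_hat, w], y)

# Naive OLS on the actual x, for comparison (biased upward in this setup).
beta_ols = ols([ones, x, w], y)
print(f"2SLS slope: {beta_2sls[1]:.3f}, OLS slope: {beta_ols[1]:.3f} (true 2.0)")
```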
- IMPORTANT: the standard errors of the 2SLS coefficient estimates cannot be computed from the second-stage residuals, because those residuals are built from the fitted values of the endogenous variable rather than its actual values. Do not use the OLS standard errors reported for the second stage for statistical inference. When OLS can already deliver an unbiased estimate (the regressor is in fact exogenous), the 2SLS estimate of the coefficient on that variable always has a larger standard error than its OLS counterpart.
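The following sketch shows why the second-stage residuals give the wrong standard errors. It assumes homoskedastic errors, a single instrument and no covariates, and uses simulated data with arbitrary coefficients. The naive residuals y - b0 - b1·x_hat absorb the first-stage residual scaled by b1, so in this setup they are larger than the correct 2SLS residuals y - b0 - b1·x, which use the actual x; both standard errors share the same denominator, the sum of squared deviations of x_hat.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def slope_se(resid, sxx_hat):
    """Homoskedastic SE of the slope: sqrt(sigma^2 / sum((x_hat - mean)^2))."""
    sigma2 = sum(r * r for r in resid) / (len(resid) - 2)
    return math.sqrt(sigma2 / sxx_hat)


rng = random.Random(3)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [2.0 * xi + 2.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

# Stage 1: simple regression of x on z; the fitted values have mean equal to mean(x).
mz, mx, my = mean(z), mean(x), mean(y)
pi1 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / sum((zi - mz) ** 2 for zi in z)
x_hat = [mx + pi1 * (zi - mz) for zi in z]

# Stage 2: regress y on x_hat.
sxx_hat = sum((xh - mx) ** 2 for xh in x_hat)
b1 = sum((xh - mx) * (yi - my) for xh, yi in zip(x_hat, y)) / sxx_hat
b0 = my - b1 * mx

resid_naive = [yi - b0 - b1 * xh for yi, xh in zip(y, x_hat)]  # wrong: uses x_hat
resid_2sls = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]       # right: uses actual x

se_naive = slope_se(resid_naive, sxx_hat)
se_correct = slope_se(resid_2sls, sxx_hat)
print(f"slope {b1:.3f}, naive SE {se_naive:.4f}, correct SE {se_correct:.4f}")
```

In practice a dedicated IV/2SLS routine in a statistics package computes these residuals correctly and can also provide heteroskedasticity-robust standard errors, which this sketch does not.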
- It is possible to have more than one endogenous variable and more than one IV.
- There must be at least as many IVs as endogenous variables.
- When the number of IVs equals the number of endogenous variables, the model is said to be just identified.
- When the number of IVs exceeds the number of endogenous variables, the model is said to be overidentified.
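The identification rule can be written as a small counting function. It is a hypothetical helper, and it assumes the instrument count covers only instruments excluded from the outcome equation (exogenous covariates that appear in both stages do not count). The case with too few IVs is usually called underidentified.

```python
def identification_status(num_instruments: int, num_endogenous: int) -> str:
    """Classify an IV model by comparing excluded instruments with endogenous regressors."""
    if num_instruments < num_endogenous:
        return "underidentified"  # too few IVs: the coefficients cannot be estimated
    if num_instruments == num_endogenous:
        return "just identified"
    # Surplus IVs give num_instruments - num_endogenous overidentifying restrictions.
    return "overidentified"


print(identification_status(1, 1))  # just identified
print(identification_status(3, 2))  # overidentified
print(identification_status(1, 2))  # underidentified
```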
- Weak instrument: an IV with low correlation with the endogenous variable. A weak IV results in larger standard errors and is more likely to produce extreme (outlying) estimates.
- Highly correlated instrument: an IV with a very high correlation with the endogenous variable is likely to share its endogeneity problem, thereby violating the exclusion restriction. Example: sales tax as an instrument for price.
- Having multiple weak instruments is not better than having one good instrument: many weak instruments pull the 2SLS estimate toward the biased OLS estimate, which makes the endogeneity problem more likely to be overlooked.
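A small Monte Carlo sketch of the weak-instrument problem: the same just-identified IV estimator is applied to many simulated samples, once with a strong first-stage coefficient (1.0) and once with a weak one (0.05). The sample size, number of replications, coefficients and the outlier threshold of 5 are arbitrary assumptions. With the weak instrument the estimates are more dispersed (larger interquartile range) and include more extreme values.

```python
import random


def iv_estimate(z, x, y):
    """Simple IV estimator cov(z, y) / cov(z, x): one instrument, no covariates."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return szy / szx


def simulate(first_stage_coef, reps, n, beta, rng):
    """IV estimates across repeated samples for a given instrument strength."""
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        c = [rng.gauss(0, 1) for _ in range(n)]  # confounder makes x endogenous
        x = [first_stage_coef * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
        y = [beta * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
        estimates.append(iv_estimate(z, x, y))
    return estimates


def iqr(values):
    """Interquartile range (simple order-statistic version)."""
    s = sorted(values)
    return s[3 * len(s) // 4] - s[len(s) // 4]


rng = random.Random(4)
beta = 1.0
strong = simulate(1.0, reps=500, n=200, beta=beta, rng=rng)
weak = simulate(0.05, reps=500, n=200, beta=beta, rng=rng)

for label, est in (("strong", strong), ("weak", weak)):
    outliers = sum(abs(e - beta) > 5 for e in est)
    print(f"{label}: IQR {iqr(est):.3f}, estimates off by more than 5: {outliers}")
```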
- Non-compliance: when test subjects do not follow their assigned treatment.
- There are three kinds of non-compliers: never-takers (subjects who do not receive treatment whether or not they are assigned to it), always-takers (subjects who receive treatment whether or not they are assigned to it), and defiers (subjects who do exactly the opposite of what they are assigned).
- Under non-compliance, the assignment status z_i and the treatment status d_i disagree for some subjects.
- Since z_i no longer accurately captures whether someone receives treatment, using it as the explanatory variable incurs attenuation bias even though it is randomly assigned: the estimated effect is diluted toward zero.
- Because of non-compliance, d_i is no longer fully random (subjects partly choose their own treatment), so it can suffer from selection bias.
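The consequences of non-compliance can be sketched with a simulated trial. The compliance shares (60% compliers, 20% never-takers, 20% always-takers, no defiers), the baseline outcomes by type and the constant treatment effect of 2.0 are all assumptions. Comparing mean outcomes by assignment z estimates the intention-to-treat effect, which is diluted to roughly 0.6 × 2.0 = 1.2; comparing treated with untreated subjects mixes in baseline differences between the types and lands near 3.2 instead.

```python
import random

rng = random.Random(5)
n = 100_000
effect = 2.0  # assumed constant treatment effect for everyone

# Hypothetical population; always-takers are assumed healthier at baseline (self-selection).
types = rng.choices(["complier", "never_taker", "always_taker"], weights=[0.6, 0.2, 0.2], k=n)
baseline = {"complier": 1.0, "never_taker": 0.0, "always_taker": 3.0}
z = [int(rng.random() < 0.5) for _ in range(n)]  # random assignment


def treatment(t, zi):
    """Treatment actually received, given compliance type and assignment."""
    if t == "never_taker":
        return 0
    if t == "always_taker":
        return 1
    return zi  # compliers follow their assignment


def mean_where(values, flags, flag):
    """Mean of values over the subjects whose flag equals the given level."""
    sel = [v for v, f in zip(values, flags) if f == flag]
    return sum(sel) / len(sel)


d = [treatment(t, zi) for t, zi in zip(types, z)]
y = [baseline[t] + effect * di + rng.gauss(0, 1) for t, di in zip(types, d)]

itt = mean_where(y, z, 1) - mean_where(y, z, 0)    # outcome by assignment: attenuated
naive = mean_where(y, d, 1) - mean_where(y, d, 0)  # outcome by treatment received: selection bias
print(f"true effect {effect}, ITT {itt:.3f}, treated vs untreated {naive:.3f}")
```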
A monotonic function is a function that either completely preserves or completely reverses the order of its inputs: for every pair of inputs a < b, either always f(a) < f(b) or always f(a) > f(b). A function is weakly monotonic if it preserves (or reverses) the order of its inputs but may be flat on some intervals, mapping different inputs to equal outputs.
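- Following this definition, a function mapping assignment status z_i ∈ {0,1} onto treatment status d_i ∈ {0,1} is weakly monotonic (weakly increasing) when the only non-compliers are never-takers and/or always-takers: compliers have d_i = z_i, which increases with z_i, while never-takers and always-takers have a constant d_i. With defiers, whose d_i falls as z_i rises, the mapping is not monotonic at all.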
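The monotonicity condition can be checked mechanically by writing each compliance type as the pair (d_i when z_i = 0, d_i when z_i = 1). The type labels and the helper `is_weakly_monotonic` are hypothetical names for this sketch. A set of types is weakly monotonic if all of them move d_i in the same direction (or not at all) as z_i goes from 0 to 1.

```python
# Each compliance type as its response pair: (d when z = 0, d when z = 1).
RESPONSES = {
    "complier": (0, 1),
    "never_taker": (0, 0),
    "always_taker": (1, 1),
    "defier": (1, 0),
}


def is_weakly_monotonic(types):
    """True if d(z) moves in one direction for every type present:
    d(1) >= d(0) for all types, or d(1) <= d(0) for all types."""
    pairs = [RESPONSES[t] for t in types]
    increasing = all(d1 >= d0 for d0, d1 in pairs)
    decreasing = all(d1 <= d0 for d0, d1 in pairs)
    return increasing or decreasing


print(is_weakly_monotonic(["complier", "never_taker", "always_taker"]))  # True
print(is_weakly_monotonic(["complier", "defier"]))  # False
```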
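- When non-compliance is monotonic (always-takers and/or never-takers, but no defiers), we can use assignment status z_i as an instrumental variable for treatment status d_i in the following linear model:
  $y_i = \beta_0 + \beta_1 d_i + u_i$
  where y_i is the outcome, $\beta_1$ is the treatment effect and u_i is the error term.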
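- The exclusion restriction is satisfied because z_i is randomly assigned and therefore exogenous, provided that assignment affects the outcome only through the treatment actually received. Monotonicity also guarantees that z_i correlates with d_i, as long as at least some subjects are compliers.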
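With a binary instrument and no covariates, the IV estimate of the coefficient on d_i equals the Wald ratio: the effect of assignment on the outcome divided by the effect of assignment on treatment take-up. The sketch below uses an assumed simulated population (compliers, never-takers, always-takers, no defiers, constant effect 2.0) and checks that the Wald ratio matches cov(z, y)/cov(z, d). Because the effect is assumed equal for everyone, the estimate should be close to 2.0; when effects vary across subjects, this ratio is usually interpreted as the average effect among compliers.

```python
import random

rng = random.Random(6)
n = 100_000
effect = 2.0  # assumed treatment effect, constant across subjects

# Hypothetical population with never-takers and always-takers but no defiers.
types = rng.choices(["complier", "never_taker", "always_taker"], weights=[0.6, 0.2, 0.2], k=n)
baseline = {"complier": 1.0, "never_taker": 0.0, "always_taker": 3.0}
z = [int(rng.random() < 0.5) for _ in range(n)]  # randomly assigned
# Compliers follow z; never-takers get 0; always-takers get 1.
d = [zi if t == "complier" else int(t == "always_taker") for t, zi in zip(types, z)]
y = [baseline[t] + effect * di + rng.gauss(0, 1) for t, di in zip(types, d)]


def mean(v):
    return sum(v) / len(v)


def group_mean(values, z, level):
    """Mean of values among subjects with assignment z == level."""
    return mean([v for v, zi in zip(values, z) if zi == level])


# Wald estimator: effect of assignment on y divided by effect of assignment on d.
itt = group_mean(y, z, 1) - group_mean(y, z, 0)
first_stage = group_mean(d, z, 1) - group_mean(d, z, 0)  # share of compliers
wald = itt / first_stage

# The IV slope cov(z, y) / cov(z, d) gives the same number for a binary z.
mz, md, my = mean(z), mean(d), mean(y)
iv = (sum((a - mz) * (b - my) for a, b in zip(z, y))
      / sum((a - mz) * (b - md) for a, b in zip(z, d)))
print(f"ITT {itt:.3f}, first stage {first_stage:.3f}, Wald {wald:.3f}, IV {iv:.3f}")
```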