Statistical Analysis of Production Models and Time Series Forecasting

Statistical Analysis of Production Models

a) Parameter significance: Split the model into two groups (CAN, BARRELS) using a dummy variable CAN that takes the value 1 for cans and 0 for barrels. Barrel: β0 + β2(prodtotal – 100) + β4(prodtotal – 100)² + U; Can: (β0 + β1) + (β2 + β3)(prodtotal – 100) + (β4 + β5)(prodtotal – 100)² + U

β0 = Expected barrel production when total production is 100 hl. β1 = Expected difference between can production and barrel production when total production is 100 hl. β2 = Expected increase in barrel production when total production increases by 1 hl around 100. β3 = Expected difference between the increase in can production and the increase in barrel production when total production increases by 1 hl around 100. β4 = Curvature of barrel production as total production increases. β5 = Difference between the curvature of the can model and that of the barrel model.
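The two group equations above can be written as a single function of the dummy variable. The following is a minimal sketch, assuming made-up coefficient values (they are not the estimates from Table P1_II) and a hypothetical function name `expected_production`:

```python
def expected_production(prodtotal, can, beta):
    """Expected production under the dummy-variable model.

    beta is (b0, b1, b2, b3, b4, b5); can is 1 for cans and 0 for barrels.
    """
    b0, b1, b2, b3, b4, b5 = beta
    x = prodtotal - 100  # total production centred at 100 hl
    return (b0 + b1 * can) + (b2 + b3 * can) * x + (b4 + b5 * can) * x ** 2


# Hypothetical coefficients, for illustration only.
beta = (40.0, 15.0, 0.5, 0.2, 0.01, -0.005)

barrel_at_100 = expected_production(100, can=0, beta=beta)  # reduces to b0
can_at_100 = expected_production(100, can=1, beta=beta)     # reduces to b0 + b1
print(barrel_at_100, can_at_100)
```

At prodtotal = 100 the centred term vanishes, which is why β0 and β0 + β1 can be read directly as the expected barrel and can production at that level.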

b) Compare productions: (Table P1_II) Substitute the estimates for each parameter β. If the P-value of βx is less than 0.05, accept that βx is different from 0 and takes the value Y (its estimate).

c) Coefficient of determination: R² = Model SS / Total SS. The fitted model explains a proportion R² of the variability of the barrel/can production.

d) Estimates similar, different signs: H0: β1 = β2, β4 = β5; H1: otherwise. Fit the restricted model that assumes the null hypothesis is true (replace β2 with β1, then simplify).

e) Is the model appropriate? H0: β1 = β2 = … = 0; H1: otherwise. (Table P1_II) P-value < 0.05 → reject H0 → the model is significant. Equivalently, Fcalc (F ratio) > F(k, n–k–1) → reject H0. (k = model df = number of explanatory variables, Table P1_I; total df – k = residual df; df = SS / Mean Square.)
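The R² and F-ratio calculations in c) and e) can be sketched from the ANOVA table entries. The sums of squares and degrees of freedom below are made up for illustration. The critical value 2.45 is the approximate tabulated F(5, 40) value at the 5% level and would normally be read from an F table.

```python
def anova_summary(model_ss, total_ss, k, total_df):
    """Return (R squared, F ratio) from ANOVA sums of squares.

    k is the model df; the residual df is taken as total_df - k.
    """
    residual_ss = total_ss - model_ss
    residual_df = total_df - k
    r2 = model_ss / total_ss
    model_ms = model_ss / k                 # Mean Square = SS / df
    residual_ms = residual_ss / residual_df
    return r2, model_ms / residual_ms


# Hypothetical ANOVA values, not taken from Table P1_I or P1_II.
r2, f_calc = anova_summary(model_ss=800.0, total_ss=1000.0, k=5, total_df=45)

F_CRITICAL = 2.45  # assumed table value for F(5, 40) at the 5% level
model_is_significant = f_calc > F_CRITICAL
print(r2, f_calc, model_is_significant)
```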

f) Heteroscedasticity, error variance: Var(U) ≈ E[e²]. (Table P1_III) log(e²) = γ0 + γ1·CAN + U

H0: γ1 = 0; H1: γ1 ≠ 0. P-value (of CAN, in the same table) < 0.05 → reject H0 → the log of the squared residuals is not constant → the squared residuals depend on the variable CAN → the error variance is not constant → the model has heteroscedasticity.

g) Error standard deviation: Substitute the estimates from Table P1_III into log(e²) = γ0 + γ1·CAN + U → two models, CAN and BARRELS. Set CAN to 0 or 1 (apply the exponential, since the model is in logs).

Barrel (CAN = 0) or can (CAN = 1): Var(U) ≈ E(e²) = exp(γ0 + γ1·CAN) = error variance. Take the square root of each result to get σ. Interpret: “Variability of can production is larger than that of barrel production.” (To compare with the initial model, take the square root of the Residual Mean Square from Table P1_II.)
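As a rough sketch of step g), the group standard deviations follow from exponentiating the fitted log-variance and taking the square root. The γ values below are made up for illustration (they are not the Table P1_III estimates), and `error_sd` is a hypothetical helper name.

```python
import math


def error_sd(gamma0, gamma1, can):
    """Error standard deviation implied by log(e^2) = gamma0 + gamma1 * CAN."""
    variance = math.exp(gamma0 + gamma1 * can)  # undo the logarithm
    return math.sqrt(variance)                  # variance -> standard deviation


# Hypothetical estimates, for illustration only.
gamma0, gamma1 = 1.0, 0.8

sd_barrel = error_sd(gamma0, gamma1, can=0)
sd_can = error_sd(gamma0, gamma1, can=1)
print(sd_barrel, sd_can)
```

With a positive γ1 the can standard deviation comes out larger than the barrel one, which matches the interpretation quoted above.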

Multicollinearity and Variable Elimination

a) Multicollinearity: To check for multicollinearity, look at the main diagonal of the inverse of the correlation matrix of the explanatory variables. If any value exceeds 1/(1 – 0.7²) = 1.96 (the value corresponding to a correlation coefficient of 0.7) → multicollinearity exists.

b) Elimination method: First, eliminate the variable with the highest value on the diagonal of the inverse. For the other two, check their linear correlation coefficient (in the matrix on the left). If it is larger than 0.7, multicollinearity still exists, so another variable should be eliminated.
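The check and the elimination step can be sketched as follows. This is only an illustration: the correlation matrix is made up, and the small `invert` helper is a plain Gauss-Jordan routine written here to keep the example self-contained (a numerical library would normally be used instead).

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


THRESHOLD = 1 / (1 - 0.7 ** 2)  # about 1.96

# Hypothetical correlation matrix of three explanatory variables.
corr = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

inverse = invert(corr)
diagonal = [inverse[i][i] for i in range(len(corr))]
flagged = [i for i, d in enumerate(diagonal) if d > THRESHOLD]

# Elimination: drop the variable with the largest diagonal value,
# then compare the correlation of the two that remain against 0.7.
worst = max(range(len(corr)), key=lambda i: diagonal[i])
a, b = [i for i in range(len(corr)) if i != worst]
still_collinear = abs(corr[a][b]) > 0.7
print(diagonal, flagged, worst, still_collinear)
```

For two variables the diagonal of the inverse equals 1/(1 – r²) exactly, which is where the 1.96 cut-off for r = 0.7 comes from.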

Time Series Analysis

Trend (1): We observe a decreasing trend, although this decrease isn’t constant, and sometimes we observe an increase.

Cycles (1): We might observe a cyclic component, but in any case, data collected over only 10 years is not enough to confirm such behavior.

Seasonality (4): Seasonal behavior, the same pattern more or less repeated in all years. Decreases until… gradually increases… Maximum of the year achieved in… Also seen in (3).

Irregular component (2): This figure shows the part of variability not explained by the model. The variance doesn’t seem to increase with time, which is a good sign (although there are some peaks).

Propose ARIMA(p,d,q)×(P,D,Q)s (MA(q), AR(p)) → d = D = 1, s = 12 → Regular part (look at the first 6 coefficients):

  • ACF → α1 < 0 significant, the rest zero. Suggests MA(1). Confirmed by the PACF: negative coefficients increasing to 0.
  • PACF → α1, α2 significant (> 0 or < 0). Suggests AR(2). Confirmed by the ACF: alternating signs decreasing to 0.

Seasonal part (seasonal lags, 12, 24):

  • ACF → α12 < 0 significant, α24 = 0. Suggests seasonal MA(1). Confirmed by the PACF: α12 < α24 < 0 (negative, increasing to zero).
  • PACF → α12 significant (> 0 or < 0), α24 = 0. Suggests seasonal AR(1). Confirmed by the ACF: α12 and α24 with alternating signs.

→ Combine the 4 models: regular part (p,q) (write MA, AR…) – seasonal part (P,Q) – ARIMA regular × seasonal.

ARIMA Model Specification and Validation

a) Lag notation: (0,1,1)×(0,1,1)12 (MA1, SMA1) → (1 – B)(1 – B¹²)Zt = (1 – φB)(1 – ΘB¹²)εt, where φ is the MA1 parameter and Θ is the SMA1 parameter.
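The left-hand side of the lag equation is a regular difference combined with a seasonal difference of period 12. A minimal sketch of that operator, applied to a made-up monthly series (linear trend plus a fixed 12-month pattern), is shown below; `difference` is a hypothetical helper name.

```python
def difference(series, lag):
    """Apply (1 - B^lag): series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: linear trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, 8]
z = [2 * t + pattern[t % 12] for t in range(48)]

# (1 - B)(1 - B^12) Z_t: seasonal difference first, then regular difference.
w = difference(difference(z, 12), 1)

# Expanded form of the same operator: Z_t - Z_{t-1} - Z_{t-12} + Z_{t-13}.
expanded = [z[t] - z[t - 1] - z[t - 12] + z[t - 13] for t in range(13, len(z))]
print(w == expanded, set(w))
```

Both differences together remove the linear trend and the fixed seasonal pattern of this toy series, so every value of `w` is zero. A real series would leave the MA1 and SMA1 structure behind.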

b) Parameter significance: Hypothesis test for Θ, φ, u… H0: u = 0, H1: u ≠ 0 → tcalc = û / Sû (û is the estimate of the parameter u and Sû its standard error; both appear in Table P2_II, where tcalc may also appear in the “t” column). If |tcalc| > t with the residual df (its “n” is s (months…) × number of years), reject H0: the parameter is significant.

c) White noise:

  1. Zero mean (mean error formula): H0: E(et) = 0, H1: E(et) ≠ 0 → |ME| < z · σ/√n (ME is in Table P2_I; z = 1.96; σ = estimated white noise standard deviation, shown under the same table, or the square root of the value listed above it; n = number of residuals) → do not reject H0, accept E(et) = 0.
  2. Constant variance: Var(et) = σ² ∀t → Propose (Table P2_III) Residuals² = γ0 + γ1·t + γ2·Zt + U → H0: γ1 = γ2 = 0, H1: γ1 ≠ 0 or γ2 ≠ 0 → (joint test of all regression coefficients, formulae on p. 1) If Fcalc = Model MS / Residual MS (Table P2_III) < F(k, n–k–1) (k = model df = Model SS / Model MS; F distribution table), do not reject H0 → Var(et) is constant ∀t.
  3. No autocorrelation: Cov(et, et–k) = 0 ∀ t, k → true, as no ACF/PACF coefficient is significant (plots).
  4. Normality: et ~ N(μ, σ)? Yes, because the normal probability plot (NPP) shows points (more or less) aligned.

CONCLUSION: Residuals are white noise, 4 conditions met.
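The first condition (zero mean) is the easiest to check by hand. The sketch below assumes the bound is z · σ/√n, with n the number of residuals. The residuals and σ are made up for illustration, not taken from Table P2_I.

```python
import math


def mean_error_check(residuals, sigma, z=1.96):
    """Return (ME, bound, ok), where ok means H0: E(e_t) = 0 is not rejected."""
    n = len(residuals)
    me = sum(residuals) / n            # mean error
    bound = z * sigma / math.sqrt(n)   # assumed form of the acceptance bound
    return me, bound, abs(me) < bound


# Hypothetical residuals and white-noise standard deviation.
residuals = [0.5, -0.3, 0.1, -0.4, 0.2, -0.1, 0.3, -0.2]
me, bound, ok = mean_error_check(residuals, sigma=0.3)
print(me, bound, ok)
```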

d) Does the model predict correctly? Create a table: period – actual value (from the statement) – inside the forecast CI? (yes/no) – over/underestimation → All actual values fall within the CI, but 8 of 9 forecasts are larger than the actual values. The model seems to be overestimating, which isn’t good.
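The comparison table described in d) can be built mechanically. The sketch below uses three made-up rows (period, actual, forecast, CI lower, CI upper). These are not the values from the exercise, and `evaluate_forecasts` is a hypothetical helper name.

```python
def evaluate_forecasts(rows):
    """For each (period, actual, forecast, lower, upper) row, report whether
    the actual value lies inside the CI and the direction of the forecast error."""
    table = []
    for period, actual, forecast, lower, upper in rows:
        inside = lower <= actual <= upper
        if forecast > actual:
            bias = "over"
        elif forecast < actual:
            bias = "under"
        else:
            bias = "exact"
        table.append((period, inside, bias))
    return table


# Hypothetical data, for illustration only.
rows = [
    ("Jan", 100.0, 104.0, 95.0, 113.0),
    ("Feb", 98.0, 101.0, 92.0, 110.0),
    ("Mar", 105.0, 103.0, 94.0, 112.0),
]

table = evaluate_forecasts(rows)
overestimates = sum(1 for _, _, bias in table if bias == "over")
print(table, overestimates)
```

A count of overestimates close to the total number of periods, as in the 8 of 9 case above, points to systematic bias even when every actual value falls inside its interval.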

e) Reformulation: Consists of proposing one or more models from the significant autocorrelation coefficients of the residuals, to get a more suitable one to describe the series. We cannot do that here, as no ACF/PACF coefficients are significant.

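f) Overfitting: Consists of fitting (p+1,d,q) and (p,d,q+1) on the trend (regular) part → from (0,1,1)×(0,1,1)12, fit (1,1,1)×… (new parameter AR(1)) and (0,1,2)×… (new parameter MA(2)). (Tables 5, 6: P-value < 0.05 → the new parameter is significant, use the overfitted model; P-value > 0.05 → not significant.)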