Understanding Additive Models and the Curse of Dimensionality

Curse of Dimensionality

7.0 Additive Models; 7.1 Curse of Dimensionality (term due to Richard Bellman): the problem of estimating f becomes vastly harder as p, the dimension of x, increases.

Nonparametric regression model: yi = f(xi) + ei

Minor assumptions:

  1. The xi are measured without error.
  2. The ei are independently and identically distributed (i.i.d.) with mean 0.
  3. The ei have common variance σ², an unknown constant.

Common assumptions:

  1. f is in a Sobolev space (functions with bounded derivatives; one standard formulation is displayed after this list).
  2. f has a bounded number of discontinuities.
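
One standard way to make assumption 1 precise (an illustrative formulation, written for a one-dimensional argument) is to require f to lie in a Sobolev ball of order m,

    { f : ∫ (f^(m)(x))^2 dx ≤ C },

for a known smoothness order m and constant C. The classical minimax rate for estimating such an f from n observations in p dimensions is n^(-2m/(2m+p)) in squared error, which deteriorates rapidly as p grows; this is one formal statement of the COD.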

Methods designed to mitigate the COD: MARS, CART, PPR, Loess, Random Forests, and Support Vector Machines.

The COD applies to all multivariate analyses that do not impose strong modeling assumptions; it bears with equal force on classification, cluster analysis, and multidimensional scaling.

COD descriptions:

  1. For fixed n, as p increases, the data become sparse (illustrated in the sketch after this list).
  2. As p increases, the number of possible models explodes: counting an intercept, p linear terms, p quadratic terms, and the C(p,2) pairwise interactions gives 2^(1 + 2p + C(p,2)) − 1 candidate models, which grows superexponentially in p; no realistic sample lets the data discriminate among them.
  3. For large p, most datasets are multicollinear (the explanatory values concentrate on an affine subspace of R^p, so the predictive value of the fitted model breaks down quickly as one moves away from that subspace) or exhibit concurvity (the nonparametric generalization of multicollinearity: the data concentrate on some smooth manifold within R^p, where "smooth" means the smoother used in backfitting can interpolate all the xi perfectly).
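
A small numerical sketch of descriptions 1 and 2 (the sample size, the dimensions, and the uniform design below are arbitrary illustrative choices):

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)
    n = 100  # fixed sample size

    # Description 1: with n fixed, nearest-neighbor distances grow with p (sparsity).
    for p in (1, 2, 5, 10, 50):
        X = rng.uniform(size=(n, p))  # n points in the unit cube [0, 1]^p
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
        np.fill_diagonal(d, np.inf)
        print(f"p={p:3d}  mean nearest-neighbor distance = {d.min(axis=1).mean():.3f}")

    # Description 2: model count, reading the exponent 1 + 2p + C(p,2) as
    # intercept + p linear terms + p quadratic terms + C(p,2) pairwise interactions.
    for p in (2, 5, 10):
        n_terms = 1 + 2 * p + comb(p, 2)
        print(f"p={p:3d}  candidate terms = {n_terms:3d}  possible models = 2^{n_terms} - 1")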

Results claiming to evade the COD:

  1. Barron 1994: neural networks avoid the COD.
  2. Zhao/Atkeson 1991: PPR evades the COD.
  3. Wozniakowski 1991: a modification of Hammersley points dodges the COD in multivariate integration.

Goals

Multiple linear regression model: Y = β0 + β1X1 + … + βpXp + e, with E[e] = 0, Var(e) = σ², and the ei i.i.d. and independent of x1, …, xp.

Useful because:

  1. Interpretable (each xj enters through a single coefficient βj).
  2. Theory supports inference and prediction easily.
  3. Simple interactions and transformations are easy.
  4. Dummy variables allow use of categorical information.
  5. Computation is fast.
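
A minimal numpy sketch of this model fit by least squares (simulated data; the coefficient values and the two-level categorical variable are made-up illustrations of points 1, 4, and 5):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    group = rng.integers(0, 2, size=n)        # two-level categorical predictor
    dummy = (group == 1).astype(float)        # dummy (indicator) coding

    # True model: Y = 1.0 + 2.0*x1 - 0.5*x2 + 3.0*dummy + e
    y = 1.0 + 2.0 * x1 - 0.5 * x2 + 3.0 * dummy + rng.normal(scale=0.5, size=n)

    X = np.column_stack([np.ones(n), x1, x2, dummy])   # design matrix with intercept
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
    print("estimated coefficients:", np.round(beta_hat, 2))  # close to [1.0, 2.0, -0.5, 3.0]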

Additive Models

Expresses the response variable Y as an intercept plus a sum (k = 1 to p) of individual functions fk(xk) of the predictor variables: y = β0 + Σk fk(xk) + e.

Basic assumptions: As in 7.1, but add E[fk(xk)]=0 to prevent identifiability problems.
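
A tiny numerical illustration of why the E[fk(xk)] = 0 constraint is needed (simulated data; the component functions are arbitrary choices): without centering, a constant can be shifted freely between the intercept and any fk, so the decomposition is not identifiable.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    x1 = rng.uniform(-2, 2, size=n)
    x2 = rng.uniform(-2, 2, size=n)

    beta0, f1, f2 = 3.0, np.sin(x1), x2 ** 2          # one additive decomposition
    beta0_alt, f1_alt = beta0 - 1.0, f1 + 1.0         # shift a constant into f1

    # Both decompositions give exactly the same regression function ...
    print(np.allclose(beta0 + f1 + f2, beta0_alt + f1_alt + f2))   # True

    # ... so the ambiguity is fixed by centering each component to mean zero.
    f1_c, f2_c = f1 - f1.mean(), f2 - f2.mean()
    beta0_c = beta0 + f1.mean() + f2.mean()           # intercept absorbs the means
    print(np.allclose(beta0 + f1 + f2, beta0_c + f1_c + f2_c))     # True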

Notes:

  1. One can require that some of the fk be linear or monotone.
  2. One can include some low-dimensional smooths (f(x1,x2)).
  3. One can include some kinds of interactions (f(x1x2)).
  4. Transformation of variables is done automatically.
  5. Many regression diagnostics, such as Cook’s distance, generalize to additive models.
  6. Ideas from weighted regression generalize to handle heteroscedasticity.
  7. Approximate deviance tests for comparing nested additive models are available.
  8. One can use the bootstrap to set pointwise confidence bands on the fk.

The Backfitting Algorithm

Used to fit additive models; it allows one to use an arbitrary smoother (spline, Loess, kernel) to estimate the fk. It solves the p estimating equations iteratively; at each stage, the conditional expectation of the partial residuals Y − β0 − Σ_{k≠j} fk(xk) given xj is replaced with a univariate smooth.

Backfitting algorithm:

  1. Initialize. Set β0 = Ȳ and set the fk functions to something reasonable (e.g., a linear regression fit). Set the fk vectors of fitted values to match.
  2. Cycle. For j = 1, …, p set fj = S[ Y − β0 − Σ_{k≠j} fk(xk) | xj ] and update the fitted vector for fj to match.
  3. Iterate. Repeat step 2 until the changes in the fj between iterations are sufficiently small (a runnable sketch of this loop follows the list).
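
A compact sketch of this loop (not a reference implementation: the Nadaraya-Watson kernel smoother, its bandwidth, and the simulated additive data are all arbitrary illustrative choices):

    import numpy as np

    def kernel_smooth(x, r, bandwidth=0.3):
        """Nadaraya-Watson smooth of the partial residuals r against the predictor x."""
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ r) / w.sum(axis=1)

    def backfit(X, y, n_iter=50, tol=1e-6):
        n, p = X.shape
        beta0 = y.mean()                      # Step 1: initialize the intercept
        f = np.zeros((p, n))                  # component fits (a linear fit would also do)
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):                # Step 2: cycle over the predictors
                partial = y - beta0 - f.sum(axis=0) + f[j]   # partial residuals for x_j
                new_fj = kernel_smooth(X[:, j], partial)
                new_fj -= new_fj.mean()       # re-center to keep E[f_j] = 0
                max_change = max(max_change, np.abs(new_fj - f[j]).max())
                f[j] = new_fj
            if max_change < tol:              # Step 3: stop when the updates are small
                break
        return beta0, f

    # Simulated additive data: y = 2 + sin(x1) + (x2^2, centered) + noise
    rng = np.random.default_rng(3)
    n = 300
    X = rng.uniform(-2, 2, size=(n, 2))
    y = 2.0 + np.sin(X[:, 0]) + (X[:, 1] ** 2 - (X[:, 1] ** 2).mean()) + rng.normal(scale=0.3, size=n)

    beta0_hat, f_hat = backfit(X, y)
    print("intercept estimate:", round(beta0_hat, 2))        # roughly 2 for this simulation
    print("corr(f1_hat, centered sin):",
          round(np.corrcoef(f_hat[0], np.sin(X[:, 0]) - np.sin(X[:, 0]).mean())[0, 1], 3))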

The iterative solution for this has the structure of a Gauss-Seidel algorithm for linear systems (Hastie/Tibshirani).

Convergence: guaranteed for smoothers (spline, kernel, but not Loess) whose smoothing matrix is symmetric with all eigenvalues in (0, 1); the solution is unique unless there is concurvity, in which case the solution reached depends on the initial conditions. The concurvity space of the estimating equations Pg = Qh (written in basis form) is the set of additive functions g such that Pg = 0.
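
For reference, a sketch of the stacked system behind the Pg = Qh notation, following the normal-equations form in Hastie/Tibshirani (here Sj denotes the smoother matrix for xj; identifying the notes' h with the centered response vector y is an assumption of this sketch):

    [ I    S1   ...  S1 ] [ f1 ]   [ S1 y ]
    [ S2   I    ...  S2 ] [ f2 ] = [ S2 y ]
    [ :          .    : ] [  : ]   [   :  ]
    [ Sp   Sp   ...  I  ] [ fp ]   [ Sp y ]

P is the block matrix on the left, g stacks the fitted fj, and Q stacks the Sj. Any additive g with Pg = 0 can be added to a solution without disturbing the equations, which is why the solution is not unique under concurvity.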

Generalized Linear Models

Assumes that there is a link function g such that g(E[Y|x]) = xᵀβ (McCullagh/Nelder).

The generalized additive model expresses g(E[Y|x]) as an additive, rather than linear, function of x. GLMs are fit by iterative scoring, a form of iteratively reweighted least squares; the GAM modifies backfitting in the same spirit, giving the local scoring algorithm (Hastie/Tibshirani 1990).
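
For concreteness, a minimal sketch of iterative scoring / IRLS for one particular GLM, logistic regression with the logit link, on simulated data (this is the weighted-least-squares core that the GAM's backfitting modification builds on, not the GAM algorithm itself; the data and coefficients are made up):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
    beta_true = np.array([-0.5, 1.0, -2.0])
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))        # Bernoulli responses

    beta = np.zeros(3)
    for _ in range(25):                        # iteratively reweighted least squares
        eta = X @ beta                         # linear predictor, eta = g(mu)
        mu = 1 / (1 + np.exp(-eta))            # inverse logit link
        w = mu * (1 - mu)                      # GLM working weights
        z = eta + (y - mu) / w                 # working (adjusted) response
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < 1e-8:
            beta = beta_new
            break
        beta = beta_new
    print("IRLS estimates:", np.round(beta, 2))  # close to beta_true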