Linear Models: Regression, Interactions, and Best Practices

Simple Regression, Inference, and Centering

Simple Linear Regression: Models the linear relationship between two continuous variables.

  • Outcome (y): Variable being predicted.
  • Predictor (x): Variable used for prediction.
  • Intercept (b₀): Predicted y when x=0. Affected by centering.
  • Slope (b₁): Change in y for a one-unit change in x.
  • Residual: Difference between observed and predicted y.
  • OLS (Ordinary Least Squares): Method to find the best-fitting line by minimizing the sum of squared residuals.

Centering: Subtracting the mean of x from each x value.

  • Changes the interpretation of the intercept to the predicted y at average x.
  • Doesn’t affect the slope.
  • Useful when x=0 isn’t meaningful.
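
A minimal R sketch with simulated data (all names hypothetical), showing that centering moves the intercept but leaves the slope unchanged:

```r
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)    # predictor (e.g., a test score)
y <- 2 + 0.5 * x + rnorm(100, sd = 5)  # outcome with a known slope of 0.5

fit_raw      <- lm(y ~ x)               # intercept = predicted y at x = 0
fit_centered <- lm(y ~ I(x - mean(x)))  # intercept = predicted y at mean(x)

coef(fit_raw)       # slope near 0.5
coef(fit_centered)  # same slope; only the intercept changes
```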

Standardized Slope (Beta): Slope after standardizing both x and y. Represents change in y (in SDs) for a 1 SD change in x.

R-squared: Proportion of variance in y explained by the model (0 to 1).

  • Squaring the correlation between y and predicted y (ŷ) gives the same value.
  • Never decreases as predictors are added, so it can be misleading with multiple predictors.

Adjusted R-squared: Modified R-squared that penalizes adding unnecessary predictors.

Inference:

  • Confidence Interval: Range of plausible values for the population intercept and slope.
  • Hypothesis Testing: Testing if the population intercept and slope are different from zero.
  • p-value: Probability of observing the data (or more extreme data) if the null hypothesis is true.

Key Considerations:

  • Correlation ≠ Causation: Regression shows association, not causation.
  • Extrapolation: Avoid predicting outside the observed range of x.

Dummy Codes, Estimated Means, and Contrasts

  • Categorical Predictor: Variable representing distinct groups (e.g., gender, treatment).
  • Dummy Coding: Converts categories to numerical code variables (0s & 1s) for regression.
  • Dichotomous: Two groups. One dummy variable. Reference group = 0, other group = 1. Intercept = reference group mean, slope = group difference.
  • Polytomous: More than 2 groups. g − 1 dummy variables (g = number of groups). Reference group = all 0s. Other groups get a single 1. Each slope = difference from reference.
  • Reference Group: Category coded as 0. Crucial for interpretation. Choose a meaningful, well-defined, and sufficiently large group.
  • Contrasts: Compare non-reference groups (use after model fitting).
  • Effect Sizes:
    • Standardized Slope (like Cohen’s d): Group difference in SD units.
    • R²/Adjusted R²: Variance explained by predictor.
  • Model Significance (ANOVA): Tests overall effect of categorical predictor (are any groups different?). Use anova() on the lm() fit.
  • Key Principle: Regression with dummy variables estimates group means and differences. Doesn’t prove causality. Consider other factors.
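
A minimal R sketch with simulated groups (names hypothetical); factor() and relevel() handle the dummy coding automatically:

```r
set.seed(2)
group <- factor(sample(c("control", "drug_a", "drug_b"), 150, replace = TRUE))
means <- c(control = 10, drug_a = 12, drug_b = 15)        # true group means
y     <- as.numeric(means[as.character(group)]) + rnorm(150, sd = 2)

group <- relevel(group, ref = "control")  # pick a meaningful reference group
fit   <- lm(y ~ group)                    # R builds the g - 1 dummy variables

coef(fit)   # intercept = control mean; each slope = difference from control
anova(fit)  # overall F-test: do any group means differ?
```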

Shared Variance, Multiple Regression, Partial Effects

Multiple Regression: Predicts outcome (y) using multiple predictors (x₁, x₂, …).

  • Zero-Order Effect: Relationship between one predictor & y, ignoring others. (Simple regression slope)
  • Partial Effect: Unique effect of one predictor on y, controlling for others. (Multiple regression slope)
  • Shared Variance: Overlap in variance explained by multiple predictors. Causes partial effects to be smaller than zero-order effects. (Think left/right leg predicting height)
  • Collinearity: High correlation between predictors. Makes it hard to isolate unique effects (unstable slopes), even if predictors are individually important. (Think anxiety/depression predicting an outcome)
  • Covariates: Predictors included only for control, not primary interest. Linear model treats them the same as other predictors. Too many can reduce power and increase collinearity.

Key Principle: Partial effects are often smaller than zero-order effects due to shared variance. Consider both when interpreting results. Don’t over-control!
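
A minimal R sketch with two deliberately correlated simulated predictors, contrasting the zero-order and partial slopes:

```r
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = sqrt(1 - 0.7^2))  # x2 correlated with x1
y  <- 0.4 * x1 + 0.4 * x2 + rnorm(n)

coef(lm(y ~ x1))       # zero-order slope of x1 (absorbs shared variance)
coef(lm(y ~ x1 + x2))  # partial slope of x1, controlling for x2 (smaller)
```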

Continuous Interactions, Simple Slopes, Spotlight

Moderation (Interaction Effects)

  • Concept: The relationship between two variables changes depending on a third (moderator) variable. Adds “context” to models.
  • Interaction Term: Product of interacting predictors (e.g., predictor1 * predictor2).
  • Simple Effects: Effect of one predictor when the moderator is at ZERO (or mean if centered).
  • Interaction Coefficient: How much one predictor’s effect changes per unit increase in the moderator.
  • Centering: Subtract mean from predictors. Makes simple effects more meaningful (at average moderator level). Intercept becomes outcome at average predictor levels. Essential for meaningful interpretation.
  • Standardizing: Center AND divide by SD. Coefficients become effect sizes in SD units.
  • Spotlight/Conditional Effects Plots: Visualize interaction by plotting relationship at different moderator values. Crucial for understanding interactions.
  • Simple Slopes Analysis: Calculate predictor slope at specific moderator values. Finds thresholds/crossovers.
  • Potentiation: Effects strengthen each other.
  • Attenuation: Effects weaken each other.
  • Crossover: Relationship direction reverses.
  • Essential: Plot interactions, center continuous moderators, include simple effects with interaction term, don’t confuse simple/partial effects. Use theory to guide interaction selection (avoid overfitting).
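
A minimal R sketch with simulated data (names hypothetical), fitting a centered interaction and computing simple slopes at −1 SD, the mean, and +1 SD of the moderator:

```r
set.seed(4)
n   <- 300
x   <- rnorm(n)  # focal predictor
mod <- rnorm(n)  # continuous moderator
y   <- 0.3 * x + 0.2 * mod + 0.25 * x * mod + rnorm(n)

xc   <- x - mean(x)        # center before forming the product term
modc <- mod - mean(mod)
fit  <- lm(y ~ xc * modc)  # expands to xc + modc + xc:modc
b    <- coef(fit)

# Simple slope of x at -1 SD, the mean, and +1 SD of the moderator:
for (m in c(-sd(modc), 0, sd(modc))) {
  cat("moderator =", round(m, 2), "-> slope of x =",
      round(b["xc"] + b["xc:modc"] * m, 2), "\n")
}
```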

Categorical Interactions, Simple Slopes, Spotlight

Moderation (Interaction): Effect of a predictor depends on another variable (moderator).

Categorical Moderators:

  • Binary: 2 levels (e.g., sex). Creates 1 interaction term. Simple slopes for each level of the moderator.
  • Nominal: More than 2 levels (e.g., program). Multiple dummy codes & interaction terms (1 fewer than # of levels).

Interpreting Interactions:

  • Coefficient: Difference in slopes (continuous moderator) or difference of differences (categorical moderator).
  • Plot essential: Visualize different slopes/group differences.

Centering: Mean-center continuous predictors for meaningful intercepts (effect at average, not zero).

Simple effects/slopes: Effect of predictor at specific level of moderator.

Marginal contrasts/means: Differences between groups at various moderator levels.

Higher-order interactions: Interactions of interactions (complex, interpret with caution).

Reporting: State significance of interaction & simple effects/slopes/contrasts. Avoid interpreting main effects if interaction is significant. Consider removing non-significant interactions if not primary research focus.
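
A minimal R sketch with a simulated binary moderator (names hypothetical), showing how the interaction coefficient encodes the difference in slopes:

```r
set.seed(5)
n     <- 200
x     <- rnorm(n)
group <- factor(sample(c("A", "B"), n, replace = TRUE))
y     <- ifelse(group == "B", 0.8 * x, 0.2 * x) + rnorm(n)

fit <- lm(y ~ x * group)
coef(fit)
# "x"        = simple slope of x in the reference group (A)
# "x:groupB" = difference in slopes (B minus A); B's slope = "x" + "x:groupB"
```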

Functional Form, Measurement Error, Multicollinearity

Linear Model Assumptions & Diagnostics (Cheat Sheet)

I. Formula Issues (Bias Coefficients & SEs):

  • Linearity: Relationship between x and y is straight.
    • Diagnose: Residuals vs. Fitted plot (flat line desired).
    • Fix: Polynomial/piecewise regression, transformations, GAM, GLM, machine learning (ML).
  • Measurement Error (in x): Predictors are measured without error.
    • Diagnose: Reliability (test-retest, internal, inter-rater).
    • Fix: Improve measurement, disattenuation, SEM, errors-in-variables models.
  • (Multi)Collinearity: Predictors are not redundant.
    • Diagnose: VIFs (>5 moderate, >10 high). Correlation >0.9 (limited).
    • Fix: Drop/combine predictors, PCA, penalized regression (Lasso/Ridge), SEM, PLS.
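
A minimal R sketch of two of these diagnostics on simulated collinear data; the car package for VIFs is an assumption (any VIF function works):

```r
set.seed(6)
n   <- 150
x1  <- rnorm(n)
x2  <- 0.9 * x1 + rnorm(n, sd = 0.3)  # deliberately collinear with x1
y   <- x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

plot(fit, which = 1)  # Residuals vs. Fitted: want a flat, patternless line

# install.packages("car")  # if not installed
car::vif(fit)         # VIFs: > 5 moderate concern, > 10 high concern
```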

II. Residual Issues (Bias SEs, not Coefficients – covered in the next section):

  • Homoscedasticity: Constant error variance.
  • Independence: Residuals uncorrelated.
  • Normality: Residuals normally distributed.

(Multi)collinearity Example: Height1 & height2 (collinear); size, height1, & weight (multicollinear); male & female dummy codes (exact collinearity).

Key Principle: Diagnostics over strict cutoffs. Consider degree of violation.

Residual Distribution, Independence, Outliers

Regression Assumptions & Diagnostics (Cheat Sheet)

Residuals: Difference between observed & predicted values. Core of diagnostics.

1. Residual Assumptions (Violations mainly bias SEs):

  • Constant Error Variance (Homoscedasticity): Equal spread of residuals across predictors. Violation: Fan-shaped residual plot (e.g., income vs. spending). Fix: Transform outcome, Huber-White SEs, model error variance.
  • Normality: Residuals normally distributed. Less critical with large n. Violation: Bimodal residual plots (e.g., zero-inflated data). Fix: Transform outcome, nonparametric methods (bootstrap), GLM. Yeo-Johnson preferred over Box-Cox (handles zero and negative values).
  • Independence: Residuals uncorrelated. Key for clustered/time series data. Violation: Similar residuals within groups (students in schools) or time points (monthly unemployment). Fix: Cluster-robust SEs, multilevel models, GEE, time series analysis.

2. Outliers: Atypical data points.

  • Leverage: Unusual predictor values.
  • Distance: Far from regression line (large residual).
  • Influence: Strongly changes coefficients (high Cook’s D).
  • Fix: Correct errors, keep if valid, remove if from different population, recode (Winsorize). Be transparent; report approach. Use sensitivity analyses.
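
A minimal R sketch with one manufactured outlier, using base R's built-in diagnostics; the 4/n cutoff shown is one common rule of thumb, not a strict law:

```r
set.seed(7)
x <- c(rnorm(50), 5)                # last point has unusual x (high leverage)
y <- 0.5 * x + rnorm(51, sd = 0.5)
y[51] <- 10                         # ...and is forced far off the line
fit <- lm(y ~ x)

hatvalues(fit)       # leverage: unusual predictor values
rstudent(fit)        # distance: studentized residuals
cooks.distance(fit)  # influence: flags coefficient-changing points

which(cooks.distance(fit) > 4 / length(x))  # a common rough flagging rule
```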

Polynomial Regression

Polynomial Regression: Models non-linear relationships.

  • Power Polynomials: x, x², x³, etc. Highest power (degree) determines curve shape. Degree − 1 = maximum # of bends.
  • Quadratic (d=2): Most common, one bend (parabola). Sign of the x² coefficient determines curvature (opens up or down).
  • Cubic (d=3): Two bends.
  • Higher Degrees: More bends, more flexible, higher overfitting risk.
  • Extrema (Quadratic): Highest/lowest point on the curve, at x = −b₁/(2b₂).
  • Centering x: Reduces multicollinearity between polynomial terms. Use poly() in R for orthogonal (uncorrelated) polynomials.
  • Include Lower Degrees: If using x², include x. If using x³, include x² and x.
  • Theory-Driven: Choose degree based on theory, not just fit.
  • Visualize: Essential for checking model assumptions (especially linearity). Don’t extrapolate predictions beyond observed x range.

Key Principle: Model non-linearity when a straight line is a poor fit, guided by theory. Quadratic is often sufficient.
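
A minimal R sketch with simulated curvilinear data, comparing raw powers (with the lower-order term included) and poly():

```r
set.seed(8)
x <- runif(100, -2, 2)
y <- 1 + 0.5 * x - 0.8 * x^2 + rnorm(100, sd = 0.5)  # true one-bend curve

fit_raw  <- lm(y ~ x + I(x^2))  # raw powers: include the lower-order term
fit_orth <- lm(y ~ poly(x, 2))  # orthogonal polynomials: uncorrelated terms

summary(fit_orth)

# Visualize within the observed x range only (avoid extrapolating):
plot(x, y)
curve(predict(fit_raw, newdata = data.frame(x = x)), add = TRUE)
```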

Best Practices, Questionable Practices, Power

Research Practices Continuum: (Bad → Good) Misconduct, Questionable Practices, Best Practices

Bad/Questionable Practices:

  • Fabrication: Making up data.
  • Falsification: Manipulating data.
  • Plagiarism: Presenting others’ work as your own.
  • Undisclosed COIs: Hiding conflicts of interest.
  • Unapproved Research: Lacking ethical approval.
  • Selective Reporting: Cherry-picking favorable results or omitting unfavorable ones.
  • HARKing (Hypothesizing After Results are Known): Presenting exploratory work as confirmatory (rewriting hypotheses post hoc).
  • p-hacking: Manipulating analysis to achieve p < .05.
  • Data Peeking: Analyzing data as it’s collected & stopping at p < .05.

Best Practices:

  • Open Sharing: Publicly sharing data, materials, & code (OSF, GitHub).
  • Preregistration: Specifying hypotheses & analyses before data collection (OSF, registered reports).
  • Power Analysis: Determining necessary sample size (a priori, sensitivity).
    • A priori: Find n given effect size, alpha, power.
    • Sensitivity: Find minimum detectable effect size given n, alpha, power.
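
A minimal R sketch of both analyses for multiple regression, assuming the pwr package (u = numerator df, i.e., number of predictors; f² is the effect size):

```r
# install.packages("pwr")  # if not installed
library(pwr)

# A priori: sample size for a medium effect (f2 = .15) with 3 predictors
pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
# n = u + v + 1, where v is the denominator df the test returns

# Sensitivity: minimum detectable f2 given n = 100 (v = 100 - 3 - 1 = 96)
pwr.f2.test(u = 3, v = 96, sig.level = 0.05, power = 0.80)
```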

Core Principles:

  • Transparency and rigor are key for replicability.
  • Justify all research decisions, especially effect size estimates.
  • Prioritize finding the truth over chasing significant results.
  • Be mindful of perverse incentives and cognitive biases.

Reporting, Sharing, and Reviewing

Intro: Explanation vs. Prediction, Exploratory vs. Confirmatory, Target Population, Justify Predictors

Methods:

Sampling: Methods, Criteria, Characteristics (distributions, validity – not “everyone uses it”)

Analysis: Sample Size/Stopping Rule, Model Spec (interactions, etc.), Missing Data (document!), Outliers (justify)

Results: Report all coefficients, Effect Sizes, CIs, Standardization, Assumption Checks (address violations!), Zero-Order vs. Partial Effects, Simple Effects (interactions), Visualize w/uncertainty (error bars)

Discussion: Limitations (representativeness, design), Avoid Overreaching (correlation ≠ causation), Logical Conclusions from Results

Translation Activity

Linear Models (LM): Relates outcome variable to predictor(s).

  • Continuous Predictor: Numeric variable (e.g., age, height).
  • Categorical Predictor: Group membership (e.g., gender, treatment).
  • Interaction: Combined effect of predictors differs from individual effects. Moderation.
  • Curvilinear: Non-linear relationship (e.g., quadratic, cubic).
  • Assumptions:
    • Independence: Observations are not related. Watch for clustering (repeated measures, groups).
    • Linearity: Linear relationship between predictors and outcome.
    • Normality: Residuals (errors) are normally distributed. Problematic for bounded outcomes (e.g., percentages, binary).
    • Homoscedasticity: Equal variance of residuals across predictor values.
  • Diagnostics: Check assumptions using plots (e.g., residuals vs. fitted).
  • Polynomial Regression: Models curvilinear relationships (e.g., outcome ~ predictor + I(predictor^2) in R formula syntax).

Translating Research Questions:

  1. Identify outcome and predictor variables.
  2. Consider potential covariates (confounding variables).
  3. Formulate hypothesis: How are predictors related to the outcome? Is there an interaction?
  4. Choose appropriate statistical model (LM, polynomial, etc.).
  5. Check assumptions. Address violations if necessary (e.g., transformations, different model).
  6. Interpret results in context of research question.
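
A hypothetical worked example of steps 1–6, translating "Does sleep predict stress, and does that depend on age?" into a model (all variables simulated):

```r
set.seed(10)
n      <- 250
sleep  <- rnorm(n, mean = 7, sd = 1)    # hours per night (predictor)
age    <- rnorm(n, mean = 40, sd = 12)  # possible moderator
stress <- 30 - 2 * sleep + 0.1 * age + rnorm(n, sd = 3)  # outcome

sleep_c <- sleep - mean(sleep)  # center for interpretable effects
age_c   <- age - mean(age)

fit <- lm(stress ~ sleep_c * age_c)  # steps 3-4: model with an interaction
plot(fit, which = 1)                 # step 5: check linearity/variance
summary(fit)                         # step 6: interpret in context
```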

Linear Models:

Intercept: Expected Y when all predictors are 0. Contextualize interpretation (e.g., “For a newborn…”).

Slope: Change in Y for a 1-unit change in X, holding others constant.

Interaction: Effect of X on Y depends on the level of another predictor (non-parallel lines).

Simple Slope/Effect: Effect of X on Y at a specific value of another predictor (in interaction models).

Partial Effect: Effect of X controlling for other predictors.

Zero-Order Effect: Effect of X ignoring other predictors.

Centering: Makes the intercept and simple slopes refer to predictors at their means, simplifying interpretation.

Model Assumptions (Quick Check):

Linearity: Relationship between X and Y is a straight line.

Independence: Observations are independent. Watch for repeated measures!

Normality: Residuals are normally distributed.

Equal Variance (Homoscedasticity): Residual variance is constant across predictor values.

Other Essentials:

Mediation: X influences Y through M. Needs longitudinal data for causal claims. Cross-sectional mediation is flawed.

Power Analysis: Estimates power, the probability of detecting a true effect. Post hoc power using the observed effect size from a small sample is BAD. Use an independent effect size estimate.

Pre-registration: Plan your analysis before collecting data to avoid p-hacking and selective reporting.