Linear Models: Regression, Interactions, and Best Practices
Simple Regression, Inference, and Centering
Simple Linear Regression: Models the linear relationship between two continuous variables.
- Outcome (y): Variable being predicted.
- Predictor (x): Variable used for prediction.
- Intercept (b₀): Predicted y when x=0. Affected by centering.
- Slope (b₁): Change in y for a one-unit change in x.
- Residual: Difference between observed and predicted y.
- OLS (Ordinary Least Squares): Method to find the best-fitting line by minimizing the sum of squared residuals.
Centering: Subtracting the mean of x from each x value.
- Changes the interpretation of the intercept to the predicted y at average x.
- Doesn’t affect the slope.
- Useful when x=0 isn’t meaningful.
Standardized Slope (Beta): Slope after standardizing both x and y. Represents change in y (in SDs) for a 1 SD change in x.
R-squared: Proportion of variance in y explained by the model (0 to 1).
- Squaring the correlation between y and predicted y (ŷ) gives the same value.
- Never decreases as predictors are added, so it can be misleading with multiple predictors.
Adjusted R-squared: Modified R-squared that penalizes adding unnecessary predictors.
Inference:
- Confidence Interval: Range of plausible values for the population intercept and slope.
- Hypothesis Testing: Testing if the population intercept and slope are different from zero.
- p-value: Probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.
Key Considerations:
- Correlation ≠ Causation: Regression shows association, not causation.
- Extrapolation: Avoid predicting outside the observed range of x.
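A minimal R sketch tying these pieces together (simulated data; all variable names hypothetical):

    # Simulated data for illustration
    set.seed(1)
    x <- rnorm(100, mean = 50, sd = 10)
    y <- 2 + 0.5 * x + rnorm(100, sd = 5)
    fit <- lm(y ~ x)                  # OLS fit
    coef(fit)                         # b0 (intercept) and b1 (slope)
    confint(fit)                      # CIs for the population intercept and slope
    summary(fit)$r.squared            # R-squared
    cor(y, fitted(fit))^2             # same value: squared correlation of y and y-hat
    # Centering: slope unchanged, intercept = predicted y at average x
    coef(lm(y ~ I(x - mean(x))))
    # Standardized slope (beta); equals cor(x, y) in simple regression
    coef(lm(scale(y) ~ scale(x)))[2]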
Dummy Codes, Estimated Means, and Contrasts
- Categorical Predictor: Variable representing distinct groups (e.g., gender, treatment).
- Dummy Coding: Converts categories to numerical code variables (0s & 1s) for regression.
- Dichotomous: Two groups. One dummy variable. Reference group = 0, other group = 1. Intercept = reference group mean, slope = group difference.
- Polytomous: More than 2 groups. g − 1 dummy variables (g = number of groups). Reference group = all 0s. Each other group gets a 1 on its own dummy variable, 0s elsewhere. Each slope = difference from reference.
- Reference Group: Category coded as 0. Crucial for interpretation. Choose a meaningful, well-defined, and sufficiently large group.
- Contrasts: Compare non-reference groups (use after model fitting).
- Effect Sizes:
- Standardized Slope (like Cohen’s d): Group difference in SD units.
- R²/Adjusted R²: Variance explained by predictor.
- Model Significance (ANOVA): Tests overall effect of categorical predictor (are any groups different?). Use anova() on the fitted lm() model.
- Key Principle: Regression with dummy variables estimates group means and differences. Doesn’t prove causality. Consider other factors.
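A short R sketch of dummy coding with a hypothetical three-group treatment factor:

    # Hypothetical three-group example
    set.seed(2)
    group <- factor(rep(c("control", "drug_a", "drug_b"), each = 30))
    y <- rnorm(90, mean = c(10, 12, 15)[group])
    # lm() builds the g - 1 dummy variables; "control" is the reference by default
    fit <- lm(y ~ group)
    coef(fit)    # intercept = control mean; each slope = difference from control
    # Switch the reference group to compare the two drug conditions
    coef(lm(y ~ relevel(group, ref = "drug_a")))
    # Overall test of the categorical predictor (any group differences?)
    anova(fit)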
Shared Variance, Multiple Regression, Partial Effects
Multiple Regression: Predicts outcome (y) using multiple predictors (x).
- Zero-Order Effect: Relationship between one predictor & y, ignoring others. (Simple regression slope)
- Partial Effect: Unique effect of one predictor on y, controlling for others. (Multiple regression slope)
- Shared Variance: Overlap in variance explained by multiple predictors. Causes partial effects to be smaller than zero-order effects. (Think left/right leg predicting height)
- Collinearity: High correlation between predictors. Makes it hard to isolate unique effects (unstable slopes), even if predictors are individually important. (Think anxiety/depression predicting an outcome)
- Covariates: Predictors included only for control, not primary interest. Linear model treats them the same as other predictors. Too many can reduce power and increase collinearity.
Key Principle: Partial effects are often smaller than zero-order effects due to shared variance. Consider both when interpreting results. Don’t over-control!
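A sketch of zero-order vs. partial effects with two overlapping predictors (hypothetical data):

    # Two correlated predictors that share variance
    set.seed(3)
    x1 <- rnorm(200)
    x2 <- 0.7 * x1 + rnorm(200, sd = 0.5)      # x2 overlaps with x1
    y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(200)
    coef(lm(y ~ x1))["x1"]         # zero-order effect (ignores x2)
    coef(lm(y ~ x1 + x2))["x1"]    # partial effect (controls for x2); smaller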
Continuous Interactions, Simple Slopes, Spotlight
Moderation (Interaction Effects)
- Concept: The relationship between two variables changes depending on a third (moderator) variable. Adds “context” to models.
- Interaction Term: Product of interacting predictors (e.g., predictor1 * predictor2).
- Simple Effects: Effect of one predictor when the moderator equals ZERO (which is the mean, if centered).
- Interaction Coefficient: How much one predictor’s effect changes per unit increase in the moderator.
- Centering: Subtract mean from predictors. Makes simple effects more meaningful (at average moderator level). Intercept becomes outcome at average predictor levels. Essential for meaningful interpretation.
- Standardizing: Center AND divide by SD. Coefficients become effect sizes in SD units.
- Spotlight/Conditional Effects Plots: Visualize interaction by plotting relationship at different moderator values. Crucial for understanding interactions.
- Simple Slopes Analysis: Calculate predictor slope at specific moderator values. Finds thresholds/crossovers.
- Potentiation: Effects strengthen each other.
- Attenuation: Effects weaken each other.
- Crossover: Relationship direction reverses.
- Essential: Plot interactions, center continuous moderators, include simple effects with interaction term, don’t confuse simple/partial effects. Use theory to guide interaction selection (avoid overfitting).
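A base-R sketch of a continuous-by-continuous interaction with simple slopes at mean ± 1 SD of the moderator (all names hypothetical):

    # Hypothetical predictor (x) and moderator (m)
    set.seed(4)
    x <- rnorm(300); m <- rnorm(300)
    y <- 0.3 * x + 0.2 * m + 0.4 * x * m + rnorm(300)
    # Center both; in an R formula, xc * mc expands to xc + mc + xc:mc
    xc <- x - mean(x); mc <- m - mean(m)
    fit <- lm(y ~ xc * mc)
    b <- coef(fit)
    # Simple slopes of x at low / average / high moderator (mean - 1 SD, mean, mean + 1 SD)
    b["xc"] + b["xc:mc"] * c(-sd(mc), 0, sd(mc))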
Categorical Interactions, Simple Slopes, Spotlight
Moderation (Interaction): Effect of a predictor depends on another variable (moderator).
Categorical Moderators:
- Binary: 2 levels (e.g., sex). Creates 1 interaction term. Simple slopes for each level of the moderator.
- Nominal: More than 2 levels (e.g., program). Multiple dummy codes & interaction terms (1 fewer than # of levels).
Interpreting Interactions:
- Coefficient: Difference in slopes between groups (continuous focal predictor) or a difference of group differences (categorical focal predictor).
- Plot essential: Visualize different slopes/group differences.
Centering: Mean-center continuous predictors for meaningful intercepts (effect at average, not zero).
Simple effects/slopes: Effect of predictor at specific level of moderator.
Marginal contrasts/means: Differences between groups at various moderator levels.
Higher-order interactions: Interactions of interactions (complex, interpret with caution).
Reporting: State significance of interaction & simple effects/slopes/contrasts. Avoid interpreting main effects if interaction is significant. Consider removing non-significant interactions if not primary research focus.
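A sketch for a binary moderator: the simple slope in each group can be read off by releveling (hypothetical data):

    # Slope of x differs across two hypothetical groups
    set.seed(5)
    x <- rnorm(200)
    grp <- factor(rep(c("a", "b"), each = 100))
    y <- ifelse(grp == "a", 0.2, 0.8) * x + rnorm(200)
    fit <- lm(y ~ x * grp)
    coef(fit)   # "x" = simple slope in reference group a; "x:grpb" = slope difference
    # Simple slope in group b: make b the reference and refit
    coef(lm(y ~ x * relevel(grp, ref = "b")))["x"]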
Functional Form, Measurement Error, Multicollinearity
Linear Model Assumptions & Diagnostics (Cheat Sheet)
I. Formula Issues (Bias Coefficients & SEs):
- Linearity: Relationship between x and y is straight.
- Diagnose: Residuals vs. Fitted plot (flat line desired).
- Fix: Polynomial/piecewise regression, transformations, GAM, GLM, ML.
- Measurement Error (in x): Assumption is that predictors are measured without error.
- Diagnose: Reliability (test-retest, internal, inter-rater).
- Fix: Improve measurement, disattenuation, SEM, errors-in-variables models.
- (Multi)Collinearity: Predictors are not redundant.
- Diagnose: VIFs (>5 moderate, >10 high). Correlation >0.9 (limited).
- Fix: Drop/combine predictors, PCA, penalized regression (Lasso/Ridge), SEM, PLS.
II. Residual Issues (Bias SEs, not Coefficients – covered in the next section):
- Homoscedasticity: Constant error variance.
- Independence: Residuals uncorrelated.
- Normality: Residuals normally distributed.
(Multi)collinearity Example: Height1 & height2 (collinear); size, height1, & weight (multicollinear); male & female dummy codes (exact collinearity).
Key Principle: Diagnostics over strict cutoffs. Consider degree of violation.
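A sketch of these checks in R; the VIF line assumes the car package is installed:

    # Nearly redundant predictors (hypothetical data)
    set.seed(6)
    x1 <- rnorm(100)
    x2 <- 0.9 * x1 + rnorm(100, sd = 0.3)
    y  <- x1 + rnorm(100)
    fit <- lm(y ~ x1 + x2)
    plot(fit, which = 1)   # Residuals vs. Fitted: a flat trend suggests linearity holds
    car::vif(fit)          # VIFs: > 5 moderate, > 10 high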
Residual Distribution, Independence, Outliers
Regression Assumptions & Diagnostics (Cheat Sheet)
Residuals: Difference between observed & predicted values. Core of diagnostics.
1. Residual Assumptions (Violations mainly bias SEs):
- Constant Error Variance (Homoscedasticity): Equal spread of residuals across predictors. Violation: Fan-shaped residual plot (e.g., income vs. spending). Fix: Transform outcome, Huber-White SEs, model error variance.
- Normality: Residuals normally distributed. Less critical with large n. Violation: Bimodal residual plots (e.g., zero-inflated data). Fix: Transform outcome, nonparametric methods (bootstrap), GLM. Prefer Yeo-Johnson over Box-Cox: Box-Cox requires strictly positive values, while Yeo-Johnson also handles zeros and negatives.
- Independence: Residuals uncorrelated. Key for clustered/time series data. Violation: Similar residuals within groups (students in schools) or time points (monthly unemployment). Fix: Cluster-robust SEs, multilevel models, GEE, time series analysis.
2. Outliers: Atypical data points.
- Leverage: Unusual predictor values.
- Distance: Far from regression line (large residual).
- Influence: Strongly changes coefficients (high Cook’s D).
- Fix: Correct errors, keep if valid, remove if from different population, recode (Winsorize). Be transparent; report approach. Use sensitivity analyses.
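These outlier diagnostics are all base R; `fit` below is any fitted lm() model (e.g., from the sketch above):

    hatvalues(fit)         # leverage: unusual predictor values
    rstandard(fit)         # standardized residuals: distance from the line
    cooks.distance(fit)    # influence: effect on the coefficients
    plot(fit, which = 5)   # Residuals vs. Leverage, with Cook's D contours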
Polynomial Regression
Polynomial Regression: Models non-linear relationships.
- Power Polynomials: x, x², x³, etc. Highest power (degree) determines curve shape. Degree – 1 = # of bends.
- Quadratic (d=2): Most common, one bend (parabola). The sign of the x² coefficient determines whether the curve opens up or down.
- Cubic (d=3): Two bends.
- Higher Degrees: More bends, more flexible, higher overfitting risk.
- Extrema (Quadratic): Highest/lowest point on curve, at x = -b₁ / (2b₂) for ŷ = b₀ + b₁x + b₂x².
- Centering x: Reduces multicollinearity between polynomial terms. Use poly() in R for orthogonal (uncorrelated) polynomials.
- Include Lower Degrees: If using x², include x. If using x³, include x² and x.
- Theory-Driven: Choose degree based on theory, not just fit.
- Visualize: Essential for checking model assumptions (especially linearity). Don’t extrapolate predictions beyond observed x range.
Key Principle: Model non-linearity when a straight line is a poor fit, guided by theory. Quadratic is often sufficient.
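A quadratic sketch in R, showing raw vs. orthogonal terms and the extremum formula (hypothetical data):

    set.seed(8)
    x <- runif(150, 0, 10)
    y <- 2 + 1.5 * x - 0.1 * x^2 + rnorm(150)
    fit_raw  <- lm(y ~ x + I(x^2))    # raw powers: keep the lower-degree term
    fit_orth <- lm(y ~ poly(x, 2))    # orthogonal polynomials: uncorrelated terms
    # Extremum of the fitted quadratic: x = -b1 / (2 * b2)
    b <- coef(fit_raw)
    -b["x"] / (2 * b["I(x^2)"])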
Best Practices, Questionable Practices, Power
Research Practices Continuum: (Bad → Good) Misconduct, Questionable Practices, Best Practices
Bad/Questionable Practices:
- Fabrication: Making up data.
- Falsification: Manipulating data.
- Plagiarism: Presenting others’ work as your own.
- Undisclosed COIs: Hiding conflicts of interest.
- Unapproved Research: Lacking ethical approval.
- Selective Reporting: Cherry-picking favorable results or omitting unfavorable ones.
- HARKing (Hypothesizing After Results are Known): Presenting exploratory work as confirmatory (rewriting hypotheses post hoc).
- p-hacking: Manipulating analysis to achieve p < .05.
- Data Peeking: Analyzing data as it’s collected & stopping at p < .05.
Best Practices:
- Open Sharing: Publicly sharing data, materials, & code (OSF, GitHub).
- Preregistration: Specifying hypotheses & analyses before data collection (OSF, registered reports).
- Power Analysis: Determining necessary sample size (a priori, sensitivity).
- A priori: Find n given effect size, alpha, power.
- Sensitivity: Find minimum detectable effect size given n, alpha, power.
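A sketch of both flavors using the pwr package (assumed installed); u = number of tested predictors, v = error df, f2 = Cohen's effect size, and n = u + v + 1:

    # A priori: n for 3 predictors, medium effect (f2 = .15), alpha = .05, power = .80
    res <- pwr::pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
    ceiling(res$v) + 3 + 1   # required sample size
    # Sensitivity: minimum detectable f2 with n = 100 (v = 100 - 3 - 1 = 96)
    pwr::pwr.f2.test(u = 3, v = 96, sig.level = 0.05, power = 0.80)$f2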
Core Principles:
- Transparency and rigor are key for replicability.
- Justify all research decisions, especially effect size estimates.
- Prioritize finding the truth over chasing significant results.
- Be mindful of perverse incentives and cognitive biases.
Reporting, Sharing, and Reviewing
Intro: Explanation vs. Prediction, Exploratory vs. Confirmatory, Target Population, Justify Predictors
Methods:
Sampling: Methods, Criteria, Characteristics (distributions, validity – not “everyone uses it”)
Analysis: Sample Size/Stopping Rule, Model Spec (interactions, etc.), Missing Data (document!), Outliers (justify)
Results: Report all coefficients, Effect Sizes, CIs, Standardization, Assumption Checks (address violations!), Zero-Order vs. Partial Effects, Simple Effects (interactions), Visualize w/uncertainty (error bars)
Discussion: Limitations (representativeness, design), Avoid Overreaching (correlation ≠ causation), Logical Conclusions from Results
Translation Activity
Linear Models (LM): Relates outcome variable to predictor(s).
- Continuous Predictor: Numeric variable (e.g., age, height).
- Categorical Predictor: Group membership (e.g., gender, treatment).
- Interaction (moderation): Effect of one predictor depends on the level of another; the combined effect differs from the sum of the individual effects.
- Curvilinear: Non-linear relationship (e.g., quadratic, cubic).
- Assumptions:
- Independence: Observations are not related. Watch for clustering (repeated measures, groups).
- Linearity: Linear relationship between predictors and outcome.
- Normality: Residuals (errors) are normally distributed. Problematic for bounded outcomes (e.g., percentages, binary).
- Homoscedasticity: Equal variance of residuals across predictor values.
- Diagnostics: Check assumptions using plots (e.g., residuals vs. fitted).
- Polynomial Regression: Models curvilinear relationships (e.g., outcome ~ predictor + I(predictor^2) in R).
Translating Research Questions:
- Identify outcome and predictor variables.
- Consider potential covariates (confounding variables).
- Formulate hypothesis: How are predictors related to the outcome? Is there an interaction?
- Choose appropriate statistical model (LM, polynomial, etc.).
- Check assumptions. Address violations if necessary (e.g., transformations, different model).
- Interpret results in context of research question.
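A compact end-to-end sketch of this translation process (the research question and all variable names are hypothetical):

    # RQ: does study time predict exam score, controlling for prior GPA,
    # and does the effect depend on sleep?
    set.seed(10)
    d <- data.frame(study = rnorm(120, 10, 3),
                    sleep = rnorm(120, 7, 1),
                    gpa   = rnorm(120, 3, 0.4))
    d$score <- 50 + 2 * d$study + 3 * d$gpa + rnorm(120, sd = 5)
    # Center continuous predictors, fit the moderated model, check assumptions
    d$study_c <- d$study - mean(d$study)
    d$sleep_c <- d$sleep - mean(d$sleep)
    fit <- lm(score ~ study_c * sleep_c + gpa, data = d)
    summary(fit)   # interpret in context of the research question
    plot(fit)      # residual diagnostics before trusting the results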
Linear Models:
Intercept: Expected Y when all predictors are 0. Contextualize interpretation (e.g., “For a newborn…”).
Slope: Change in Y for a 1-unit change in X, holding others constant.
Interaction: Effect of X on Y depends on the level of another predictor (non-parallel lines).
Simple Slope/Effect: Effect of X on Y at a specific value of another predictor (in interaction models).
Partial Effect: Effect of X controlling for other predictors.
Zero-Order Effect: Effect of X ignoring other predictors.
Centering: Simplifies interpretation of simple slopes at the mean.
Model Assumptions (Quick Check):
Linearity: Relationship between X and Y is a straight line.
Independence: Observations are independent. Watch for repeated measures!
Normality: Residuals are normally distributed.
Equal Variance (Homoscedasticity): Residual variance is constant across predictor values.
Other Essentials:
Mediation: X influences Y through M. Needs longitudinal data for causal claims. Cross-sectional mediation is flawed.
Power Analysis: Probability of detecting a true effect of a given size. Post hoc power using the observed effect size from a small sample is BAD. Use an independent effect size estimate.
Pre-registration: Plan your analysis before collecting data to avoid p-hacking and selective reporting.