Statistical Inference and Regression Analysis: Key Concepts

Statistical Inference and Regression Analysis

Statistical Inference: Drawing conclusions about a population based on information from a sample.

Standard Error: Measures how much the sample mean varies from sample to sample. It is estimated as the sample standard deviation divided by the square root of the sample size (s / √n).
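
As a minimal sketch of the standard error calculation, the Python snippet below divides the sample standard deviation by the square root of the sample size. The function name standard_error and the data values are made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n), where s is the sample standard deviation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return s / math.sqrt(n)

# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(sample)
print(f"standard error of the mean: {se:.4f}")
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.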

Hypothesis Test Decisions:

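  • Reject the null hypothesis: the sample provides sufficient evidence against it at the chosen significance level.
  • Fail to reject the null hypothesis: the evidence is insufficient. This does not prove that the null hypothesis is true.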

Multiple Linear Regression

A regression model that estimates the relationship between two or more independent variables and a dependent variable.

  • Coefficient: The expected change in the dependent variable for a one-unit increase in an independent variable, holding the other independent variables constant. Its sign gives the direction of the relationship.
  • Intercept: The expected value of the dependent variable when all independent variables are zero.
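  • R-squared: The proportion of variance in the dependent variable that is explained by the independent variables, ranging from 0 to 1.
  • Adjusted R-squared: A modified R-squared that penalizes additional predictors, so it increases only when a new predictor improves the fit more than would be expected by chance.
  • Standard Error of the Estimate: Measures the accuracy of predictions as the typical distance of observed values from the fitted values. It is the square root of the sum of squared residuals divided by n − k − 1, for n observations and k predictors.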
  • Residuals: The differences between observed and predicted values, used to assess model accuracy.
  • Homoscedasticity: The assumption that the variance of residuals is constant across all values of the independent variables.
  • Multicollinearity: High correlation between independent variables, making it difficult to determine individual effects.
  • Outliers: Observations far from other data points, which can influence regression results.
  • Interaction Term: A product of two independent variables, included in the model to assess whether the effect of one depends on the value of the other.
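
The quantities in the list above can be sketched with an ordinary least squares fit. This is a minimal illustration assuming NumPy is available. The function name fit_ols, the dictionary keys, and the data are hypothetical, and the fit uses np.linalg.lstsq rather than a dedicated statistics package.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                                   # n observations, k predictors
    design = np.column_stack([np.ones(n), X])        # column of ones gives the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta                    # observed minus predicted
    sse = float(residuals @ residuals)               # sum of squared residuals
    sst = float(np.sum((y - y.mean()) ** 2))         # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    see = float(np.sqrt(sse / (n - k - 1)))          # standard error of the estimate
    return {
        "intercept": float(beta[0]),
        "coefficients": beta[1:],
        "residuals": residuals,
        "r_squared": r2,
        "adj_r_squared": adj_r2,
        "std_error_estimate": see,
    }

# Made-up data: six observations, two predictors.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [9.5, 7.5, 18.5, 18.5, 29.5, 27.5]
fit = fit_ols(X, y)
print("intercept:", fit["intercept"])
print("coefficients:", fit["coefficients"])
print("R-squared:", fit["r_squared"], "adjusted:", fit["adj_r_squared"])
print("standard error of the estimate:", fit["std_error_estimate"])
```

Under the same assumptions, an interaction term could be included by appending a column holding the product of the two predictors to X before fitting. A plot of the residuals against the predicted values is a common informal check for homoscedasticity and outliers.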

Hypothesis Testing Concepts

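  • P-value: The probability of obtaining test results at least as extreme as those observed, assuming the null hypothesis is true.
  • Significance Level (α): The p-value threshold at or below which the null hypothesis is rejected, commonly 0.05 or 0.01. It is the probability of rejecting a true null hypothesis (a Type I error).
  • Confidence Interval: A range of values, computed from the sample, that is expected to contain the true population parameter at a stated confidence level (for example, 95%).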
  • Test Statistic: A standardized value calculated from sample data, compared to a critical value.
  • Critical Value: The cutoff for the test statistic, determined by α, beyond which the null hypothesis is rejected.
  • One-Sided Test: Tests for a directional effect (greater than or less than).
  • Two-Sided Test: Tests for any difference, without specifying direction.
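
The concepts above can be tied together in a minimal sketch of a one-sample z-test, which assumes the population standard deviation is known. The function name z_test, its parameters, and the example numbers are hypothetical. Only the Python standard library is used.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z-test of H0: mu = mu0 with known population standard deviation sigma."""
    std_normal = NormalDist()                 # standard normal distribution
    se = sigma / math.sqrt(n)                 # standard error of the mean
    z = (sample_mean - mu0) / se              # test statistic
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif alternative == "greater":            # one-sided: H1 is mu > mu0
        p_value = 1 - std_normal.cdf(z)
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif alternative == "less":               # one-sided: H1 is mu < mu0
        p_value = std_normal.cdf(z)
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    # Two-sided (1 - alpha) confidence interval for the population mean.
    half_width = std_normal.inv_cdf(1 - alpha / 2) * se
    ci = (sample_mean - half_width, sample_mean + half_width)
    return {"z": z, "p_value": p_value, "critical": critical, "reject": reject, "ci": ci}

# Hypothetical numbers: sample mean 103 from n = 100, testing mu0 = 100 with sigma = 15.
result = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
decision = "reject H0" if result["reject"] else "fail to reject H0"
print(f"z = {result['z']:.2f}, p = {result['p_value']:.4f}, decision: {decision}")
print(f"95% CI: ({result['ci'][0]:.2f}, {result['ci'][1]:.2f})")
```

Comparing the test statistic to the critical value and comparing the p-value to α lead to the same decision. In the two-sided case, the null value mu0 also falls outside the confidence interval exactly when the test rejects.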

Statistical Tests

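  • Z-Test: Used to compare means when the population variance is known, or when the sample is large enough for the normal approximation to hold.
  • T-Test: Used to compare means when the population variance is unknown and must be estimated from the sample, which matters most when the sample size is small.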
  • ANOVA (Analysis of Variance): Used to compare means across three or more groups.
  • Power of a Test: The probability of correctly rejecting a false null hypothesis, equal to 1 − β, where β is the probability of a Type II error.
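
As a rough sketch of how power can be computed, the snippet below handles a right-tailed one-sample z-test, assuming a known population standard deviation and a specific true mean under the alternative. The function name power_one_sided_z and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)        # reject H0 when z > critical
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))    # mean of z under the alternative
    # Under the alternative, z is normal with mean `shift` and variance 1.
    return 1 - std_normal.cdf(critical - shift)

# Hypothetical scenario: H0 mean 100, true mean 103, sigma 15.
for n in (25, 100, 400):
    print(n, round(power_one_sided_z(100, 103, 15, n), 3))
```

In this setup, power rises with the sample size and with the distance between mu1 and mu0. When mu1 equals mu0, the function returns α, the probability of a Type I error.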