Statistical Concepts and Methods
Module 1: Core Statistical Concepts
Parameters and Statistics
A parameter summarizes a population (e.g., mean, median, standard deviation). Parameters are constant but usually unknown. A statistic is calculated from a sample (e.g., sample mean) and used to estimate parameters. Statistics are random and depend on the sample.
Point Estimates
A point estimate is a sample value used to estimate a specific population parameter. For instance, the sample mean can be a point estimate for the population mean. All point estimates are statistics, but not all statistics are point estimates.
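The parameter/statistic distinction can be made concrete with a quick simulation; the population values below are hypothetical, generated only for illustration.

```python
import random

random.seed(0)
# Hypothetical population of 10,000 values with an (in practice unknown) mean
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = sum(population) / len(population)  # the parameter

# A single random sample yields a point estimate of that parameter
sample = random.sample(population, 100)
point_estimate = sum(sample) / len(sample)  # the statistic
```

A different random sample would produce a different point estimate, which is exactly the sampling variation discussed below.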
Standard Error and Sampling Distributions
The standard error is the standard deviation of the sampling distribution of a point estimate. The sample distribution describes the distribution of data within a single sample; the sampling distribution describes how a point estimate varies across all possible samples of a given size. For unbiased estimators, the sampling distribution is centered at the true population parameter. If the population is normal, the sampling distribution is exactly normal; otherwise, the Central Limit Theorem (CLT) makes it approximately normal for sufficiently large samples.
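A simulation can approximate the sampling distribution directly and check the standard error against the theoretical value σ/√n; the population here is hypothetical.

```python
import random
import statistics

random.seed(1)
population = [random.gauss(0, 10) for _ in range(100_000)]
n = 50

# Draw many samples of size n and record each sample mean:
# an approximation of the sampling distribution of the mean
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# Standard error = standard deviation of the sampling distribution;
# theory predicts sigma / sqrt(n) for the sample mean
empirical_se = statistics.stdev(sample_means)
theoretical_se = statistics.stdev(population) / n ** 0.5
```

The two values agree closely, illustrating that the standard error describes spread across samples, not spread within one sample.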
Sampling Variation and Sample Characteristics
Sampling variation refers to the fact that different samples yield different estimates. Random sampling guards against biased samples. Larger sample sizes reduce variability among estimates, not variability within a sample. A representative sample roughly resembles the population, and generalizability means results from the sample apply to the population. Sampling bias occurs when certain individuals are systematically more (or less) likely to be included in the sample.
Margin of Error
The margin of error (z* × SE or t* × SE) bounds the typical difference between a point estimate and the true population parameter that is attributable to sampling variability.
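A minimal sketch of the margin-of-error calculation, using a small hypothetical sample and the z* critical value for 95% confidence:

```python
import statistics

# Hypothetical sample of measurements
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
n = len(sample)

se = statistics.stdev(sample) / n ** 0.5  # estimated standard error
z_star = 1.96                             # critical value for 95% confidence
moe = z_star * se                         # margin of error = z* x SE

mean = statistics.mean(sample)
ci = (mean - moe, mean + moe)             # Point Estimate +/- MOE
```

For small samples like this one, a t* critical value (here about 2.26 with 9 degrees of freedom) would strictly be more appropriate than z*; the structure of the calculation is the same.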
Module 2: Advanced Statistical Methods
Sampling and Bootstrapping
Sampling without replacement from a finite population yields slightly more precise parameter estimates than sampling with replacement. Bootstrapping approximates the sampling distribution by repeatedly drawing with replacement from the original sample; it is used to estimate the standard error of a point estimate. Bootstrapping works best when the original sample is large enough to represent the population.
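The bootstrap procedure can be sketched in a few lines; the observed sample below is hypothetical.

```python
import random
import statistics

random.seed(2)
# The original sample: the only data we actually observe
sample = [random.gauss(100, 15) for _ in range(200)]

# Bootstrap: resample WITH replacement, same size as the original,
# and record the point estimate from each resample
boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(1_000)
]

# Bootstrap estimate of the standard error of the sample mean
boot_se = statistics.stdev(boot_means)
```

Note that each resample must have the same size as the original sample, and must draw with replacement; otherwise every resample would simply reproduce the original data.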
Confidence Intervals
A confidence interval is a range of plausible values for a population parameter. It expresses uncertainty rather than a single estimate (Point Estimate ± MOE). A 95% confidence level means that if we repeatedly drew samples and constructed an interval from each, about 95% of those intervals would contain the true population parameter. Higher confidence levels produce wider intervals; larger sample sizes produce narrower ones.
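The long-run interpretation of the confidence level can be checked by simulation: build many intervals from a population with a known mean and count how often they cover it. All values below are hypothetical.

```python
import random
import statistics

random.seed(3)
TRUE_MEAN, SIGMA, n = 50, 10, 40
z_star = 1.96
trials = 1_000

covered = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(n)]
    mean = statistics.mean(sample)
    moe = z_star * statistics.stdev(sample) / n ** 0.5
    # Does this interval contain the true parameter?
    if mean - moe <= TRUE_MEAN <= mean + moe:
        covered += 1

coverage = covered / trials  # close to 0.95 in the long run
```

Any single interval either contains the parameter or it does not; the 95% describes the procedure, not one interval.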
Central Limit Theorem (CLT) and Law of Large Numbers
The CLT states that for large sample sizes, the sampling distribution of a point estimate is approximately normal, regardless of the population’s shape. CLT conditions include independent observations, random sampling, and a sufficiently large sample size. The Law of Large Numbers states that as sample size increases, the point estimate converges to the true population parameter.
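The "regardless of the population's shape" claim is worth seeing in action: even for a strongly skewed population, sample means cluster symmetrically around the population mean. The exponential population below is hypothetical.

```python
import random
import statistics

random.seed(4)
# Strongly right-skewed population (exponential, mean ~ 1)
population = [random.expovariate(1.0) for _ in range(50_000)]

# Approximate the sampling distribution of the mean for large n
n = 100
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# CLT: despite the skewed population, the sample means are centered
# on the population mean and roughly bell-shaped
center = statistics.mean(sample_means)
pop_mean = statistics.mean(population)
```

Increasing n further would also demonstrate the Law of Large Numbers: individual sample means drift ever closer to the population mean.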
Normal Distribution and Z-scores
The normal distribution is a symmetric, unimodal, bell-shaped curve, denoted N(μ, σ); the standard normal distribution is N(0, 1). The Z-score, z = (x − μ) / σ, measures how many standard deviations an observation falls above or below the mean.
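The Z-score formula is a one-liner; the IQ-style values in the usage comments (mean 100, standard deviation 15) are a conventional illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# e.g. a score of 130 on a scale with mean 100 and sd 15
#   z_score(130, 100, 15) -> 2.0   (two sd above the mean)
#   z_score(85, 100, 15)  -> -1.0  (one sd below the mean)
```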