Understanding Key Statistical Concepts and Theorems

Posted on Mar 24, 2025 in Statistics

Law of Large Numbers

If you take samples of larger and larger size from any population, then the mean (x̄) of the sample tends to get closer and closer to μ (the population mean).
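The effect can be seen in a small simulation. The sketch below assumes a hypothetical normal population with μ = 100 and σ = 15 (illustrative values, not part of the statement above) and uses only Python's standard library.

```python
import random
from statistics import fmean

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA = 100, 15

def sample_mean(n, rng):
    """Mean of n independent draws from the hypothetical population."""
    return fmean(rng.gauss(MU, SIGMA) for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 100_000):
    x_bar = sample_mean(n, rng)
    print(f"n = {n:>7}: sample mean = {x_bar:.3f}, error = {abs(x_bar - MU):.3f}")
```

The error does not have to shrink at every single step, but it tends to shrink as n grows.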

Sampling Distribution

The sampling distribution of the mean is the distribution of the sample means (x̄) obtained from repeated samples of the same size n. It approaches a normal distribution as n (the sample size) increases.

Central Limit Theorem

The larger the sample size, the closer the distribution of the sample means is to a normal distribution, even when the population itself is not normally distributed.
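A rough way to check this is to draw many samples from a clearly skewed population and look at the sample means. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (an illustrative choice), and uses skewness as a simple stand-in for "how non-normal" the means are; a normal distribution has skewness 0.

```python
import random
from statistics import fmean, pstdev, stdev

def sample_means(n, trials, rng):
    """Means of `trials` samples of size n from an exponential population (mean 1, SD 1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean((v - m) ** 3 for v in values) / s ** 3

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (2, 30):
    means = sample_means(n, 5_000, rng)
    # Theory: the SD of the sample means is sigma / sqrt(n) = 1 / sqrt(n) here.
    print(f"n = {n:>2}: SD of means = {stdev(means):.3f} "
          f"(theory {1 / n ** 0.5:.3f}), skewness = {skewness(means):.3f}")
```

With n = 30 the skewness of the sample means should be much closer to 0 than with n = 2, and their standard deviation should sit near 1 / √30.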

Standard Error

The standard error is the standard deviation of the distribution of the sample means. Compared with the normal distribution, t-distributions have more variability and a flatter, more spread-out shape.

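z-score for a single observation: z = (x – μ) / σ

Standard error: σ_x̄ = σ / √n

As a rule of thumb, the sample size should be 30 or greater for the sampling distribution of the mean to be approximately normal when the population is not.

z-score for a sample mean: z = (x̄ – μ) / (σ / √n)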

Example: Patient Recovery Time

The patient recovery time from a particular surgical procedure is normally distributed with a mean of 5.3 days and a standard deviation of 2.1 days. The 90th percentile is approximately 7.99 days.
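The percentile can be checked with the standard library's `statistics.NormalDist`; this is one convenient way to do the lookup, not necessarily how the figure was originally obtained.

```python
from statistics import NormalDist

# Recovery time: normal with mean 5.3 days and SD 2.1 days (values from the example).
recovery = NormalDist(mu=5.3, sigma=2.1)

# inv_cdf(p) returns the value below which a proportion p of the distribution falls.
p90 = recovery.inv_cdf(0.90)
print(f"90th percentile: {p90:.2f} days")
```

This is the same as computing μ + zσ with z ≈ 1.28, the z-score that cuts off the top 10%.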

Example: IQ Scores

IQ is normally distributed with a mean of 100 and a standard deviation of 15. Let X = IQ of an individual. X ~ N(100, 15)

Z = (120 – 100) / 15 = 1.33

The probability that a person has an IQ greater than 120 is approximately 0.0918.
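The same tail probability can be computed directly rather than read from a z-table. A minimal sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # values from the example

z = (120 - 100) / 15               # unrounded z-score
p_above = 1 - iq.cdf(120)          # area to the right of 120
print(f"z = {z:.2f}, P(X > 120) = {p_above:.4f}")
```

Because the code keeps the unrounded z (1.333...) instead of the table value 1.33, the result comes out near 0.0912 rather than 0.0918.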

Example: Middle 50% of IQs

The middle 50% of IQs fall between what two values?

X = 100 + 0.67(15) = 110.05
X = 100 – 0.67(15) = 89.95
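The middle 50% lies between the 25th and 75th percentiles, so the two cut-offs can also be obtained with `inv_cdf`. A short sketch:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # values from the example

# The middle 50% leaves 25% in each tail.
lower = iq.inv_cdf(0.25)
upper = iq.inv_cdf(0.75)
print(f"Middle 50% of IQs: {lower:.2f} to {upper:.2f}")
```

The bounds differ from the hand calculation in the second decimal place because the exact z-score is about 0.6745 rather than the rounded 0.67.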

Example: Sample Mean Probability

For a sample of n = 25 scores, what is the probability that the sample mean (M) will be within 5 points of the population mean? In other words, what is p(95 < M < 105)?

Z = (M – μ) / √(σ² / n)
Z for 95: (95 – 100) / (15 / √25) = -5 / 3 = -1.67
Z for 105: (105 – 100) / (15 / √25) = 5 / 3 = 1.67
A z of 1.67 corresponds to an area of 0.45254 between the mean and z; 2 × 0.45254 = 0.9051
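The calculation above treats the sample mean as normally distributed with standard deviation σ / √n. The sketch below follows the same steps with `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25               # values from the example

standard_error = sigma / sqrt(n)         # 15 / 5 = 3
sampling_dist = NormalDist(mu, standard_error)

# Probability that the sample mean lands between 95 and 105.
p_within = sampling_dist.cdf(105) - sampling_dist.cdf(95)
print(f"SE = {standard_error:.2f}, P(95 < M < 105) = {p_within:.4f}")
```

The result differs slightly from the table-based figure because the code does not round z to two decimals before looking up the area.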

Example: NBA Player Heights

The heights of the 430 National Basketball Association players were listed on team rosters at the start of the 2005–2006 season. The heights of basketball players have an approximate normal distribution with mean (μ) = 79 inches and a standard deviation (σ) = 3.89 inches.

For a height of 77 inches: z = (77 – 79) / 3.89 = -0.51
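As a quick check, the z-score can be computed in code, along with the share of players expected to be shorter than 77 inches under the stated normal approximation. The share is an extra step not in the original example.

```python
from statistics import NormalDist

mu, sigma = 79, 3.89                 # values from the example, in inches
heights = NormalDist(mu, sigma)

z = (77 - mu) / sigma                # z-score for a 77-inch player
share_shorter = heights.cdf(77)      # area to the left of 77 inches
print(f"z = {z:.2f}, share shorter than 77 in = {share_shorter:.3f}")
```

A z of about -0.51 means 77 inches is roughly half a standard deviation below the mean, which puts roughly 30% of players below that height.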

Transforming Scores

Formula: μ + zσ

Example: μ = 450, σ = 60

Scorch Jones: 520
Singe Johnson: 392

Standard scores: μ = 100, σ = 10

Scorch: (520 – 450) / 60 = 1.167

100 + 1.167(10) = 111.67

Singe: (392 – 450) / 60 = –0.967

100 – 0.967(10) = 90.33
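Transforming a score is a two-step process: convert the raw score to a z-score on the old scale, then apply μ + zσ on the new scale. The helper below, `rescale`, is a hypothetical name used only for this sketch.

```python
def rescale(score, old_mu, old_sigma, new_mu, new_sigma):
    """Convert a raw score to a z-score, then place it on the new scale."""
    z = (score - old_mu) / old_sigma
    return new_mu + z * new_sigma

# Raw scores from the example: old scale mu = 450, sigma = 60; new scale mu = 100, sigma = 10.
for name, raw in (("Scorch Jones", 520), ("Singe Johnson", 392)):
    print(f"{name}: {rescale(raw, 450, 60, 100, 10):.2f}")
```

Carrying the unrounded z-score through gives about 111.67 for Scorch and 90.33 for Singe.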

Example: Hypothesis Testing

Sample: n = 36, mean = 100, SD = 5. Population: N = 1,000, mean = 99, SD = 2.5.

State your null and alternative hypotheses:
- H₀: μ = 99
- H₁: μ ≠ 99
Set your decision criterion (alpha level, one- or two-tailed): Two-tailed, α = 0.05
Compute the statistic (Z-test):

Z = (100 – 99) / (2.5 / √36) = 2.4

We would reject the null hypothesis because the value of 2.4 falls in the rejection region, beyond the critical value of 1.96 (two-tailed, α = 0.05).
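The steps of this z-test can be sketched in a few lines. The p-value is an addition to the worked example, and the critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, pop_mean, pop_sd, n = 100, 99, 2.5, 36   # values from the example
alpha = 0.05                                          # two-tailed

z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))     # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Note that the z-test uses the population SD (2.5), not the sample SD (5), in the denominator.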

Regions

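Retention region: Values of the test statistic that are consistent with the null hypothesis; fail to reject the null hypothesis (no evidence of an effect).
Rejection region: Values that would be unlikely if the null hypothesis were true; reject the null hypothesis (evidence of an effect).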

Error Types

Type I: Reject the null hypothesis when it is true.
Type II: Fail to reject the null hypothesis when it is false.
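A Type I error rate can be illustrated by simulation: when the null hypothesis really is true, a test run at α = 0.05 should still reject in roughly 5% of samples. The sketch below assumes a normal population with μ = 99 and σ = 2.5 and samples of n = 36, reused here only as illustrative defaults; `type_i_rate` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_rate(trials, n=36, mu=99, sigma=2.5, alpha=0.05, seed=1):
    """Share of two-tailed z-tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Every sample is drawn from a population where H0 (mean = mu) holds.
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs((sample_mean - mu) / se) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_rate(10_000)
print(f"Type I error rate over 10,000 simulated tests: {rate:.3f}")
```

Every rejection counted here is a Type I error, so the printed rate should land close to the chosen α.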

Degrees of Freedom

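The more degrees of freedom we have, the closer the sample estimate (such as s) gets to the actual population value, and the closer the t-distribution gets to the normal distribution.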

Hypothesis Testing Differences

The difference between the z-test and the t-test is that the critical region changes: for the t-test it depends on the degrees of freedom, whereas the normal curve always uses ±1.96 (two-tailed, α = 0.05).

Standard Deviation vs. Standard Error

The sample standard deviation is (roughly) the average distance of the data points in a sample from the sample mean. The standard error is (roughly) the average distance of the sample means (of size n) from the population mean.

Example: t-test

H₀: μ_{physical education test} = 12
H₁: μ_{physical education test} ≠ 12
Two-tailed, α = 0.05
df = 25 – 1 = 24
Critical values: t = ±2.064
t = (15 – 12) / (29 / √25) = 0.517
t(24) = 0.517, p > 0.05

We fail to reject the null hypothesis because t = 0.517 falls inside the critical values of ±2.064, so there is no significant effect of the P.E. programs on pushup scores. On average, students who are in P.E. programs do not score significantly differently from the comparison value of 12. M = 15, SD = 1.67
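The decision can be reproduced with a short script. This sketch uses s = 29, the value that appears in the t calculation itself; the summary line reports a different SD, which would give a different t. The critical value is hard-coded from the example because Python's standard library has no t-distribution.

```python
from math import sqrt

# Values from the t calculation in the example; s = 29 is taken from that line.
sample_mean, mu_0, s, n = 15, 12, 29, 25
t_crit = 2.064        # two-tailed, alpha = 0.05, df = 24 (from the example)

df = n - 1
t = (sample_mean - mu_0) / (s / sqrt(n))   # estimated standard error = s / sqrt(n)
reject = abs(t) > t_crit

print(f"t({df}) = {t:.3f}, critical value = ±{t_crit}, reject H0: {reject}")
```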

t-test Formula

t = (x̄ – μ) / s_x̄, where s_x̄ = s / √n

We use a t-test when we do not know the population standard deviation.

s² = SS / (n – 1), s = √(SS / (n – 1)), where SS is the sum of squared deviations of the scores from the sample mean.
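Putting the formulas together, a one-sample t statistic can be computed from raw scores. The data below are made up purely for illustration, and `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t(data, mu_0):
    """Return (t, df) for a one-sample t-test of the mean of `data` against mu_0."""
    n = len(data)
    x_bar = fmean(data)
    ss = sum((x - x_bar) ** 2 for x in data)   # SS: sum of squared deviations
    s = sqrt(ss / (n - 1))                     # sample standard deviation
    standard_error = s / sqrt(n)               # estimated standard error of the mean
    return (x_bar - mu_0) / standard_error, n - 1

scores = [14, 16, 15, 13, 17]                  # hypothetical data, not from the examples
t, df = one_sample_t(scores, mu_0=12)
print(f"t({df}) = {t:.3f}")
```

For these made-up scores, SS = 10 and s² = 2.5. The value of s computed from SS matches `statistics.stdev`, which also divides by n – 1.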