Statistical Inference: Z-Distribution, T-Distribution, and Regression
Chapter 6: Standard Error (SE)
The standard error (SE) is the standard deviation of the sampling distribution of a statistic. It measures how precisely the sample statistic estimates the population parameter.
The z-distribution is the standard normal distribution, with a mean of 0 and a standard deviation of 1. It is used for testing hypotheses about a single population mean when σ is known, and for proportions.
The t-distribution is a family of distributions that resemble the normal distribution but have heavier tails, which accounts for the extra variability when the sample size is small. It is used when the population standard deviation σ is unknown (you use the sample standard deviation s instead), for testing hypotheses about one or two population means.
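As a quick illustration of the definitions above, here is a minimal sketch of the standard error of a sample mean and the matching z test statistic. The function names and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z = (sample mean - hypothesized mean) / SE, valid when sigma is known."""
    return (sample_mean - mu0) / standard_error(sigma, n)

# Hypothetical example: sigma = 15, n = 36, sample mean = 105, mu0 = 100
se = standard_error(15.0, 36)            # 15 / 6 = 2.5
z = z_statistic(105.0, 100.0, 15.0, 36)  # 5 / 2.5 = 2.0
print(se, z)
```

When σ is unknown, the same formula is used with s in place of σ, and the statistic is compared to a t-distribution instead.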
Confidence Intervals
Use critical z-values (e.g., z* = 1.96 for 95% confidence level).
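A minimal sketch of a z confidence interval for a mean, assuming σ is known. It uses `statistics.NormalDist` from the Python standard library to recover the critical value; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean: float, sigma: float, n: int,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Return (low, high) for sample_mean +/- z* * sigma / sqrt(n)."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical example: sample mean = 105, sigma = 15, n = 36
low, high = z_confidence_interval(105.0, 15.0, 36)
print(round(low, 2), round(high, 2))
```

For a 95% level the computed z* is about 1.96, which matches the table value quoted above.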
Hypothesis Testing
Compare test statistics to critical z-values to determine significance.
- Ha: μ < μ0 (left-tail test)
- Ha: μ > μ0 (right-tail test; area = 1 − left-tail area)
- Ha: μ ≠ μ0 (two-tail test; use α/2 in each tail)
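The three alternatives above each correspond to a different tail area. A minimal sketch, assuming a z test statistic has already been computed; the function name and the labels "less", "greater", and "two-sided" are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z: float, alternative: str) -> float:
    """P-value for a z test statistic under the given alternative."""
    cdf = NormalDist().cdf  # left-tail area of the standard normal
    if alternative == "less":        # left-tail test
        return cdf(z)
    if alternative == "greater":     # right-tail test: 1 - left-tail area
        return 1 - cdf(z)
    if alternative == "two-sided":   # two-tail test: double one tail
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# z = 1.96 is right at the two-sided 5% cutoff
print(round(z_p_value(1.96, "two-sided"), 3))
```

Comparing the p-value to α gives the same decision as comparing the test statistic to the critical z-value.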
P-Value
The probability of observing the test statistic, or one more extreme, assuming the null hypothesis is true. Equivalently, it is the smallest level of α at which you can reject H0. “Failing to reject” the null hypothesis means the result is not statistically significant. A Type I error is rejecting the null hypothesis when it is actually true. Sample size estimation (“find a number” questions): always round up to the nearest integer.
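The round-up rule for sample size can be sketched as follows. The notes do not state the formula, so this assumes the usual one for estimating a mean with known σ, n = (z*·σ / m)², where m is the desired margin of error. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma: float, margin: float,
                         confidence: float = 0.95) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= margin (assumed formula)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star * sigma / margin) ** 2
    # Always round UP: rounding down would give a margin larger than requested
    return math.ceil(n_exact)

# Hypothetical example: sigma = 15, margin of error = 2, 95% confidence
n = sample_size_for_mean(15.0, 2.0)
print(n)  # exact value is about 216.08, so n = 217
```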
Chapter 7: Degrees of Freedom
Degrees of freedom are the maximum number of logically independent values in a data sample that are free to vary. For a one-sample t-test, df = n − 1.
P-value from the test statistic (TS) t’:
- Ha: μ < μ0: p-value = P(T ≤ t’)
- Ha: μ > μ0: p-value = P(T ≥ t’)
- Ha: μ ≠ μ0: p-value = 2P(T ≥ |t’|)
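A minimal sketch of the one-sample t test statistic t’ and its degrees of freedom, which are the two values you take to a t table to bracket the p-value. The Python standard library has no t-distribution CDF, so the p-value itself is not computed here. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data: mean 6, s = sqrt(10), n = 5, tested against mu0 = 4
t, df = one_sample_t([2, 4, 6, 8, 10], 4.0)
print(round(t, 4), df)
```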
7.1 Example: Is the value t = __ significant at the __% level? (__ < p-value < __).
7.2 Paired t-test
Used when comparing two related groups (e.g., measurements on the same individuals before and after an intervention).
- Data Type: Dependent samples (e.g., same subjects measured twice, or matched pairs like twins).
- Null (H0): The mean difference between the paired observations is zero.
- Alternative (Ha): The mean difference is not zero.
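A paired t-test reduces to a one-sample t-test on the differences. A minimal sketch computing the test statistic and degrees of freedom; the function name and the before/after data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return (t, df) for H0: mean difference = 0, using d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # t = d_bar / (s_d / sqrt(n)), with df = n - 1 (n = number of pairs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical data: differences are [2, 2, 1, 3]
t, df = paired_t([10, 12, 9, 11], [12, 14, 10, 14])
print(round(t, 3), df)
```

Note that df counts pairs, not individual measurements.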
Pooled t-test
- Definition: Used when comparing two independent groups with equal variances.
- Data Type: Independent samples (e.g., two separate groups like male vs. female or treatment vs. control).
- Null (H0): The means of the two groups are equal.
- Alternative (Ha): The means are not equal.
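A minimal sketch of the pooled (equal-variance) t statistic for two independent samples. It assumes the standard pooled variance, sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2), which the notes do not spell out. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(group1: list[float], group2: list[float]) -> tuple[float, int]:
    """Return (t, df) for H0: mu1 = mu2, assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, df

# Hypothetical data: means 4 and 2, sample variances 4 and 1
t, df = pooled_t([2, 4, 6], [1, 2, 3])
print(round(t, 4), df)
```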
Matched t-test
Like the paired t-test, it involves dependent samples. In a matched t-test, the “matching” explicitly pairs individuals or observations from two groups based on certain characteristics (e.g., pairing participants in treatment and control groups by age, gender, etc.). The hypotheses, formula, and process are identical to the paired t-test, but the focus is on matched pairing rather than repeated measures.
Hypothesis Testing for μ1 − μ2
- Case #1: σ1 and σ2 are known. Use the z-test.
- Cases #2 and #3: σ1 and σ2 are unknown. Use a t-test.
- Case #2: Population variances are assumed unequal (σ1 ≠ σ2). Use the separate t (unequal-variance t).
- Case #3: Population variances are assumed equal (σ1 = σ2). Use the pooled t (equal-variance t). The degrees of freedom equation is different (see back).
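For Case #2, a minimal sketch of the separate (unequal-variance) t statistic. The degrees of freedom here use the Welch-Satterthwaite approximation, which is an assumption: the notes only say the equation differs, and some courses use the conservative df = min(n1 − 1, n2 − 1) instead. The function name and data are hypothetical.

```python
import math
from statistics import mean, variance

def separate_t(group1: list[float], group2: list[float]) -> tuple[float, float]:
    """Return (t, df) without assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    a = variance(group1) / n1  # s1^2 / n1
    b = variance(group2) / n2  # s2^2 / n2
    t = (mean(group1) - mean(group2)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation (assumed; usually not an integer)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical data: sample variances 4 and 1, n1 = n2 = 3
t, df = separate_t([2, 4, 6], [1, 2, 3])
print(round(t, 4), round(df, 3))
```

With equal sample sizes the test statistic matches the pooled one, but the degrees of freedom are smaller, so the test is more conservative.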
How to decide which t is right?
- You will be told what to assume.
- P-value for the HT of σ1/σ2: H0: σ1 = σ2.
- CI for σ1/σ2: if the confidence interval includes 1, the variances could be equal. For a CI for a difference in means, if the interval includes 0, there is no significant difference between the groups being compared, whether the test is pooled or matched (paired); the null hypothesis of no difference cannot be rejected.
Example for 7.2 decision CI: I am __% confident the difference in __ for __ falls between (__, __).
C% CI for μ1 − μ2:
- (−, +) -> 0 is included; μ1 could equal μ2 (not significant)
- (−, −) -> μ1 < μ2 (statistically significant, μ1 ≠ μ2)
- (+, +) -> μ1 > μ2 (statistically significant, μ1 ≠ μ2)
C% CI for σ1/σ2 (variances):
- (≤1, ≥1) -> 1 is included; σ1 could equal σ2 (not significant, assume equal variances)
- (<1, <1) or (>1, >1) -> statistically significant; assume variances are not equal
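Both interpretation rules above follow the same pattern: check whether the interval contains the "no difference" value, which is 0 for a difference in means and 1 for a ratio. A minimal sketch; the function name and return strings are hypothetical.

```python
def interpret_interval(low: float, high: float, null_value: float) -> str:
    """Classify a confidence interval against its 'no difference' value."""
    if low <= null_value <= high:
        return "not significant"       # null value is inside the interval
    if high < null_value:
        return "significant: below"    # whole interval is under the null value
    return "significant: above"        # whole interval is over the null value

# Difference in means: the null value is 0
print(interpret_interval(-1.2, 3.4, 0))   # not significant
print(interpret_interval(-5.0, -0.5, 0))  # significant: below (first mean smaller)

# Ratio for comparing variances: the null value is 1
print(interpret_interval(0.6, 1.8, 1))    # not significant
print(interpret_interval(1.4, 3.2, 1))    # significant: above
```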
Chapter 8: Inference for Proportions
Inference for a single proportion
- Parameter: p
- Requirement: np ≥ 10 and n(1 − p) ≥ 10; then use z.
- HT for p: H0: p = p0; Ha: p < p0 or Ha: p > p0 or Ha: p ≠ p0
- p-value, decision, conclusion.
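A minimal sketch of the one-proportion z-test, including the requirement check listed above. It assumes the standard test statistic z = (p̂ − p0) / sqrt(p0(1 − p0)/n), with the check applied using p0. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: p = p0."""
    # Requirement from the notes: n*p >= 10 and n*(1 - p) >= 10
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation requirement not met")
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # SE under H0 uses p0, not p_hat
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 60 successes out of 100, testing p0 = 0.5
z, p_value = one_proportion_z(60, 100, 0.5)
print(round(z, 2), round(p_value, 4))
```

For a one-tail alternative, use a single tail area instead of doubling.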
Inference for two proportions
- Parameter: p1 − p2.
- HT for p1 − p2: H0: p1 = p2; Ha: p1 < p2 or p1 > p2 or p1 ≠ p2.
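A minimal sketch of the two-proportion z statistic. The notes give no formula, so this assumes the common version that pools the two samples to estimate the shared proportion under H0: p1 = p2. The function name and counts are hypothetical.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 = p2 using the pooled proportion (assumed)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, estimated from all the data
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical example: 60/100 successes vs. 40/100 successes
z = two_proportion_z(60, 100, 40, 100)
print(round(z, 3))
```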
Chapter 10: Simple Linear Regression
Simple linear regression, estimated regression line.
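A minimal sketch of the estimated regression line ŷ = b0 + b1·x, using the least-squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The function name and data are hypothetical.

```python
from statistics import mean

def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept b0, slope b1) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```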
Assume the data are a sample unless the problem specifically says population. Sample standard deviation = s; population standard deviation = σ. Exam format: four open-ended and 23 multiple-choice questions. The standard error for the slope does not need to be calculated.