Statistical Inference: Hypothesis Testing and Confidence Intervals
Chapter 17: Inferences When Standard Deviation is Unknown
Big Idea: T-Test for Unknown Standard Deviation
The t-test is used for inference about a mean when the population standard deviation (SD) is unknown. Unlike the z-test, it does not require a known σ; its conditions are:
- Data is a simple random sample (SRS) from a larger population.
- Observations follow a normal distribution.
- We estimate the standard error of the mean as s / √n, where s is the sample SD.
T-Test Formula and Properties
The t-test statistic is calculated as: t = (X̄ – μ₀) / (s / √n). Because σ is estimated by s, the statistic is more variable than a z statistic, and its sampling distribution, the t distribution, has heavier tails than the normal. The degrees of freedom (df) are n – 1; as df increases, the t distribution approaches the normal.
Steps for Conducting a T-Test
- Check if conditions are met.
- Calculate the t-test statistic using sample mean (X̄), sample SD (s), sample size (n), and hypothesized population mean (μ₀).
- Compute the probability (p-value) of observing the test statistic t or more extreme under the null hypothesis.
- Interpret the p-value to draw conclusions about the hypothesis.
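A minimal sketch of these steps in R; the reaction-time vector and the hypothesized mean of 250 are hypothetical, made up for illustration:

```r
# Hypothetical sample of n = 20 reaction times (ms)
reaction_times <- c(251, 243, 262, 248, 255, 239, 260, 247,
                    252, 244, 258, 249, 241, 256, 250, 246,
                    253, 245, 259, 242)

# One-sample t-test of H0: mu = 250 vs. H1: mu != 250;
# t.test() reports t, df = n - 1, and the p-value
t.test(reaction_times, mu = 250, alternative = "two.sided")
```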
T-Star and Confidence Intervals
For a 95% confidence interval (CI) with 25 observations, we use the qt() function in R to find t*: t_star <- qt(p = 0.975, df = 24). The formula for the CI is: X̄ ± t_star * s / √n.
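A short sketch of the full calculation in R; the summary statistics are hypothetical:

```r
x_bar <- 12.4   # hypothetical sample mean
s     <- 3.1    # hypothetical sample SD
n     <- 25

# Critical value: upper 2.5% point of the t distribution with n - 1 df
t_star <- qt(p = 0.975, df = n - 1)

# 95% CI: x_bar +/- t_star * s / sqrt(n)
x_bar + c(-1, 1) * t_star * s / sqrt(n)
```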
Robustness of T-Tests
T-tests are robust to non-normality except when there are outliers or strong skew; larger sample sizes improve robustness in those cases.
Assumptions and Considerations
- Plot data to check for outliers and skew.
- SRS is more important than normality.
- For n < 15, use t-procedures only if the data appear close to normal (no outliers or strong skew).
- For 15 ≤ n < 40, use t-procedures unless there are outliers or strong skew.
- For n ≥ 40, t-procedures can be used even when the data are clearly skewed.
Chapter 17 Part 2: Paired T-Test
Matching by Design
Paired t-tests are used when data is matched by design, such as before-and-after measurements on the same individuals. The test is a one-sample t-test on the within-pair differences: t = (X̄d – 0) / (sd / √n), where X̄d is the sample mean of the differences and sd is the sample SD of the differences.
R Functions for Paired T-Tests
- Use pull() to extract raw data from a data frame.
- Use qt() to find the quantile for the desired confidence level.
- Use t.test() with paired = TRUE to conduct a paired t-test.
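A minimal sketch with hypothetical before/after measurements:

```r
# Hypothetical before/after measurements on the same 8 subjects
before <- c(140, 152, 138, 145, 160, 149, 155, 142)
after  <- c(135, 150, 132, 141, 158, 146, 149, 138)

# Paired t-test on the within-subject differences
t.test(before, after, paired = TRUE)

# Equivalent: a one-sample t-test on the differences
t.test(before - after, mu = 0)
```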
Advantages and Considerations
Paired t-tests remove confounding by comparing within-subject differences. Ensure a wash-out period between treatments to avoid carryover effects.
Chapter 18: Comparing Two Population Means
Two-Sample Tests
We now move from one-sample to two-sample tests, comparing means from two populations. The null hypothesis (H₀) is: μ₁ – μ₂ = 0, and the alternative hypothesis (H₁) is: μ₁ – μ₂ ≠ 0.
Graphical Comparison
Compare the distributions of the two samples using histograms or boxplots to assess their shapes, centers, and spreads.
Conditions for Two-Sample T-Tests
- Two SRSs from two populations.
- Independent samples.
- Same quantitative variable for both samples.
- Both populations are (approximately) normally distributed, and neither sample has outliers.
Standard Error and T-Statistic
The standard error (SE) is estimated as: SE = √(s₁²/n₁ + s₂²/n₂). The two-sample t-test statistic is: t = (X̄₁ – X̄₂) / SE.
Degrees of Freedom and Confidence Interval
The degrees of freedom are given by a complex formula (the Welch–Satterthwaite approximation), which software computes; a conservative by-hand choice is the smaller of n₁ – 1 and n₂ – 1. The confidence interval is calculated as: (X̄₁ – X̄₂) ± t_star * √(s₁²/n₁ + s₂²/n₂).
R Function for Two-Sample T-Tests
Use t.test() to conduct a two-sample t-test, specifying the two samples and the alternative hypothesis.
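A minimal sketch with two hypothetical samples; note that R's default is the Welch test, which does not assume equal variances:

```r
# Hypothetical independent samples from two groups
group1 <- c(5.1, 6.3, 4.8, 5.9, 6.1, 5.4, 5.7)
group2 <- c(4.2, 4.9, 5.0, 4.4, 4.7, 4.1, 4.8)

# Welch two-sample t-test of H0: mu1 - mu2 = 0
t.test(group1, group2, alternative = "two.sided")
```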
Robustness
Two-sample t-tests are more robust than one-sample tests, especially for skewed data. Sample sizes as small as 5 can work when samples are of equal size. Larger samples are needed when populations have different shapes.
Chapter 19: Inference About a Population Proportion
Binary Data and Confidence Intervals
This chapter deals with binary data (e.g., success/failure). The large-sample CI for a proportion is: p̂ ± z_star * √[p̂(1-p̂)/n]. However, the Plus 4 method is often more accurate, especially for small samples.
Plus 4 Method
The Plus 4 method improves the coverage of CIs for binary data, especially for small sample sizes. It adds 2 successes and 2 failures to the data, giving the adjusted proportion p̃ = (X + 2) / (n + 4), and uses p̃ (with n + 4 in place of n) to compute the CI.
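A sketch of the Plus 4 calculation in R, with hypothetical counts:

```r
x <- 8    # hypothetical number of successes
n <- 20   # hypothetical sample size

# Add 2 successes and 2 failures
p_tilde <- (x + 2) / (n + 4)

# 95% CI using p_tilde and n + 4
z_star <- qnorm(0.975)
p_tilde + c(-1, 1) * z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
```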
Other Methods for Proportion CIs
- Wilson Score Interval (prop.test in R): Similar to Plus 4 with a correction factor.
- Clopper-Pearson or Exact Interval (binom.test in R): Conservative, providing better coverage than advertised.
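Both are one-liners in R; the counts here are hypothetical:

```r
# Wilson score interval (with continuity correction by default)
prop.test(x = 8, n = 20, conf.level = 0.95)

# Clopper-Pearson exact interval (conservative)
binom.test(x = 8, n = 20, conf.level = 0.95)
```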
Example and Comparison of Methods
The example compares different methods for calculating a 95% CI for the proportion of elderly individuals who died within a year of a hip fracture. The Plus 4 and Wilson Score methods provide similar results, while the large sample method is less accurate.
Finding Sample Size for Proportion Studies
To determine the required sample size (n) for a desired margin of error (m), we use the formula: n = (z_star/m)² * p_star * (1-p_star), where p_star is an estimate of the true proportion (use 0.5 if unknown).
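A worked sketch in R, assuming a hypothetical margin of error of 0.03:

```r
z_star <- qnorm(0.975)   # 95% confidence
m      <- 0.03           # hypothetical desired margin of error
p_star <- 0.5            # conservative guess when p is unknown

n <- (z_star / m)^2 * p_star * (1 - p_star)
ceiling(n)   # always round up to a whole observation
```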
Calculating P-Values
Use the pnorm() function in R to find the p-value for a given z-value, specifying lower.tail = F for the upper tail probability.
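For example, a two-sided p-value for a hypothetical z = 2.1:

```r
z <- 2.1
2 * pnorm(z, lower.tail = FALSE)   # upper-tail probability, doubled
```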
Chapter 20: Inference for Comparing Two Proportions
Large Sample CI for Difference of Two Proportions
Use this method when the number of successes and failures is greater than 10 for both samples. The formula is: (p̂₁ – p̂₂) ± z_star * √[p̂₁(1-p̂₁)/n₁ + p̂₂(1-p̂₂)/n₂]. However, it may have low coverage for small samples.
Plus 4 Method for Two Proportions
Similar to the Plus 4 method for one proportion, this method adds 1 success and 1 failure to each sample, giving adjusted proportions p̃ᵢ = (Xᵢ + 1) / (nᵢ + 2), and uses them to compute the CI. It is more accurate for small sample sizes.
Z-Test for Two Proportions
The z-test statistic is calculated as: z = (p̂₁ – p̂₂) / √[p̂(1-p̂)(1/n₁ + 1/n₂)], where p̂ = (X₁ + X₂) / (n₁ + n₂) is the pooled proportion. Use pnorm() to find the p-value.
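A sketch of the pooled z-test in R, with hypothetical counts:

```r
x1 <- 45; n1 <- 100   # hypothetical successes / size, sample 1
x2 <- 30; n2 <- 100   # hypothetical successes / size, sample 2

p_hat1 <- x1 / n1
p_hat2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)   # pooled proportion under H0

z <- (p_hat1 - p_hat2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
```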
Conditions and Considerations
- Use the large sample CI or z-test when counts of success and failure are greater than 5 for both samples.
- Use the Plus 4 method when sample sizes are at least 5.
- Consider using a z-test for one-sided hypotheses, as chi-squared tests are only two-sided.
Chapter 21: The Chi-Squared Goodness of Fit Test
Categorical Data with More Than Two Categories
The chi-squared goodness of fit test assesses how well observed data fits a hypothesized distribution for a single categorical variable with more than two categories.
Example: Jury Selection and Ethnicity
The example examines whether the ethnic distribution of jurors matches the expected distribution based on population proportions.
Chi-Squared Statistic and Distribution
The chi-squared statistic is calculated as the sum of (observed – expected)² / expected for each category. The chi-squared distribution has degrees of freedom (df) equal to the number of categories minus 1.
P-Value and Chi-Squared Test in R
Use pchisq() to find the p-value for a given chi-squared value and df. Use chisq.test() to conduct a chi-squared goodness of fit test.
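A sketch of both approaches in R; the counts and null proportions are hypothetical stand-ins for the jury example:

```r
observed <- c(205, 26, 25, 19)          # hypothetical juror counts
p_null   <- c(0.72, 0.07, 0.12, 0.09)   # hypothetical population proportions

# Goodness-of-fit test; expected counts are n * p_null
chisq.test(x = observed, p = p_null)

# The same test by hand
expected <- sum(observed) * p_null
chi_sq   <- sum((observed - expected)^2 / expected)
pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)
```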
Conditions for Chi-Squared Test
- Fixed number of observations.
- Independent observations.
- Mutually exclusive categories.
- At least 80% of cells have expected counts of 5 or more.
- All cells have expected counts greater than 1.
Chapter 22: Inference for Two-Way Tables
Analyzing Two Categorical Variables
This chapter extends the chi-squared test to analyze the relationship between two categorical variables.
Example: Vaping and JUUL Advertisements
The example investigates the association between exposure to JUUL advertisements and vaping among teens.
Expected Counts and Chi-Squared Statistic
The expected count for each cell is calculated as (row total * column total) / overall total. The chi-squared statistic is calculated similarly to the goodness of fit test.
Degrees of Freedom and P-Value
The degrees of freedom for a two-way table is (number of rows – 1) * (number of columns – 1). Use pchisq() to find the p-value.
Chi-Squared Test of Independence
Use chisq.test() to conduct a chi-squared test of independence. The correct = TRUE option (the default) applies Yates' continuity correction to 2×2 tables.
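A sketch on a hypothetical 2×2 table (counts made up for illustration):

```r
# Hypothetical table: ad exposure (rows) by vaping status (columns)
tab <- matrix(c(60, 40,
                30, 70),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposed = c("yes", "no"),
                              vaped   = c("yes", "no")))

chisq.test(tab)            # Yates' correction applied by default for 2x2
chisq.test(tab)$expected   # (row total * column total) / overall total
```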
Conditions and Assumptions
- Expected counts of at least 5 for at least 80% of cells.
- All expected counts greater than 1.
- Data from independent SRSs or a single SRS with individuals classified according to two categorical variables.
Z-Test for Two-Way Tables
For 2×2 tables, a z-test can be used to find one-sided p-values, which are not available with the chi-squared test.
Graphical Comparison
Use dodged bar charts to compare the conditional distributions of one variable across levels of the other.
Chapter 23: Inference for Regression
Recap of Regression Analysis
This chapter reviews key concepts of regression analysis, including linearity, correlation, line of best fit, and interpretation of slope, intercept, and R².
Assumptions for Regression Inference
- Linear relationship between x and y.
- Normality of residuals (vertical distances between observed and fitted values).
- Independent observations.
- Equal standard deviation of responses for all values of x.
Graphs for Checking Assumptions
- Scatter plot: Shows data, fitted line, and residuals.
- QQ plot: Checks normality of residuals.
- Fitted vs. Residuals plot: Checks for random scatter.
- Amount Explained plot: Compares boxplots of y and residuals.
Robustness and Outliers
Regression is relatively robust to non-normality but sensitive to outliers.
Chapter 23 Part 2: Inference for Regression
R Functions for Regression Output
- tidy(): Presents regression coefficients and statistics.
- glance(): Provides overall model fit statistics.
- augment(): Creates an augmented data frame with fitted values and residuals.
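A sketch using R's built-in mtcars data as a stand-in:

```r
library(broom)

model <- lm(mpg ~ wt, data = mtcars)

tidy(model)            # coefficients, SEs, t statistics, p-values
glance(model)          # R^2, regression standard error (sigma), F statistic
head(augment(model))   # data plus .fitted and .resid columns
```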
Sum of Squared Errors (SSE) and Regression Standard Error
SSE measures the overall discrepancy between observed and fitted values. The regression standard error, s = √(SSE / (n – 2)), summarizes model fit; lower values indicate a better fit.
Hypothesis Testing for Regression Slope
We test the null hypothesis H₀: β = 0 (no linear association between x and y) against the alternative hypothesis H₁: β ≠ 0, where β is the population slope.
R Functions for Hypothesis Testing
Use tidy() to obtain the estimated slope (b̂), standard error (SEb), and t-statistic. Use pt() to find the p-value.
Confidence Interval for Regression Slope
The CI for the slope is calculated as: b̂ ± t_star * SEb, where t_star is obtained from the qt() function with df = n – 2.
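A sketch of the slope CI, again using mtcars as a stand-in:

```r
library(broom)

model <- lm(mpg ~ wt, data = mtcars)
est   <- tidy(model)

b_hat <- est$estimate[2]    # estimated slope
se_b  <- est$std.error[2]   # its standard error
n     <- nrow(mtcars)

# 95% CI: b_hat +/- t_star * SE_b with df = n - 2
t_star <- qt(0.975, df = n - 2)
b_hat + c(-1, 1) * t_star * se_b

confint(model, "wt", level = 0.95)   # same interval, directly
```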
Test for Lack of Correlation
The t-test of H₀: β = 0 is equivalent to a test of zero correlation between x and y. A non-significant result means the data provide no evidence of a linear association, not that none exists.
Chapter 24: ANOVA
Analysis of Variance
ANOVA compares means of multiple groups to determine if there is a statistically significant difference between them.
Example: Cancer Treatment in Mice
The example investigates the effect of different cancer treatments on tumor volume in mice.
ANOVA Test Statistic (F)
The F statistic is the ratio of mean squares for groups (MSG) to mean squares for error (MSE). MSG measures the variation between group means, while MSE measures the variation within groups.
ANOVA in R
Use aov() to conduct an ANOVA test. Use tidy() to display the results, including degrees of freedom, sum of squares, mean squares, F statistic, and p-value.
Tukey’s Honestly Significant Difference (HSD) Test
Tukey’s HSD test is used to identify which specific groups differ from each other after a significant ANOVA result.
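A sketch with hypothetical tumor-volume data for three treatments:

```r
library(broom)

# Hypothetical measurements, 4 mice per treatment
dat <- data.frame(
  volume    = c(5.2, 4.8, 5.5, 5.0, 3.9, 4.1, 3.7, 4.3,
                4.6, 4.4, 4.9, 4.2),
  treatment = rep(c("control", "drugA", "drugB"), each = 4)
)

fit <- aov(volume ~ treatment, data = dat)
tidy(fit)      # df, sums of squares, mean squares, F, p-value

TukeyHSD(fit)  # pairwise comparisons with adjusted p-values
```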
Conditions for ANOVA
- Independent SRSs from each population.
- Normal distribution of populations (robustness to non-normality exists).
- Equal standard deviations of populations (rule of thumb: largest SD < 2 * smallest SD).
Bootstrap Confidence Intervals
Non-Parametric Confidence Intervals
Bootstrap methods are used to construct CIs when data is not normally distributed or when standard formulas are not applicable.
Steps for Bootstrap CI
- Find the median (or other statistic) of the original sample.
- Repeatedly resample with replacement and calculate the statistic for each resample.
- Create a histogram of the resampled statistics to approximate the sampling distribution.
- Find the percentiles that capture the middle 95% of the distribution as the CI bounds.
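A minimal sketch of these steps for the median, using a hypothetical skewed sample:

```r
set.seed(1)
x <- rexp(40, rate = 0.2)   # hypothetical skewed sample

# Resample with replacement, computing the median each time
boot_medians <- replicate(5000, median(sample(x, replace = TRUE)))

# hist(boot_medians) approximates the sampling distribution;
# the middle 95% gives the percentile bootstrap CI
quantile(boot_medians, probs = c(0.025, 0.975))
```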
When to Use Bootstrap
- No formula or unknown formula for CI.
- Assumptions for standard formulas not met.
- CI for any statistic.
Permutation Tests
Hypothesis Testing with Small Samples or Non-SRS Data
Permutation tests are used for hypothesis testing when sample sizes are small or data is not from an SRS.
Example: Malaria and Alcohol Consumption
The example examines the effect of beer consumption on mosquito attraction.
Permutation Test in R
Use the infer package to conduct a permutation test. The specify(), hypothesize(), generate(), and calculate() functions are used to define the test and generate the null distribution.
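A sketch of the infer pipeline with hypothetical mosquito-count data (the numbers, group names, and reps value are made up for illustration):

```r
library(infer)
set.seed(1)

# Hypothetical mosquito counts for beer vs. water drinkers
dat <- data.frame(
  count = c(27, 20, 21, 26, 31, 24, 21, 20, 19, 23, 24, 28,
            19, 15, 24, 17, 22, 20, 12, 24, 25, 18, 20, 22),
  group = rep(c("beer", "water"), each = 12)
)

# Observed difference in means (beer - water)
obs_diff <- dat |>
  specify(count ~ group) |>
  calculate(stat = "diff in means", order = c("beer", "water"))

# Null distribution: shuffle group labels, recompute the statistic
null_dist <- dat |>
  specify(count ~ group) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("beer", "water"))

# One-sided p-value
get_p_value(null_dist, obs_stat = obs_diff, direction = "greater")
```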
Interpretation
If the null hypothesis is true, the group labels are exchangeable: shuffling them produces datasets consistent with H₀. The p-value is the proportion of shuffled datasets whose statistic is at least as extreme as the observed one.
Bonus Chapter: Regression Model with Categorical Exposure
Key R Functions
- qt(), pt(), qnorm(), pnorm(), pchisq(): Distribution functions.
- t.test(), binom.test(), prop.test(), chisq.test(): Hypothesis testing functions.
- Broom package: tidy(), glance(), augment(): Functions for summarizing and manipulating model output.
- lm(), predict(), confint(), aov(), TukeyHSD(): Regression and ANOVA functions.
- ggplot2, dplyr: Data visualization and manipulation packages.
Example: Calcium Intake and Bone Density
The example uses ANOVA to compare mean daily calcium intake in adults with different bone densities.