Core Statistical Concepts and Applications

Linear Regression: Test Scores vs. Hours Studied

Consider the following linear regression equation: Test Scores = 45 + 5(Hours Studied)

  • Test Scores is the outcome (response) variable: it is what the model is trying to predict.
  • Hours Studied is the explanatory variable. It is used to explain or predict changes in the test scores.
  • The slope coefficient associated with Hours Studied is 5. This indicates that for every additional hour spent studying, the model predicts an increase of 5 points in the test score. It implies a positive relationship between the number of hours studied and the test score.
  • The intercept in the regression equation is 45. This can be interpreted as the expected test score for a student who spends 0 hours studying.

Predictions:

  • For a student studying 3 hours, the predicted test score would be: 45 + 5 × 3 = 45 + 15 = 60.
  • For a student studying 10 hours, the predicted test score would be: 45 + 5 × 10 = 45 + 50 = 95.
  • The difference in predicted test scores would be: 95 – 60 = 35.

Based on this model, a student who spends 10 hours studying is estimated to score 35 points higher on the test compared to a student who spends 3 hours studying.
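
As a quick check, the predictions above can be reproduced with a minimal Python sketch of the fitted equation (the helper name predicted_score is purely illustrative):

```python
# Minimal sketch of the fitted model: Test Scores = 45 + 5 * (Hours Studied).
INTERCEPT = 45  # predicted score for 0 hours of study
SLOPE = 5       # predicted points gained per additional hour studied

def predicted_score(hours_studied: float) -> float:
    """Return the predicted test score for a given number of study hours."""
    return INTERCEPT + SLOPE * hours_studied

print(predicted_score(3))                        # 60
print(predicted_score(10))                       # 95
print(predicted_score(10) - predicted_score(3))  # 35
```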

Correlation: Age and Job Satisfaction

A researcher is interested in whether there is an association between Age and Job Satisfaction Score. She finds a correlation coefficient of r = 0.53 between the two variables.

Relationship Description:

  • The correlation coefficient of r = 0.53 between Age and Job Satisfaction Score indicates a moderate positive relationship between these two variables.
  • The magnitude of 0.53 indicates a linear relationship of moderate strength (values near 0 are weak; values near ±1 are strong).
  • Since the correlation coefficient is positive, it indicates a positive association between Age and Job Satisfaction Score. This means that as age increases, job satisfaction score tends to increase as well.
  • The moderate positive correlation might suggest that as employees get older, they tend to be more satisfied with their jobs.
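
A correlation like this is ordinarily computed as Pearson's r. The sketch below uses small made-up age and satisfaction values purely to show the mechanics (it does not reproduce the researcher's r = 0.53), and assumes scipy is available:

```python
# Hypothetical (made-up) data, used only to illustrate how Pearson's r is
# computed; this is not the researcher's actual data set.
from scipy.stats import pearsonr

ages =         [25, 30, 34, 41, 45, 50, 55, 60]
satisfaction = [ 6,  5,  7,  6,  8,  7,  9,  8]  # job satisfaction scores

r, p_value = pearsonr(ages, satisfaction)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```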

Sampling Bias: NFL Championship Poll

A popular American television sports program conducts a poll of viewers to see which team they believe will win the NFL (National Football League) championship this year. Viewers vote by calling a number displayed on the television screen and telling the operator which team they think will win. Do you think that those who participate in this poll are representative of all football fans in America?

Representativeness Issues:

The participants in this poll are likely not representative of all football fans in America due to several potential biases:

  • Self-Selection Bias: Participants choose to call in, meaning they are likely more enthusiastic or opinionated about the topic than the average football fan.
  • Access to the Poll: Not all football fans may watch this specific program or have the means/opportunity to call in. This limits the diversity of the sample.
  • Demographic Bias: The viewership of the program might not mirror the broader demographic of all NFL fans. Certain age groups, regions, or socioeconomic backgrounds might be over- or under-represented.
  • Influence of the Program: The opinions and commentary presented on the program could influence viewers’ choices, leading to a biased representation of opinions.
  • Limited Diversity of Respondents: Regardless of how many viewers call in, the callers may not reflect the wide range of views held by football fans across the country.
  • No Random Sampling: A truly representative poll would require random sampling, ensuring every fan has an equal chance of being selected, regardless of their viewing habits or willingness to call.

Sampling Variation and Error Example

Sample Result Discrepancy: Why might two researchers studying vaccination rates in the same urban area using proper random sampling techniques find slightly different results (e.g., 84% vs. 86%)?

Potential Explanations:

  • Sampling Variation: This is the most likely explanation. When different random samples are drawn from the same population, results naturally vary slightly due to chance.
  • Sampling Error: Related to sampling variation, this is the difference between a sample statistic (like the observed rate) and the true population parameter. It decreases with larger sample sizes but is always present to some extent.
  • Population Heterogeneity: If the urban area is diverse, different samples might capture slightly different segments (e.g., varying demographics with different vaccination rates).
  • Random Fluctuations in Data: Even with a homogeneous population, chance can lead to different mixes of vaccinated/unvaccinated individuals in two samples.
  • Differences in Sampling Techniques: Minor variations in implementation (e.g., time of day, specific locations, researcher interaction) could cause slight differences, even if both follow proper procedures.
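
Sampling variation is easy to demonstrate by simulation. The sketch below assumes, purely for illustration, a true vaccination rate of 85% and a sample size of 500 per researcher; neither number comes from the example above:

```python
# Two independent random samples from the SAME population can give slightly
# different estimates purely by chance (sampling variation).
import random

random.seed(42)
TRUE_RATE = 0.85   # assumed true vaccination rate in the urban area
SAMPLE_SIZE = 500  # assumed number of residents sampled by each researcher

def sample_rate() -> float:
    """Draw one random sample and return the observed vaccination rate."""
    vaccinated = sum(random.random() < TRUE_RATE for _ in range(SAMPLE_SIZE))
    return vaccinated / SAMPLE_SIZE

print(f"Researcher A: {sample_rate():.1%}")  # e.g., roughly 84-86%
print(f"Researcher B: {sample_rate():.1%}")  # usually close, but not identical
```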

Lurking Variables: SAT Score Differences

What lurking variables could offer an alternative explanation for observed differences in SAT scores between genders (e.g., males scoring higher in math, females in verbal)?

Potential Lurking Variables:

  • Educational Background and Opportunities: Differences in schooling quality, access to resources (tutoring, prep courses), and subject emphasis can impact performance.
  • Socioeconomic Factors: Higher-income families may have more access to test preparation resources.
  • Cultural and Social Influences: Stereotypes or social norms suggesting gender-based aptitudes can influence interest and performance.
  • Test-Taking Strategies and Anxiety: Differences in strategies or higher test anxiety (potentially more prevalent in females) can affect scores in high-stakes tests like the SAT.
  • Educational Materials and Teaching Methods: Content and teaching methods might inadvertently favor one gender over the other in specific subjects.

Research Ethics: Studying Smoking Effects

Explain why random assignment cannot be used to study the health effects of smoking and describe an alternative approach.

Ethical Concerns with Random Assignment

Randomly assigning individuals to a ‘smoking group’ would require exposing them to known, significant health risks (lung cancer, heart disease, etc.). This is unethical as it violates the principle of not harming research participants.

Alternative Approaches (Observational Studies)

  • Cohort Studies: Researchers follow groups (smokers, non-smokers, former smokers) over time, comparing health outcomes without assigning smoking behavior.
  • Case-Control Studies: Individuals with a specific health outcome (e.g., lung cancer) are compared to those without it. Researchers look back to determine if smoking exposure was more common in the affected group.
  • Cross-Sectional Studies: A population is surveyed at one point in time to find associations between current smoking status and various health outcomes.

Sampling Concepts: Sociologist’s Day Care Survey

A sociologist wants to know the opinions of employed adult women about government funding for day care. She obtains a list of 520 members of a local business and professional women’s club and mails a questionnaire to 100 of these women selected at random. Sixty-eight questionnaires are returned.

  • a. Population: The target group of interest is all employed adult women; the sociologist wants to know their opinions about government funding for day care.
  • b. Sampling Frame: The list from which the sample is actually drawn. Here, it is the list of 520 members of the local business and professional women’s club. (Note: this frame covers only club members, so it may not represent all employed adult women.)
  • c. Sample: The subset actually providing data. In this study, it is the 68 women who returned the questionnaires.

Interpreting Confidence Intervals (Policy Support)

A polling agency surveyed 2500 U.S. citizens about the president’s domestic policies. They found a 95% confidence interval for the difference in proportions between men (population 1) and women (population 2) who support the policies as (-0.025 to 0.050).

Interpretation:

  • Range of Difference: We are 95% confident that the true difference in the proportion of support between men and women lies between -0.025 (men 2.5 percentage points lower than women) and 0.050 (men 5.0 percentage points higher than women).
  • Confidence Level: If this polling process were repeated many times, about 95% of the calculated confidence intervals would contain the true difference in population proportions.
  • Interpretation of Bounds: The interval suggests men’s support could be slightly lower (up to 2.5 points) or slightly higher (up to 5 points) than women’s.
  • Zero within Interval: Since the interval includes 0, a difference of zero is plausible; we cannot conclude at the 5% significance level that support differs between men and women. Any observed difference in the sample could be due to random sampling variability.
  • Practical Significance: Even if a difference exists, the interval suggests it is relatively small (at most 5 percentage points), which may or may not be practically significant depending on the context.
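
For reference, an interval like (-0.025, 0.050) is typically built from the standard large-sample formula for a difference in two proportions (a general sketch; the polling agency's exact method is not stated):

```latex
% Standard large-sample CI for a difference in proportions, where \hat{p}_1,
% \hat{p}_2 are the sample proportions and n_1, n_2 the group sizes.
(\hat{p}_1 - \hat{p}_2) \;\pm\; z^{*}
\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},
\qquad z^{*} \approx 1.96 \text{ for 95\% confidence}
```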

Hypothesis Testing: Sharpie Longevity

A researcher tested whether Sharpies last longer than the manufacturer’s claim of a mean of 14 continuous hours. Testing 40 Sharpies yielded: mean = 14.5 hours, sd = 1.2 hours. Hypothesis test results: t = 2.635; p = 0.006 at a 5% significance level.

Hypothesis Test Steps:

  • Null Hypothesis (H0): The true mean continuous writing time of Sharpies is 14 hours (μ = 14).
  • Alternative Hypothesis (H1): The true mean continuous writing time of Sharpies is more than 14 hours (μ > 14).
  • Test Results: Sample mean (x̄) = 14.5 hours, Standard Deviation (sd) = 1.2 hours, Sample Size (n) = 40, t-value = 2.635, p-value = 0.006.
  • Interpretation:
    • The sample mean (14.5) is higher than the claimed mean (14).
    • The t-value (2.635) indicates the sample mean is 2.635 standard errors above the hypothesized mean.
    • The p-value (0.006) is the probability of observing a sample mean of 14.5 hours or more if the true mean were actually 14 hours. This low probability (0.6%) suggests the observed result is unlikely under the null hypothesis.
  • Decision: The significance level (α) is 0.05. Since the p-value (0.006) is less than α (0.05), we reject the null hypothesis.
  • Conclusion: There is statistically significant evidence at the 5% level to conclude that Sharpies, on average, can write continuously for more than the manufacturer’s claimed mean of 14 hours.
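
The reported test statistic and p-value can be reproduced from the summary statistics alone. A minimal Python sketch (scipy assumed available):

```python
# One-sided, one-sample t-test computed from summary statistics
# (the raw data for the 40 Sharpies are not available).
from math import sqrt
from scipy import stats

n, xbar, s, mu0 = 40, 14.5, 1.2, 14.0

se = s / sqrt(n)                        # standard error of the mean
t_stat = (xbar - mu0) / se              # ≈ 2.635
p_value = stats.t.sf(t_stat, df=n - 1)  # one-sided (upper-tail) p ≈ 0.006

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```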

Interpreting Confidence Intervals (Smokers vs. Non-smokers)

A medical study compared cardiovascular disease patients: Population 1 (current smokers) and Population 2 (current non-smokers). The 95% confidence interval for the difference in proportions (Population 1 – Population 2) related to some outcome was 0.015 ± 0.011.

Interpretation:

  • Interval Calculation: The interval ranges from 0.015 – 0.011 = 0.004 to 0.015 + 0.011 = 0.026. So, the 95% CI is (0.004, 0.026).
  • Confidence Level: We are 95% confident that the true difference in proportions between smokers and non-smokers lies within this interval.
  • Interpretation of the Interval: Since the entire interval (0.004 to 0.026) is above zero, it indicates a statistically significant difference between the two groups at the 95% confidence level.
  • Direction of Difference: The positive interval suggests that the proportion of interest (likely a negative health outcome or risk factor related to cardiovascular disease) is higher among smokers (Population 1) compared to non-smokers (Population 2). The difference is estimated to be between 0.4 and 2.6 percentage points.

Key Statistical Definitions

Random Selection (or Sampling)

Choosing individuals from a population so that every individual has an equal chance of being selected. This avoids systematic selection bias and makes it likely that the sample is representative, allowing results to be generalized to the population.

Random Assignment

Randomly allocating selected participants to different study groups (e.g., treatment vs. control). This is crucial for causal inference, helping ensure groups are similar except for the treatment, making observed differences likely due to the treatment.

Causal Claims

Conclusions about cause-and-effect relationships between variables. Valid causal claims typically require experimental designs with random assignment to control for confounding variables.

Generalizability

The extent to which study results can be applied to the broader population beyond the sample. Generalizability is enhanced by random selection, which increases the likelihood that the sample represents the population.

Hypothesis Testing: HIV Example

Suppose a person is being tested for HIV. Assume the null hypothesis is that the patient does not have the HIV virus.

a. Hypotheses:

  • Null Hypothesis (H0): The patient does not have the HIV virus.
  • Alternative Hypothesis (HA): The patient has the HIV virus.

b. Four Possible Scenarios:

  • True Negative (Correct): Test is negative, patient is negative. (Accurate, desired outcome for uninfected).
  • False Positive (Type I Error): Test is positive, patient is negative. (Leads to stress, further tests, potential unnecessary treatment).
  • True Positive (Correct): Test is positive, patient is positive. (Accurate, allows timely treatment).
  • False Negative (Type II Error): Test is negative, patient is positive. (Dangerous, delays treatment, risk of transmission).

c. Type I Error:

A Type I error occurs when the test incorrectly indicates the patient has HIV when they do not (a False Positive). It means rejecting the true null hypothesis (H0: patient does not have HIV). Consequences include emotional distress and unnecessary medical procedures. The probability of a Type I error is denoted by α (alpha), the significance level.

Using Significance Level (α) for Decisions

How is α used in hypothesis testing?

  • Setting the Threshold: Researchers choose α (commonly 0.05 or 5%) before the test. It represents the maximum acceptable probability of making a Type I error (rejecting a true null hypothesis).
  • Comparing p-value to α: The calculated p-value (probability of observing the data or more extreme data if H0 is true) is compared to α.
  • Decision Rule:
    • If p-value ≤ α: Reject H0. Results are statistically significant. The observed data would be unlikely if H0 were true.
    • If p-value > α: Fail to reject H0. Results are not statistically significant. The observed data are plausible under H0.
  • Interpreting the Decision: Rejecting H0 suggests evidence supports HA. Failing to reject H0 means insufficient evidence for HA (it doesn’t prove H0 is true).
  • Risk of Error: α balances the risk of a Type I error against the risk of a Type II error (failing to detect a real effect).
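
The decision rule itself fits in a few lines of code. A tiny illustrative sketch (the decide helper is hypothetical, not from any library):

```python
# Sketch of the decision rule: compare the p-value to the chosen alpha.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value and alpha."""
    if p_value <= alpha:
        return "Reject H0 (statistically significant)"
    return "Fail to reject H0 (not statistically significant)"

print(decide(0.006))  # Reject H0 (statistically significant)
print(decide(0.12))   # Fail to reject H0 (not statistically significant)
```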

Selecting a Significance Level (α)

What is important when choosing α?

  • Research Context and Field Standards: Conventions vary (e.g., 0.05 in social sciences, much lower in physics).
  • Consequences of Errors: If a Type I error (false positive) is very costly (e.g., approving a harmful drug), use a lower α. If missing a true effect (Type II error) is worse, a higher α might be considered.
  • Sample Size: Larger samples provide more power, potentially allowing a lower α without greatly increasing Type II error risk.
  • Power of the Test (1-β): Balance α (Type I error risk) and β (Type II error risk). Lowering α can decrease power (increase β).
  • Nature of the Hypothesis: Exploratory research might use a higher α; confirmatory research usually requires a lower α.
  • Multiple Testing: If performing many tests, adjust α downwards (e.g., Bonferroni correction) to control the overall Type I error rate.
  • Practical vs. Statistical Significance: Ensure the chosen α aligns with detecting effects that are practically meaningful, not just statistically significant (especially with large samples).
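
The Bonferroni correction mentioned above simply divides α by the number of tests. A small sketch with hypothetical p-values:

```python
# Bonferroni adjustment: with m tests, compare each p-value to alpha / m so
# that the overall (family-wise) Type I error rate stays near alpha.
alpha = 0.05
p_values = [0.003, 0.020, 0.040, 0.300]  # hypothetical p-values from m = 4 tests
m = len(p_values)
adjusted_alpha = alpha / m               # 0.05 / 4 = 0.0125

for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p <= adjusted_alpha else "fail to reject H0"
    print(f"Test {i}: p = {p:.3f} -> {decision}")
```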

Type I Error and its Relation to α

Definition of Type I Error: A Type I error occurs when we reject the null hypothesis (H0) when it is actually true. It’s a ‘false positive’ – concluding there is an effect or difference when none exists in reality.

Relationship with α (Significance Level): The significance level, α, is the probability of making a Type I error when H0 is in fact true, and it is set by the researcher before conducting the test. If α is set at 0.05, the researcher accepts a 5% risk of incorrectly rejecting a true null hypothesis. The decision rule (reject H0 if p-value ≤ α) directly uses α as the threshold for this risk.

Power of a Test and Type II Error

Definition of Power: The power of a statistical test is the probability that it correctly rejects the null hypothesis (H0) when the alternative hypothesis (HA) is actually true. It’s the probability of detecting a real effect or difference. Power = 1 – β.

Definition of Type II Error (β): A Type II error occurs when we fail to reject the null hypothesis (H0) when it is actually false. It’s a ‘false negative’ – failing to detect an effect or difference that truly exists. The probability of a Type II error is denoted by β (beta).

Relationship: Power and the probability of a Type II error (β) have an inverse relationship (Power = 1 – β). Higher power means a lower chance of making a Type II error (missing a real effect). Lower power means a higher chance of a Type II error.

Factors Influencing Power:

  • Effect Size: Larger effects are easier to detect (higher power).
  • Sample Size (n): Larger samples generally lead to higher power.
  • Significance Level (α): A higher α (e.g., 0.10 vs 0.05) increases power but also increases the Type I error risk.
  • Data Variability: Less variability (smaller standard deviation) increases power.

Researchers aim for high power (often 0.80 or 80%) while controlling the Type I error rate (α).
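
These relationships can be made concrete with a rough power calculation. The sketch below uses the approximate power formula for a one-sided, one-sample z-test, with illustrative effect sizes and sample sizes (a simplification, not a general-purpose power routine):

```python
# Approximate power of a one-sided, one-sample z-test, to show how effect
# size, sample size, and alpha drive power (all input values are illustrative).
from math import sqrt
from scipy.stats import norm

def power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power = P(reject H0 | H1 true) = 1 - beta for a one-sided z-test."""
    z_crit = norm.ppf(1 - alpha)                    # rejection threshold
    return norm.sf(z_crit - effect_size * sqrt(n))  # 1 - beta

print(f"{power(0.3, 50):.2f}")               # moderate effect, n = 50
print(f"{power(0.3, 100):.2f}")              # same effect, larger n: higher power
print(f"{power(0.3, 100, alpha=0.10):.2f}")  # higher alpha: higher power (more Type I risk)
```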

P-value Explained

The p-value is the probability of observing a test statistic (or sample result) as extreme as, or more extreme than, the one actually observed, assuming the null hypothesis (H0) is true. A small p-value indicates that the observed data is unlikely under the null hypothesis, providing evidence against H0.

Confidence Intervals Explained

A Confidence Interval (CI) provides a range of plausible values for an unknown population parameter (like the mean or proportion) based on sample data.

  • It combines a point estimate (e.g., sample mean, x̄) with a measure of its precision (its standard error).
  • It depends on a specified confidence level (e.g., 90%, 95%, 99%).
  • Higher confidence levels (e.g., 99%) yield wider intervals.
  • Lower confidence levels (e.g., 90%) yield narrower intervals.
  • Interpretation: We are [confidence level]% confident that the true population parameter lies within the calculated interval.

Constructing a Confidence Interval

Steps typically involve:

  1. Calculate the point estimate from the sample (e.g., sample mean or proportion).
  2. Calculate the standard error of the estimate.
  3. Determine the appropriate sampling distribution (e.g., a t-distribution with the appropriate degrees of freedom, or the standard normal N(0,1)).
  4. Choose the desired confidence level (e.g., 95%).
  5. Use the distribution and confidence level to find the critical value (e.g., t* or z*).
  6. Calculate the margin of error (critical value × standard error).
  7. Construct the interval: Point Estimate ± Margin of Error.
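
A minimal Python sketch that walks through these steps for a mean, using a small made-up sample (the data are hypothetical; scipy assumed available):

```python
# 95% t-based confidence interval for a mean, following the steps above.
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

data = [14.1, 14.6, 13.9, 14.8, 14.3, 14.7, 14.2, 14.5]  # hypothetical sample

n = len(data)
point_estimate = mean(data)              # step 1: point estimate (x̄)
standard_error = stdev(data) / sqrt(n)   # step 2: standard error of the mean
critical_value = t.ppf(0.975, df=n - 1)  # steps 3-5: t* for 95% confidence
margin_of_error = critical_value * standard_error  # step 6
lower = point_estimate - margin_of_error           # step 7
upper = point_estimate + margin_of_error

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```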

Interpreting Confidence Intervals

If we were to repeatedly draw samples from the population and construct a 95% confidence interval for each sample, we would expect about 95% of those intervals to contain the true population parameter. Any single interval either contains the true parameter or it does not. It’s possible, just by chance, to draw an unusual sample whose CI does not capture the true value (this happens (1 − confidence level) of the time, e.g., about 5% of the time for a 95% CI).

Comparing Two Proportions Example (Yawning)

Suppose we compare the proportion of people yawning after seeing a ‘seed’ yawn (Group 1) versus not seeing a seed (Group 2). Let π₁ be the proportion for Group 1 and π₂ for Group 2.

  • We estimate π₁ and π₂ using sample proportions (p₁ and p₂).
  • We estimate the difference: p₁ – p₂.
  • We calculate the standard error of this difference, SE(p₁ – p₂).
  • Example: If p₁ = 0.2941 and p₂ = 0.25, the observed difference is 0.2941 – 0.25 = 0.0441.
  • Using the standard error and a critical value (e.g., from N(0,1) for a 95% CI), we construct the interval.
  • Suppose the 95% CI for the difference (π₁ – π₂) is (-0.218 to 0.306).
  • Interpretation: Since this interval includes 0, the data do not provide convincing evidence of a true difference in yawning proportions between the seed and control groups. The plausible range for the difference includes values where the seed group yawns less (-0.218), the same (0), or more (0.306) than the control group.
  • If the CI was entirely above zero (e.g., 0.01 to 0.08), we would be 95% confident that the seed group had a higher proportion of yawning.
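
The stated interval can be reproduced with the standard two-proportion formula, assuming hypothetical counts of 10 yawners out of 34 in the seed group and 4 out of 16 in the control group; these counts match the quoted proportions but are not given in the notes:

```python
# 95% CI for a difference in proportions (pi1 - pi2). The counts 10/34 and
# 4/16 are assumed for illustration; they are consistent with p1 = 0.2941 and
# p2 = 0.25 quoted above but are not stated in these notes.
from math import sqrt

x1, n1 = 10, 34   # assumed: yawners / total, seed group
x2, n2 = 4, 16    # assumed: yawners / total, control group
z_star = 1.96     # critical value from N(0,1) for 95% confidence

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2                                      # ≈ 0.0441
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE(p1 - p2)
lower, upper = diff - z_star * se, diff + z_star * se

print(f"difference = {diff:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")  # ≈ (-0.218, 0.306)
```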