Correlation, Probability, and Hypothesis Testing

Understanding Correlation and Its Applications

1. Types of Correlations:

  • Positive: Both variables move in the same direction.
  • Negative: Variables move in opposite directions.
  • Zero: No linear relationship between the two variables.

2. Scatterplots: Visual representations of the relationship between two variables.

3. Correlation Scale: The coefficient ranges from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation).

4. Formulas: Include formulas for covariance and the correlation coefficient in your cheat sheet.

5. Computing Correlation (r): Understand how to calculate correlation using the formulas (a worked sketch follows this list).

6. Coefficient of Determination (r-squared): If ‘r’ is significant, compute r-squared.

7. Interpreting r-squared: Understand how to explain the coefficient of determination.
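
A minimal Python sketch (not from the text) of items 4-7: it computes the covariance, the correlation coefficient, and the coefficient of determination for a small made-up data set, then checks r against the tabled critical value for df = 10 at the .05 level (approximately .576). All data values are illustrative.

  # Computing covariance, Pearson r, and r-squared by hand (illustrative data).
  import math

  x = [2, 4, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18]        # e.g., hours studied
  y = [50, 55, 60, 62, 68, 70, 74, 78, 82, 85, 88, 90]   # e.g., exam scores

  n = len(x)
  mean_x = sum(x) / n
  mean_y = sum(y) / n

  # Sample covariance: average cross-product of deviations (dividing by n - 1).
  cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

  # Sample standard deviations (also dividing by n - 1).
  sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
  sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

  r = cov_xy / (sd_x * sd_y)   # Pearson correlation coefficient
  r_squared = r ** 2           # coefficient of determination

  # Significance check: with df = n - 2 = 10 and alpha = .05 (two-tailed),
  # the tabled critical value of r is approximately .576.
  critical_r = 0.576
  print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, significant: {abs(r) >= critical_r}")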

Correlation indicates a relationship between two variables where a change in one is associated with a consistent change in the other.

Scatterplots are graphs displaying two sets of data, one on the x-axis (abscissa) and the other on the y-axis (ordinate). Plotted points show the direction and strength of the correlation between the variables.

The correlation coefficient (Pearson product-moment correlation coefficient, symbolized as r) measures the relationship between two variables, ranging from –1.00 to +1.00. The sign indicates the direction of the relationship, and the absolute value indicates its strength. A positive correlation means the variables change in the same direction; a negative correlation means they change in opposite directions. A coefficient near +1.00 or –1.00 signifies a strong correlation, while one near 0 signifies a weak one.

Covariance represents the degree to which two variables change together.

The critical value, based on degrees of freedom (df = n – 2), is a decision point in statistical analyses. A correlation coefficient whose absolute value exceeds the critical value is considered statistically significant.

The coefficient of determination (r-squared) is an effect size representing the proportion of one variable’s variance explained by the variance of a related variable. For example, an r of .70 corresponds to an r-squared of .49, meaning 49% of the variance is shared.

Lecture and Demonstration Ideas

Students often misunderstand the difference between correlation and causation, especially with strong correlations. For example, ethical considerations restrict human research on smoking and cancer to correlational designs, yet causal conclusions are often drawn. Conversely, some correlations, like the relationship between appliance ownership and birth rates, are easily dismissed as absurd. It’s crucial to emphasize that correlation does not imply causation. Discuss cause-and-effect directionality and the third-variable problem to clarify this.

Math Scores and Calculators

Introduce correlation using examples like the 2000 Mathematics Assessment (2001) study, which showed a negative correlation between calculator use and math performance in grade 4 and a positive correlation in grades 8 and 12. Discuss these findings and potential confounding factors, addressing any causal conclusions that arise.

Cause, Effect, or Third-Variable Problem?

Illustrate the three possible causal explanations for correlational findings (A causes B, B causes A, or a third variable causes both). Examples include:

  1. Positive relationship between bathing suit sales and skin cancer incidence.
  2. Positive relationship between years married and life satisfaction.
  3. Positive relationship between arm length and reasoning ability.
  4. Negative relationship between cigarettes smoked and grades in school.

Computing r and r²

Use data examples for computation demonstrations, showing scatterplots and complete solutions.
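
If a coded demonstration is useful, a short matplotlib sketch such as the one below (with made-up data) produces the scatterplot; the variable names and values are hypothetical.

  # Scatterplot for a classroom demonstration (illustrative data).
  import matplotlib.pyplot as plt

  hours = [1, 2, 2, 3, 4, 5, 6, 6, 7, 8]               # x-axis (abscissa)
  scores = [52, 55, 60, 58, 65, 70, 72, 75, 78, 83]    # y-axis (ordinate)

  plt.scatter(hours, scores)
  plt.xlabel("Hours studied")
  plt.ylabel("Exam score")
  plt.title("Positive correlation: points trend upward from left to right")
  plt.show()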

Interpreting coefficients can be challenging. Use a visual aid illustrating the continuum of correlation coefficients.

Probability and Sampling

1. Probability Rules:

  • Addition Rule: For mutually exclusive events, sum the probabilities. If the events are NOT mutually exclusive: p(A or B) = p(A) + p(B) – p(A and B).
  • Multiplication Rule: For independent events, p(A and B) = p(A) * p(B).

2. Probabilities from Grouped Frequency Distributions: Review how to calculate these (refer to page 217 in your textbook).

3. Sampling and Sampling Distributions: Review notes on these concepts.

4. Representative Sample: All subgroups of the population are proportionally represented.

5. Random Sample: The best method for achieving a representative sample.

6. Sampling Distributions: Theoretical distributions based on the Central Limit Theorem, which states that as n (sample size) increases, the sampling distribution of the mean approximates a normal curve.

7. Z-test: Define it, calculate the standard error (the population standard deviation divided by the square root of n), and understand the difference between one- and two-tailed tests (a worked sketch follows this list).
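
A minimal sketch of the Z-test item above, assuming a known population mean and standard deviation; the numbers are made up, and scipy is used only to look up normal-curve probabilities.

  # One-sample z-test (illustrative numbers).
  import math
  from scipy.stats import norm

  sample_mean = 104.0
  mu = 100.0      # known population mean
  sigma = 15.0    # known population standard deviation
  n = 36

  standard_error = sigma / math.sqrt(n)   # sigma divided by the square root of n
  z = (sample_mean - mu) / standard_error

  p_one_tailed = 1 - norm.cdf(z)              # directional (one-tailed) test
  p_two_tailed = 2 * (1 - norm.cdf(abs(z)))   # non-directional (two-tailed) test

  print(f"SE = {standard_error:.2f}, z = {z:.2f}")
  print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")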

Hypothesis Testing and Experimental Design

A hypothesis is a specific research prediction. A scientific hypothesis is based on theory or existing knowledge.

Independent variables are manipulated by the experimenter to determine their effect on behavior.

Dependent variables measure the effect of the independent variable(s).

Subject variables describe participant characteristics (e.g., gender, age) that cannot be manipulated.

Between-subjects design (independent-group design): Different participants are in each level of the independent variable.

Within-subjects design (repeated-measures design): Each participant experiences all levels of all independent variables.

Mixed design: Combines between- and within-subjects variables.

One-group experimental design: A single sample mean is compared to a known population mean.

Completely randomized experimental design: Participants are randomly selected and assigned to groups.

Completely randomized factorial experimental design: Similar to the above, but with at least two independent variables, each having at least two levels.

Experimental control: Techniques to rule out alternative explanations for findings, ensuring all conditions are identical except for the manipulated independent variable.

Extraneous variables: Variables that may affect behavior and should be controlled. If they vary between conditions, they confound the results.

Experimenter bias: Unknowingly influencing results due to knowledge of the hypothesis.

Demand characteristics: Changes in participant behavior due to knowledge of the research hypothesis.

Null hypothesis (H0): Predicts no difference between the means (or no relationship between the variables).

Research hypothesis (H1): Predicts a difference between the means (or a relationship between the variables).

Type I error: Rejecting the null hypothesis when it is true.

Type II error: Failing to reject the null hypothesis when it is false.

One-tailed test: Specifies the direction of the experimental effect.

Two-tailed test: Does not specify the direction of the experimental effect.

Power: The probability of correctly rejecting a false null hypothesis. Factors affecting power include alpha level, sample size, the difference between population means, and the type of statistical test.
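
To illustrate how these factors combine, the following sketch (not from the text) computes the power of a one-tailed, one-sample z-test under assumed population values; raising n, widening the gap between the population means, or relaxing alpha all increase the result.

  # Power of a one-tailed, one-sample z-test (illustrative numbers).
  import math
  from scipy.stats import norm

  mu0, mu1 = 100.0, 105.0   # population means under H0 and H1
  sigma = 15.0
  n = 36
  alpha = 0.05

  se = sigma / math.sqrt(n)
  z_crit = norm.ppf(1 - alpha)   # cutoff for rejecting H0 (one-tailed)
  delta = (mu1 - mu0) / se       # expected z if H1 is true

  power = 1 - norm.cdf(z_crit - delta)   # P(reject H0 | H0 is false)
  print(f"power = {power:.3f}")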

Key Concepts from Chapter 10

Probability: A measure of how likely an event is to occur.

Mutually exclusive: Two events cannot occur simultaneously.

Addition rule of probability: Used to compute the probability of one event OR another occurring: p(A or B) = p(A) + p(B) (for mutually exclusive events).

Multiplication rule of probability: Used to compute the probability of two events occurring together: p(A and B) = p(A) * p(B) (for independent events).
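
A small sketch (not from the text) applying both rules, using a single die roll for the addition rule and two independent coin flips for the multiplication rule:

  # Addition and multiplication rules with exact fractions (illustrative events).
  from fractions import Fraction

  # Addition rule (mutually exclusive events): P(roll a 1 OR a 2) on one die.
  p_one = Fraction(1, 6)
  p_two = Fraction(1, 6)
  p_one_or_two = p_one + p_two      # 2/6 = 1/3

  # Multiplication rule (independent events): P(heads on both of two flips).
  p_heads = Fraction(1, 2)
  p_two_heads = p_heads * p_heads   # 1/4

  print(p_one_or_two, p_two_heads)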

Representative sample: A sample reflecting all significant subgroups of the population.

Random sampling: Increases the chances of obtaining a representative sample by ensuring everyone in the population has an equal chance of selection.

A random number table is a table of randomly generated numbers, useful for creating random samples.
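
In practice, a random number generator can stand in for the printed table; a minimal sketch with a hypothetical population of 500 numbered participants:

  # Drawing a simple random sample in code (illustrative population and size).
  import random

  population = list(range(1, 501))         # 500 numbered participants
  sample = random.sample(population, 25)   # every member has an equal chance
  print(sorted(sample))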

Distribution of sample means: A theoretical distribution of an unlimited number of sample means. The mean of these sample means equals the population mean (for samples of the same size, n).

Standard error of the mean: The standard deviation of the distribution of sample means. Smaller standard error indicates less error in estimating the population mean.

The Central Limit Theorem states that as sample size increases, the distribution of sample means more closely approximates a normal curve, regardless of the shape of the original population. This is crucial for statistical inference.
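
A brief simulation sketch (not from the text) illustrates the theorem: sample means drawn from a clearly non-normal (uniform) population cluster around the population mean, and their standard deviation approximates the standard error.

  # Simulating a distribution of sample means from a uniform population.
  import random
  import statistics

  population = [random.uniform(0, 100) for _ in range(100_000)]
  n = 30   # sample size
  sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

  print("population mean:", round(statistics.mean(population), 2))
  print("mean of sample means:", round(statistics.mean(sample_means), 2))
  print("standard error (SD of sample means):", round(statistics.stdev(sample_means), 2))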

The z-test compares a sample mean to a population mean to determine the likelihood of the sample belonging to that population.