Statistics Essentials: Stem Plots, Quartiles, Correlation

Stem Plots

To make a stem plot:

  1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit). Stems may have as many digits as needed, but each leaf contains only a single digit.
  2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line to the right of this column. Include all stems needed to span the data, even when a stem has no leaves.
  3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
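
The steps above can be sketched in Python. This is only an illustration, assuming non-negative whole-number data; the function name stem_plot and the data values are made up for the example.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for non-negative whole numbers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        # Stem: all but the final digit. Leaf: the final digit.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem:>3} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

data = [12, 15, 21, 27, 27, 43, 48, 50]  # made-up data for illustration
print("\n".join(stem_plot(data)))
```

Because the values are sorted before they are split, the leaves on each row come out in increasing order, and the empty stem 3 still gets its own row.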

Quartiles and Interquartile Range (IQR)

To find quartiles:

  1. Arrange the observations in increasing order.
  2. Locate the median.
  3. The first quartile (Q1) is the median of the observations to the left of the median in the ordered list.
  4. The third quartile (Q3) is the median of the observations to the right of the median in the ordered list.

Interquartile Range (IQR)

IQR = Q3 – Q1
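
A short Python sketch of the quartile rule and the IQR, assuming the median-of-halves method described above (when n is odd, the median itself is left out of both halves). The function name quartiles and the data are made up for the example. Library routines such as percentile functions often use a different interpolation rule and may give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)            # step 1: arrange in increasing order
    n = len(data)
    half = n // 2
    lower = data[:half]              # observations to the left of the median
    upper = data[half + n % 2:]      # observations to the right of the median
    return median(lower), median(data), median(upper)

data = [1, 3, 4, 6, 7, 8, 10]        # made-up data for illustration
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)              # prints: 3 6 8 5
```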

Identifying Outliers

  1. Calculate 1.5 * IQR.
  2. Lower Limit for Outliers: Q1 – (1.5 * IQR)
  3. Upper Limit for Outliers: Q3 + (1.5 * IQR)
  4. Flag any observation below the lower limit or above the upper limit as a suspected outlier.
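
The 1.5 * IQR rule can be sketched as follows. This assumes quartiles are found with the median-of-halves rule from the previous section; outlier_limits and the data are hypothetical names and values chosen for the example.

```python
from statistics import median

def outlier_limits(values):
    """Return (lower limit, upper limit) from the 1.5 * IQR rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])              # median of the lower half
    q3 = median(data[n // 2 + n % 2 :])      # median of the upper half
    step = 1.5 * (q3 - q1)                   # 1.5 * IQR
    return q1 - step, q3 + step

data = [1, 3, 4, 6, 7, 8, 30]                # made-up data for illustration
low, high = outlier_limits(data)
# Observations outside the limits are flagged as suspected outliers.
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)                   # prints: -4.5 15.5 [30]
```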

Standard Deviation

Standard deviation: s = sqrt(∑(value – mean)^2 / (n – 1)), where n is the number of observations. s is never negative, and s = 0 only when all observations have the same value.
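
A minimal Python sketch of the formula, checked against statistics.stdev from the standard library, which also divides by n – 1. The function name sample_sd and the data are made up for the example.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data for illustration
print(sample_sd(data), stdev(data))          # the two values should agree
print(sample_sd([3, 3, 3]))                  # identical observations give 0.0
```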

Correlation

To calculate the correlation coefficient (r):

  1. Find the mean of each variable.
  2. For each data point, calculate the deviation of each variable's value from that variable's mean.
  3. Multiply the paired deviations and sum the products.
  4. Divide this sum by (n – 1) times the product of the two standard deviations.

The result, r, ranges between -1 and 1:

  • Values close to 1: Strong positive correlation.
  • Values near -1: Strong negative correlation.
  • Values around 0: Little or no linear relationship. A strong curved relationship can still have r near 0.
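
The four steps can be followed directly in Python. This is a sketch, assuming the standard deviations use the n – 1 divisor from the Standard Deviation section; the function name correlation and the data are made up for the example.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired observations."""
    n = len(xs)
    # Step 1: mean of each variable.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 2: deviations from the respective means.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Step 3: multiply paired deviations and sum the products.
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Step 4: divide by (n - 1) times the product of the standard deviations.
    s_x = math.sqrt(sum(dx ** 2 for dx in dev_x) / (n - 1))
    s_y = math.sqrt(sum(dy ** 2 for dy in dev_y) / (n - 1))
    return sum_products / ((n - 1) * s_x * s_y)

# Points on a rising straight line give r of about 1,
# points on a falling straight line give r of about -1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```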

Z-Score

Z-score = (value – mean) / standard deviation

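To find the corresponding probability in a standard normal table (z-table), look up the z-score through its first decimal place (e.g., 1.2 from 1.27) in the left-hand column. Then find the second decimal place (e.g., 0.07) along the top row. The entry where that row and column meet is the probability; most tables report it as the area to the left of z.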
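
The table lookup can be checked in Python. This sketch assumes the table reports the area to the left of z under the standard normal curve, as most z-tables do, and computes that area from the error function math.erf. The function names z_score and normal_cdf and the numbers are made up for the example.

```python
import math

def z_score(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd

def normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(z_score(value=85, mean=70, sd=10))     # prints: 1.5
# Row 1.2, column 0.07 of the table corresponds to z = 1.27.
print(round(normal_cdf(1.27), 4))            # prints: 0.898
```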

Sampling Techniques

Good sampling techniques aim to reduce every source of error, including:

  • Undercoverage: Some groups in the population are left out of the sample selection process.
  • Nonresponse: An individual chosen for the sample cannot be contacted or refuses to participate.
  • Response Bias: A systematic pattern of incorrect responses in a sample survey.
  • Wording of Questions: The most important influence on the answers given in a sample survey. Confusing or leading questions can introduce strong bias.

Lurking Variables

A lurking variable is not among the explanatory or response variables but influences the response.

Confounding

Two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished. Observational studies often fail to establish cause and effect because the explanatory variable is confounded with lurking variables.

Parameters vs. Statistics

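  • Parameters: Numbers that describe a population (e.g., the population mean). Their values are usually unknown.
  • Statistics: Numbers computed from sample data (e.g., the sample mean), often used to estimate parameters.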

Statistical Inference

Statistical inference is drawing conclusions about a population from a sample.

Law of Large Numbers

As the number of randomly drawn observations increases, the sample mean tends to get closer to the population mean. With a large enough sample, the sample mean approximates this unknown parameter as closely as desired.
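
A small simulation illustrates the idea. This is a sketch with a made-up setup: rolls of a fair six-sided die, whose population mean is 3.5. The function name mean_of_rolls and the seed are chosen only for the example.

```python
import random

def mean_of_rolls(n, rng):
    """Mean of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    # The sample mean should settle near the population mean of 3.5 as n grows.
    print(n, mean_of_rolls(n, rng))
```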

Population Distribution

The distribution of values of the variable among all individuals in the population.

Sampling Distribution

The distribution of values taken by the statistic in all possible samples of the same size from the same population.
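
For a very small population, the sampling distribution can be listed in full. The sketch below uses a hypothetical population of four values and takes the sample mean as the statistic, enumerating every possible sample of size 2 drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8]                    # hypothetical tiny population
# The statistic (here, the sample mean) computed for every possible sample of size 2.
sample_means = [mean(s) for s in combinations(population, 2)]
distribution = Counter(sample_means)         # how often each value of the statistic occurs

for value in sorted(distribution):
    print(value, distribution[value])
```

There are six possible samples, and the means of 3, 4, 6, and 7 each occur once while 5 occurs twice. The average of the six sample means equals the population mean of 5.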

Central Limit Theorem

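When the sample size n is large, the sampling distribution of the sample mean is approximately normal, whatever the shape of the population distribution, provided the population has a finite standard deviation σ. The distribution is centered at the population mean and has standard deviation σ / sqrt(n).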
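
A simulation sketch of the theorem, using a made-up setup: a right-skewed exponential population with mean 1 and standard deviation 1, samples of size 50, and 2000 repetitions. The function name sample_means, the seed, and these settings are assumptions chosen for the example.

```python
import random
from statistics import fmean, stdev

def sample_means(n, reps, rng):
    """Means of `reps` random samples, each of size n, from an exponential population."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=50, reps=2000, rng=rng)

center = fmean(means)   # should be near the population mean, 1
spread = stdev(means)   # should be near 1 / sqrt(50), about 0.14
# Share of sample means within one standard deviation of the center;
# a normal distribution would put roughly 68% there.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(center, spread, within_one_sd)
```

Even though individual exponential observations are strongly skewed, the sample means should cluster symmetrically around 1.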