Statistical Analysis: Variables, Data, and Inference
Variables and Study Groups
- Categorical variables
- Quantitative variables
- Explanatory variable
- Response variable
Study Groups:
- Population
- Sample
Sampling and Data Collection
Sample:
- Statistical Inference
- Sampling Bias
- Random Sample
- Association vs. Causation
- Confounding Variables
Collecting Data:
- Experiment
- Observational Study
- Randomized Experiment
- Control Group
- Placebo
- Blind Experiment
- Double-Blind Experiment
- Randomized Comparative Experiment
- Matched Pairs
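The random sampling and random assignment ideas above can be sketched in Python with only the standard `random` module. This is a minimal illustration, assuming a hypothetical sampling frame of 1,000 numbered units. For the matched-pairs step it also assumes that pairs are formed arbitrarily from consecutive units; real matched pairs would pair units that are similar on relevant characteristics.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: IDs 1..1000 stand in for the population.
population = list(range(1, 1001))

# Random sample: every subset of 50 units is equally likely to be chosen.
sample = rng.sample(population, 50)

# Randomized comparative experiment: shuffle the sampled units, then split
# them into a treatment group and a control group of equal size.
units = sample[:]
rng.shuffle(units)
treatment, control = units[:25], units[25:]

# Matched pairs: within each pair, a coin flip decides which unit is treated.
# Pairs here are formed arbitrarily (consecutive units); this is an assumption.
pairs = [(sample[i], sample[i + 1]) for i in range(0, len(sample), 2)]
assignments = []
for a, b in pairs:
    if rng.random() < 0.5:
        assignments.append({"treatment": a, "control": b})
    else:
        assignments.append({"treatment": b, "control": a})

print(len(sample), len(treatment), len(control), len(assignments))  # 50 25 25 25
```

Because treatment is assigned at random, confounding variables should tend to balance out between the groups. This is what lets a randomized experiment support conclusions about causation rather than only association.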
Describing Data
- Standard Deviation
- 95% Rule
- Z-Score
- Five-Number Summary
- Range
- IQR
- Boxplot
- Side-by-side plots
- Scatterplot
- Direction of Association
- Linear Correlation
- Correlation
- Regression Equation
- Prediction
- Residual
- Causation
- Simpson’s Paradox
- Augmented Scatterplot
- Visualization with two or more variables
3.1 Sampling Distributions
•Sample statistics vary from sample to sample
•Sampling Distribution shows how much the sample statistic varies from sample to sample
•If samples are randomly selected, the sampling distribution will be centered around the population parameter
•For most of the statistics we consider, if the sample size is large enough, the sampling distribution will be symmetrical and bell-shaped
•If you take random samples of the same size, from the same population, the sampling distribution will be centered around the true population parameter
•If sampling bias exists, your sampling distribution can give you bad information about the true parameter
•Variability of the statistic: the standard error (SE) is the standard deviation of the sample statistic
•As the sample size increases, the variability of the sample statistics tends to decrease and the sample statistics tend to be closer to the true value of the population parameter!
•For larger sample sizes, you get less variability in the statistics, so less uncertainty in your estimates
•Statistical inference is drawing conclusions about a population based on a sample
•We use a sample statistic to estimate a population parameter
•To assess the uncertainty of a statistic, we need to know how much it varies from sample to sample
•To create a sampling distribution, take many samples of the same size from the population, and compute the statistic for each
•Standard error is the standard deviation of a statistic
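The recipe for creating a sampling distribution (many samples of the same size, one statistic per sample) can be simulated directly. This sketch assumes a made-up, right-skewed population of 10,000 values, so the true parameter (the population mean `mu`) is known. The sample sizes and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population: right-skewed (exponential) values with mean about 50.
population = [rng.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter we want to estimate

def sample_means(n, reps=2000):
    """Statistic (sample mean) from `reps` random samples, each of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

results = {}
for n in (10, 40, 160):
    means = sample_means(n)
    results[n] = {
        "center": statistics.mean(means),  # should sit near mu
        "se": statistics.stdev(means),     # standard error: SD of the statistic
    }
    print(f"n={n:3d}  center={results[n]['center']:6.2f}  mu={mu:6.2f}  SE={results[n]['se']:5.2f}")
```

With these settings the SE should shrink as `n` grows (roughly halving each time `n` is quadrupled), and every center should sit close to `mu`. A histogram of `means` for larger `n` would also look more symmetric and bell-shaped, even though the population itself is skewed.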
3.2 Confidence Intervals
•The larger the standard deviation of the sampling distribution, the greater the spread in the distribution of sample statistics. That means there is greater uncertainty surrounding any single statistic, and our margin of error will be larger.
•A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples
•The success rate (proportion of all samples whose intervals contain the parameter) is known as the confidence level
•A 95% confidence interval will contain the true parameter for 95% of all samples
•If the sampling distribution is relatively symmetric and bell-shaped, a 95% confidence interval can be estimated as statistic ± 2 × SE, where the margin of error is ME = 2 × SE
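•Misinterpretation 1: “A 95% confidence interval contains 95% of the data in the population” (wrong: the interval estimates a parameter such as the mean; it says nothing about where individual data values lie)
•Misinterpretation 2: “I am 95% sure that the mean of a sample will fall within a 95% confidence interval for the mean” (wrong: the interval is about the population parameter, not about the means of other samples)
•Misinterpretation 3: “The probability that the population parameter is in this particular 95% confidence interval is 0.95” (wrong: the parameter is fixed, so a particular interval either contains it or does not; the 95% describes how often the method succeeds across all samples)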
•To create a plausible range of values for a parameter:
o Take many random samples of the same size from the population, and compute the sample statistic for each sample
o Compute the standard error as the standard deviation of all these statistics
o Use statistic ± 2 × SE
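The three-step procedure above, and the meaning of the confidence level as a success rate, can be checked by simulation. This is a sketch under stated assumptions: the population is made up (normal with mean 100 and SD 15) so that the true mean is known, and the function name `ci_95` is hypothetical. The printed coverage should come out close to 0.95.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population whose parameter (the mean) we know, so we can check intervals.
population = [rng.gauss(100, 15) for _ in range(20_000)]
mu = statistics.mean(population)
n = 50

# Steps 1-2: many random samples of size n, then SE = SD of the sample means.
# (Possible here only because the whole population is simulated.)
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(1000)]
se = statistics.stdev(sample_means)

def ci_95(statistic, se):
    """Return (lower, upper) = statistic ± ME, with margin of error ME = 2 × SE."""
    me = 2 * se
    return statistic - me, statistic + me

# Step 3: the interval from one particular sample.
lower, upper = ci_95(statistics.mean(rng.sample(population, n)), se)
print(f"95% CI from one sample: ({lower:.2f}, {upper:.2f})")

# Confidence level as a success rate: share of samples whose interval captures mu.
trials = 2000
hits = 0
for _ in range(trials):
    a, b = ci_95(statistics.mean(rng.sample(population, n)), se)
    if a <= mu <= b:
        hits += 1
coverage = hits / trials
print(f"coverage over {trials} samples: {coverage:.3f}")
```

In practice we usually have only one sample, so the SE cannot be found by repeatedly sampling the population as done here. The sketch only illustrates what the procedure and the 95% success rate mean.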