Statistical Analysis: Variables, Data, and Inference

Variables and Study Groups

  • Categorical variables
  • Quantitative variables
  • Explanatory variable
  • Response variable

Study Groups:

  • Population
  • Sample

Sampling and Data Collection

Sample:

  • Statistical Inference
  • Sampling Bias
  • Random Sample
  • Association vs. Causation
  • Confounding Variables

Collecting Data:

  • Experiment
  • Observational Study
  • Randomized Experiment
  • Control Group
  • Placebo
  • Blind Experiment
  • Double-Blind Experiment
  • Randomized Comparative Experiment
  • Matched Pairs

Describing Data

  • Standard Deviation
  • 95% Rule
  • Z-Score
  • 5-Number Summary
  • Range
  • IQR
  • Boxplot
  • Side-by-side plots
  • Scatterplot
  • Direction of Association
  • Linear Correlation
  • Correlation
  • Regression Equation
  • Prediction
  • Residual
  • Causation
  • Simpson’s Paradox
  • Augmented Scatterplot
  • Visualization with 2 or more variables

3.1 Sampling Distributions

Sample statistics vary from sample to sample
A sampling distribution shows how much a sample statistic varies from sample to sample
If random samples of the same size are taken from the same population, the sampling distribution will be centered at the true population parameter
For most of the statistics we consider, if the sample size is large enough, the sampling distribution will be symmetric and bell-shaped
If sampling bias exists, the sampling distribution may not be centered at the true parameter, so it can give misleading information about that parameter
Variability of the statistic:
The standard error (SE) is the standard deviation of the sample statistic
As the sample size increases, the variability of the sample statistics tends to decrease, and the sample statistics tend to be closer to the true value of the population parameter
For larger sample sizes, there is less variability in the statistics, so less uncertainty in the estimates
Statistical inference is drawing conclusions about a population based on a sample
We use a sample statistic to estimate a population parameter
To assess the uncertainty of a statistic, we need to know how much it varies from sample to sample
To create a sampling distribution, take many samples of the same size from the population, and compute the statistic for each
Standard error is the standard deviation of a statistic
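
The points above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is a made-up list of 10,000 roughly bell-shaped values, the statistic is the sample mean, and the names sampling_distribution, standard_error, and se_by_n are hypothetical helpers chosen for this example. It draws many random samples of the same size, computes the mean of each, and reports the center and standard error of the resulting sampling distribution for several sample sizes.

```python
import random
import statistics


def sampling_distribution(population, sample_size, num_samples, rng):
    """Return the sample mean from each of num_samples random samples."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


def standard_error(sample_statistics):
    """Standard error: the standard deviation of the sample statistics."""
    return statistics.stdev(sample_statistics)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed population for illustration only (not data from the notes).
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

se_by_n = {}
center_by_n = {}
for n in (10, 100, 1000):
    means = sampling_distribution(population, n, 1000, rng)
    center_by_n[n] = statistics.fmean(means)
    se_by_n[n] = standard_error(means)
    print(f"n={n:4d}  center={center_by_n[n]:.2f}  SE={se_by_n[n]:.3f}")

print(f"population mean = {population_mean:.2f}")
```

In a run of this sketch, the center of each sampling distribution should sit close to the population mean, and the standard error should shrink as the sample size grows.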

3.2 Confidence Intervals

The larger the standard deviation of the sampling distribution (the standard error), the greater the spread in the distribution of sample statistics. That means there is high uncertainty surrounding any single statistic, and the margin of error will be large.
A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples
The success rate (proportion of all samples whose intervals contain the parameter) is known as the confidence level
A 95% confidence interval will contain the true parameter for 95% of all samples
If the sampling distribution is relatively symmetric and bell-shaped, a 95% confidence interval can be estimated using statistic ± 2 × SE, where 2 × SE is the margin of error (ME)
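
The idea of a confidence level as a success rate can be checked by simulation. The following is a rough sketch, not a definitive method: it assumes a made-up bell-shaped population, uses the sample mean as the statistic, and the names true_mean, se, and coverage are chosen for this example only. It estimates the standard error from many samples, then counts how often the interval statistic ± 2 × SE built from a fresh sample captures the true mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Assumed population for illustration only.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)
n = 50  # sample size (an arbitrary choice)

# Estimate the standard error as the standard deviation of many sample means.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2000)]
se = statistics.stdev(means)

# Build statistic +/- 2 * SE for many new samples and count the captures.
trials = 2000
hits = 0
for _ in range(trials):
    stat = statistics.fmean(rng.sample(population, n))
    lower, upper = stat - 2 * se, stat + 2 * se
    if lower <= true_mean <= upper:
        hits += 1

coverage = hits / trials
print(f"SE = {se:.3f}, margin of error = {2 * se:.3f}")
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The reported proportion should land near 0.95. It describes the method across many samples, not the chance that any one computed interval contains the parameter.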
Misinterpretation 1: “A 95% confidence interval contains 95% of the data in the population”
Misinterpretation 2: “I am 95% sure that the mean of a sample will fall within a 95% confidence interval for the mean”
Misinterpretation 3: “The probability that the population parameter is in this particular 95% confidence interval is 0.95”
To create a plausible range of values for a parameter:
o Take many random samples of the same size from the population, and compute the sample statistic for each sample
o Compute the standard error as the standard deviation of all these statistics
o Use statistic ± 2 × SE
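
The three steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the population is a made-up list of 20,000 yes/no values with about 30% "yes", the statistic is the sample proportion, and two_se_interval is a hypothetical helper name.

```python
import random
import statistics


def two_se_interval(statistic, se):
    """Return (lower, upper) for statistic +/- 2 * SE."""
    margin_of_error = 2 * se
    return statistic - margin_of_error, statistic + margin_of_error


rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Assumed population for illustration only: True means "yes".
population = [rng.random() < 0.3 for _ in range(20_000)]
n = 200  # sample size (an arbitrary choice)

# Step 1: take many random samples of the same size and compute
# the sample proportion for each one.
sample_props = [sum(rng.sample(population, n)) / n for _ in range(1000)]

# Step 2: the standard error is the standard deviation of these statistics.
se = statistics.stdev(sample_props)

# Step 3: form statistic +/- 2 * SE around one observed sample proportion.
observed = sum(rng.sample(population, n)) / n
low, high = two_se_interval(observed, se)

print(f"observed proportion = {observed:.3f}")
print(f"SE = {se:.4f}")
print(f"plausible range: ({low:.3f}, {high:.3f})")
```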