Statistical Inference: Sampling and Confidence Intervals
Chapter 9: Sampling Distributions
Quantile-Quantile Plot (QQ-Plot)
Empirical Rule: For data that are approximately normally distributed, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
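The percentages in the Empirical Rule can be checked directly from the normal distribution. The sketch below is one way to do so, using only the Python standard library; the helper name `prob_within` is introduced here for illustration and is not from the original notes.

```python
import math

def prob_within(k: float) -> float:
    """Return P(|Z| <= k) for a standard normal Z, using the error function."""
    return math.erf(k / math.sqrt(2))

# Probability mass within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The three printed values round to roughly 68%, 95%, and 99.7%, which is where the rule gets its numbers.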
Standard Normal Distribution
The Standard Normal distribution has a mean of 0 and a standard deviation of 1.
Example: If you want to know the percentage of babies that weigh less than 95 ounces at birth, you must first convert the value 95 to a standardized score (STAT).
Based
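A minimal sketch of the standardization step in the birth-weight example, assuming birth weights are normally distributed. The excerpt does not give the mean or standard deviation, so the values 120 and 20 ounces below are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
from statistics import NormalDist

# Hypothetical parameters (NOT from the source): mean and SD of birth weight in ounces.
mean_oz = 120.0
sd_oz = 20.0

value = 95.0

# Standardized score: how many standard deviations the value lies from the mean.
z = (value - mean_oz) / sd_oz

# Proportion of the standard normal distribution below that score.
p_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(weight < {value:.0f} oz) = {p_below:.4f}")
```

With different assumed parameters the percentage changes, but the two steps (standardize, then look up the standard normal probability) stay the same.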
Statistical Analysis and Predictive Modeling in Excel
Descriptive Statistics and Central Tendency
Descriptive statistics are the numbers that summarize a dataset, giving you a quick “snapshot” of its typical values and how much they vary. These are divided into Measures of Central Tendency (the middle) and Measures of Dispersion (the spread).
1. Measures of Central Tendency
These identify the “center” of your data where most values congregate.
- Mean (Average): The sum of all values divided by the total count. It is the most common measure but is highly
Statistics Concepts: Variables, Distributions, and Inference
Lesson 1: Variables
- Explanatory Variable – aka Independent Variable; explains variations in the response variable (x-axis). This is the predictor.
- Example: “Can quiz scores be used to predict exam scores?” (Explanatory = Quiz scores)
- Response Variable – aka Dependent Variable; its value is predicted or its variation is explained by the explanatory variable (y-axis). This is the outcome.
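To make the quiz/exam example concrete, here is a small sketch that fits a least-squares line with quiz score as the explanatory variable (x) and exam score as the response variable (y). The five data points are invented for illustration and do not come from the lesson.

```python
# Hypothetical data (not from the source): quiz scores (x) and exam scores (y).
quiz = [6, 7, 8, 9, 10]
exam = [62, 70, 75, 85, 88]

mean_x = sum(quiz) / len(quiz)
mean_y = sum(exam) / len(exam)

# Least-squares slope and intercept for the line exam = intercept + slope * quiz.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(quiz, exam))
sxx = sum((x - mean_x) ** 2 for x in quiz)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predict the response for a new value of the explanatory variable.
predicted_exam = intercept + slope * 8.5
print(f"exam = {intercept:.1f} + {slope:.1f} * quiz; prediction at 8.5: {predicted_exam:.2f}")
```

Swapping the roles of the two lists would answer a different question (predicting quiz scores from exam scores), which is why the explanatory/response distinction matters.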
Lesson 2: Variable Types and Data Visualization
- Categorical vs. Quantitative Variables
- Categorical Variables = names,
Essential Causal Inference and Econometrics Techniques
Randomized Experiments and Causal Inference
Why are randomized experiments so desirable?
Randomization breaks the link between treatment assignment and confounders, making the treated and untreated groups exchangeable. As a result, the difference in average outcomes between the groups is an unbiased estimate of the average causal effect: systematic differences in outcomes can be attributed to the treatment rather than to selection.
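A small simulation can illustrate the point. The sketch below assumes a made-up data-generating process (a true treatment effect of 2.0 and one confounder that also raises the outcome) and compares a coin-flip assignment with an assignment in which units with a high confounder value select into treatment. All names and numbers are illustrative assumptions.

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0   # assumed true causal effect of treatment
N = 20_000          # number of simulated units

def outcome(treated: int, confounder: float) -> float:
    """Outcome depends on treatment, the confounder, and random noise."""
    return TRUE_EFFECT * treated + 3.0 * confounder + random.gauss(0, 1)

def diff_in_means(rows):
    """Difference between the mean outcome of treated and untreated units."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

confounders = [random.gauss(0, 1) for _ in range(N)]

# Randomized assignment: a fair coin flip, independent of the confounder.
randomized = []
for c in confounders:
    t = random.randint(0, 1)
    randomized.append((t, outcome(t, c)))

# Self-selection: units with a high confounder value choose treatment.
self_selected = []
for c in confounders:
    t = 1 if c > 0 else 0
    self_selected.append((t, outcome(t, c)))

est_randomized = diff_in_means(randomized)
est_self_selected = diff_in_means(self_selected)
print(f"randomized estimate:    {est_randomized:.2f}")
print(f"self-selected estimate: {est_self_selected:.2f}")
```

Under these assumptions the randomized estimate lands near the true effect, while the self-selected estimate is inflated because it also picks up the confounder's contribution.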
Why might we not be able to run a randomized experiment?
They may be unethical (e.g., denying beneficial treatments), infeasible
Essential Data Science Concepts and Statistical Methods
Data Science Fundamentals
Data Science combines statistics, computer science, and domain knowledge to extract insights from data. The main goal is to uncover hidden patterns, trends, and other valuable information from large datasets to make informed, data-driven decisions. It deals with both structured (e.g., Excel tables) and unstructured (e.g., text, images) data.
The Data Science Lifecycle
- Problem Definition: Understanding the business question.
- Data Collection: Gathering data from various sources.
Statistical Sampling Distributions and Inference Exercises
Review Exercises for Sampling Distributions
8.56 Consider the data displayed in Exercise 1.20 on page 31. Construct a box-and-whisker plot and comment on the nature of the sample. Compute the sample mean and sample standard deviation.
8.57 If X1, X2, …, Xn are independent random variables having identical exponential distributions with parameter θ, show that the density function of the random variable Y = X1 + X2 + … + Xn is that of a gamma distribution with parameters α = n and β = θ.
8.58
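For Exercise 8.57, a simulation can serve as a sanity check on the claimed result (it is not a proof). The sketch below assumes θ is the mean of each exponential distribution, so that a gamma distribution with α = n and β = θ has mean nθ and variance nθ². The values n = 5 and θ = 2 are arbitrary choices for illustration.

```python
import random
from statistics import fmean, variance

random.seed(1)

n = 5          # number of exponential variables summed (illustrative choice)
theta = 2.0    # assumed to be the MEAN of each exponential distribution
reps = 50_000  # number of simulated sums

# random.expovariate takes the rate, which is 1 / mean.
sums = [
    sum(random.expovariate(1.0 / theta) for _ in range(n))
    for _ in range(reps)
]

sim_mean = fmean(sums)
sim_var = variance(sums)

# Moments of a gamma distribution with shape alpha = n and scale beta = theta.
gamma_mean = n * theta
gamma_var = n * theta ** 2

print(f"simulated mean {sim_mean:.2f} vs gamma mean {gamma_mean:.2f}")
print(f"simulated var  {sim_var:.2f} vs gamma var  {gamma_var:.2f}")
```

Matching the first two moments is consistent with the gamma claim; the exercise itself asks for an analytic argument, for example via moment-generating functions.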