Statistics Key Concepts: A Comprehensive Guide
Measures of Central Tendency
Measures of central tendency are statistical metrics that describe the center, or typical value, of a dataset. They summarize a distribution with a single representative value. Common measures include the mean (arithmetic average), the median (middle value of the sorted data), and the mode (most frequent value).
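The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch on a small, made-up sample (the values in `data` are hypothetical) that shows how each measure is obtained.

```python
from statistics import mean, median, mode

# Hypothetical sample, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

avg = mean(data)    # arithmetic mean: sum / count = 30 / 6
mid = median(data)  # even count: average of the two middle values, 3 and 5
most = mode(data)   # most frequent value (3 appears twice)

print(f"mean={avg}, median={mid}, mode={most}")
```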
Regression
Regression analyzes the relationship between a dependent variable and one or more independent variables. It is used to predict the dependent variable from the values of the independent variables and to assess how strongly they are related.
Types of Regression
- Linear Regression: Models a straight-line relationship.
- Logistic Regression: Predicts probabilities of categorical outcomes.
- Polynomial Regression: Models curved (non-linear) relationships by fitting a polynomial in the independent variable.
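Simple linear regression can be sketched with the ordinary least squares formulas: the slope is the sum of products of the deviations of x and y from their means divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The helper `fit_line` and the hours/score data below are hypothetical, chosen only for illustration; logistic and polynomial regression are not shown.

```python
def fit_line(xs, ys):
    """Hypothetical helper: ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, score)
predicted = intercept + slope * 6  # predicted score for 6 hours of study

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction for 6 hours={predicted:.2f}")
```

On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the slope and intercept of the same simple least squares fit.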
Normal Distribution
A normal distribution (also called a Gaussian distribution) is a symmetric, bell-shaped probability distribution defined by its mean and standard deviation. Its mean, median, and mode are equal, and values cluster around the mean.
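Python's `statistics.NormalDist` can illustrate two properties of the normal distribution: its median coincides with its mean, and a fixed share of values falls within 1, 2, and 3 standard deviations of the mean (roughly 68%, 95%, and 99.7%). The mean of 170 and standard deviation of 10 are hypothetical parameters used only for this sketch.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 170, standard deviation 10.
mu, sigma = 170, 10
dist = NormalDist(mu=mu, sigma=sigma)

# Symmetry: the 50th percentile (median) equals the mean.
median_value = dist.inv_cdf(0.5)

# Share of values lying within k standard deviations of the mean.
shares = {k: dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```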
Standard Normal Distribution
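The standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. Values are expressed as z-scores, z = (x - μ) / σ, where μ is the mean and σ is the standard deviation; a z-score gives the number of standard deviations a value lies above or below the mean.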
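Standardizing a value uses z = (x - mu) / sigma. In this sketch, the helper `z_score` and the test-score figures are hypothetical; `NormalDist()` with no arguments is the standard normal distribution (mean 0, standard deviation 1), so its `cdf` gives the proportion of values below a given z-score.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Hypothetical helper: standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical figures: a score of 85 on a test with mean 70 and standard deviation 10.
z = z_score(85, mu=70, sigma=10)

# NormalDist() defaults to mean 0, standard deviation 1 (the standard normal).
share_below = NormalDist().cdf(z)

print(f"z = {z}, share of scores below: {share_below:.4f}")
```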
Measurement Scales
Measurement scales describe how values are assigned to variables, which determines the comparisons and calculations that are meaningful for them.
- Nominal Scale: Categorizes items with names/labels (e.g., gender).
- Ordinal Scale: Orders categories without equal intervals (e.g., satisfaction levels).
- Interval Scale: Differences between values are meaningful, but there is no true zero, so ratios are not (e.g., temperature in Celsius or Fahrenheit).
- Ratio Scale: Equal intervals and a true zero, enabling ratios (e.g., height).
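Temperature shows why the interval/ratio distinction matters. A minimal sketch, assuming Celsius as the interval-scale example (arbitrary zero) and Kelvin as the ratio-scale example (true zero at absolute zero); the two readings are hypothetical. Their difference is the same on both scales, but only the Kelvin ratio is meaningful.

```python
# Hypothetical readings on an interval scale (Celsius: zero is arbitrary).
cold_c, warm_c = 10.0, 20.0

# The same readings on a ratio scale (Kelvin: zero is absolute zero).
cold_k, warm_k = cold_c + 273.15, warm_c + 273.15

# Differences are meaningful on both scales and agree (10 degrees).
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios are only meaningful on the ratio scale: 20 °C is not "twice as hot"
# as 10 °C; in Kelvin the ratio is only about 1.035.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(f"difference: {diff_c} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c} (Celsius, misleading) vs {ratio_k:.3f} (Kelvin)")
```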
Correlation
Correlation measures the strength and direction of the relationship between two variables.
- Positive Correlation: Variables increase/decrease together.
- Negative Correlation: One variable increases as the other decreases.
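- Zero Correlation: No linear relationship between the variables.
The Pearson correlation coefficient (r) ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 indicating no linear relationship.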
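The coefficient r can be computed from its definition: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The helper `pearson_r` and all data below are hypothetical. The last case shows that r captures only linear association: y = x² is completely determined by x, yet on a range symmetric around 0 it gives r = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # increase together: perfect positive
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # one rises, other falls: perfect negative

# A perfect but non-linear relationship (y = x**2 on a symmetric range)
# still gives r = 0, because r measures only linear association.
sym = [-2, -1, 0, 1, 2]
r_zero = pearson_r(sym, [v ** 2 for v in sym])

print(f"r_pos={r_pos}, r_neg={r_neg}, r_zero={r_zero}")
```

Python 3.10 and later also provide `statistics.correlation(x, y)` for Pearson's r.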
Probability
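Probability quantifies the likelihood of an event on a scale from 0 (impossible) to 1 (certain). When all possible outcomes are equally likely, it is calculated as the number of favorable outcomes divided by the total number of possible outcomes.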
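For equally likely outcomes, the classical formula can be computed directly. This sketch assumes a fair six-sided die and uses `fractions.Fraction` to keep the result exact.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die: all outcomes equally likely.
outcomes = [1, 2, 3, 4, 5, 6]

# Event of interest: the roll is even.
favorable = [o for o in outcomes if o % 2 == 0]

# Classical probability: favorable outcomes / total outcomes (3/6 reduces to 1/2).
p_even = Fraction(len(favorable), len(outcomes))

print(f"P(even) = {p_even}")
```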
Discrete and Continuous Random Variables
- Discrete: Takes countable, distinct values (e.g., the result of a die roll).
- Continuous: Takes any value within a range, so it has infinitely many possible values (e.g., height).
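The practical difference between the two types is how probabilities are assigned. In this sketch, a fair die's probabilities are listed value by value and sum to 1, while for a continuous variable probabilities are obtained for intervals from the cumulative distribution function. The assumption that heights follow a normal distribution with mean 170 cm and standard deviation 10 cm is hypothetical.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: each individual value of a fair die has its own probability,
# and the probabilities of all values sum to 1.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total = sum(die_pmf.values())

# Continuous (hypothetical model): heights ~ Normal(mean 170 cm, SD 10 cm).
# Probabilities are assigned to intervals via the CDF; the probability of
# any single exact value is 0.
heights = NormalDist(mu=170, sigma=10)
p_165_to_175 = heights.cdf(175) - heights.cdf(165)

print(f"sum of die probabilities: {total}")
print(f"P(165 cm < height < 175 cm) = {p_165_to_175:.4f}")
```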
Poisson Distribution
The Poisson distribution models the probability of a given number of events occurring within a fixed interval of time or space, assuming the events occur independently and at a constant average rate.
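The Poisson probability of observing exactly k events when events occur at an average rate λ per interval is P(X = k) = λ^k e^(-λ) / k!. A minimal sketch of this formula, using a hypothetical helper `poisson_pmf` and an assumed average of 3 calls per hour:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Hypothetical helper: P(X = k) for a Poisson variable with average rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

# Assumed average rate: 3 calls per hour.
lam = 3

p_none = poisson_pmf(0, lam)                              # exactly 0 calls
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # 0, 1, or 2 calls

print(f"P(X = 0) = {p_none:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
```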
Estimation
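Estimation is the process of approximating an unknown quantity, such as a population mean, from the information available in a sample when exact data for the whole population is unavailable.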
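A common case is estimating a population mean from a sample. The sketch below, using a hypothetical sample, computes a point estimate (the sample mean) and an approximate 95% interval estimate. It assumes the normal approximation with critical value z ≈ 1.96; for a sample this small, a t-distribution critical value would give a somewhat wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 8 measurements from a larger population.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

# Point estimate: the sample mean estimates the unknown population mean.
point_estimate = mean(sample)

# Interval estimate: approximate 95% confidence interval using the normal
# critical value (about 1.96). This is an assumption; with only 8 values a
# t critical value would be more appropriate and give a wider interval.
z = NormalDist().inv_cdf(0.975)
standard_error = stdev(sample) / sqrt(len(sample))
interval = (point_estimate - z * standard_error, point_estimate + z * standard_error)

print(f"point estimate: {point_estimate:.3f}")
print(f"approx. 95% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
```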
Skewness and Kurtosis
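- Skewness: Measures the asymmetry of a distribution. Positive (right) skew means a longer tail on the right; negative (left) skew means a longer tail on the left.
- Kurtosis: Measures the “tailedness” of a distribution, that is, how heavy its tails are and hence how prone it is to extreme values (outliers).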
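Both measures can be computed from standardized moments: skewness is the average cubed z-score, and kurtosis is the average fourth-power z-score, usually reported as excess kurtosis (kurtosis minus 3) so that a normal distribution scores 0. This sketch uses hypothetical helpers `skewness` and `excess_kurtosis` with population (uncorrected) formulas; some libraries apply small-sample bias corrections by default and report slightly different values.

```python
from statistics import mean, pstdev

def skewness(data):
    """Hypothetical helper: population skewness, the mean of cubed z-scores."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Hypothetical helper: population kurtosis (mean of z-scores to the
    fourth power) minus 3, so that a normal distribution scores 0."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]         # evenly spread, no tail on either side
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value creates a long right tail

print(f"symmetric: skew={skewness(symmetric):.3f}, excess kurtosis={excess_kurtosis(symmetric):.3f}")
print(f"right-skewed: skew={skewness(right_skewed):.3f}, excess kurtosis={excess_kurtosis(right_skewed):.3f}")
```

Positive skewness for `right_skewed` reflects its long right tail; the negative excess kurtosis of the evenly spread `symmetric` list indicates lighter tails than a normal distribution.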