Statistics Key Concepts: A Comprehensive Guide

Measures of Central Tendency

Measures of central tendency are statistics that describe the center, or typical value, of a dataset. They summarize a distribution with a single representative value. The most common measures are the mean (the arithmetic average), the median (the middle value of the sorted data), and the mode (the most frequent value).
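As a minimal sketch, the three measures can be computed with Python's standard statistics module. The scores below are hypothetical and include one unusually high value to show how the mean is pulled toward it while the median is not.

```python
from statistics import mean, median, multimode

# Hypothetical sample: exam scores with one unusually high value (98).
scores = [70, 72, 72, 75, 78, 80, 98]

center_mean = mean(scores)        # arithmetic average, about 77.86
center_median = median(scores)    # middle value of the sorted data
center_modes = multimode(scores)  # most frequent value(s), returned as a list

print(center_mean, center_median, center_modes)
```

Here the median (75) sits below the mean because the single high score raises the average but does not change the middle position.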

Regression

Regression models the relationship between a dependent variable and one or more independent variables. A fitted model predicts the dependent variable from the values of the independent variables and indicates how strongly they are related.

Types of Regression

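  • Linear Regression: Models a straight-line relationship between the variables.
  • Logistic Regression: Predicts the probability of a categorical outcome (most often a yes/no outcome).
  • Polynomial Regression: Models curved relationships by adding powers of the independent variable (e.g., x squared) to the model.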
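The simplest case, linear regression with one independent variable, can be sketched with ordinary least squares. The function name fit_line and the study-hours data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: hours studied (independent) and exam marks (dependent).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, marks)
predicted = slope * 6 + intercept  # predicted mark for 6 hours of study

print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

For this data the fitted slope is about 4.1 marks per hour, so the line predicts roughly 72.3 marks for 6 hours of study.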

Normal Distribution

A normal distribution (also called a Gaussian distribution) is a symmetric, bell-shaped probability distribution defined by its mean and standard deviation. Its mean, median, and mode are equal, and values cluster around the mean.

Standard Normal Distribution

A special case of the normal distribution with a mean of 0 and a standard deviation of 1. Values are expressed as z-scores, z = (x - μ) / σ, which give the number of standard deviations a value lies from the mean.
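A short sketch of standardizing a value and looking up its position on the standard normal curve. The helper z_score and the test-score parameters (mean 100, standard deviation 15) are assumptions for illustration; NormalDist comes from Python's standard statistics module.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical test scores, assumed normal with mean 100 and standard deviation 15.
z = z_score(130, mu=100, sigma=15)

# NormalDist() with no arguments is the standard normal (mean 0, sd 1).
p_below = NormalDist().cdf(z)  # share of values expected below this z-score

print(z, round(p_below, 4))
```

A score of 130 is 2 standard deviations above the mean, and about 97.7% of values in a normal distribution fall below that point.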

Measurement Scales

Measurement scales describe how values are assigned to variables, which determines the comparisons and calculations that are meaningful.

  • Nominal Scale: Categorizes items with names/labels (e.g., gender).
  • Ordinal Scale: Orders categories without equal intervals (e.g., satisfaction levels).
  • Interval Scale: Equal, meaningful differences between values but no true zero (e.g., temperature in Celsius).
  • Ratio Scale: Equal intervals and a true zero, enabling ratios (e.g., height).

Correlation

Correlation measures the strength and direction of the linear relationship between two variables.

  • Positive Correlation: Variables increase/decrease together.
  • Negative Correlation: One variable increases as the other decreases.
  • Zero Correlation: No linear relationship (a non-linear relationship may still exist).

The correlation coefficient (r) ranges from -1 (perfect negative) to +1 (perfect positive).
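The coefficient described above is usually the Pearson correlation coefficient. A minimal sketch of it follows; the function name pearson_r and the small datasets are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # y rises with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises

# y = x squared: clearly related to x, yet with no linear trend.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_positive, r_negative, r_curved)
```

The last case gives r = 0 even though y depends entirely on x, which is why a coefficient of zero only rules out a linear relationship.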

Probability

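Probability quantifies the likelihood of an event on a scale from 0 (impossible) to 1 (certain). When all outcomes are equally likely, it is calculated as the number of favorable outcomes divided by the total number of possible outcomes.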
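As an illustration of the favorable-over-total calculation, the sketch below counts outcomes for two fair dice, where every outcome is assumed equally likely. The event (a total of 7) is a hypothetical example.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Favorable outcomes: the two dice sum to 7.
favorable = [o for o in outcomes if sum(o) == 7]

# Probability = favorable outcomes / total possible outcomes.
p_seven = Fraction(len(favorable), len(outcomes))

print(p_seven)  # 1/6
```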

Discrete and Continuous Random Variables

Discrete: Takes countable, distinct values (e.g., the result of a die roll).

Continuous: Can take any value within a range (e.g., height).

Poisson Distribution

Models the probability of a given number of events occurring within a fixed interval of time or space, assuming the events occur independently and at a constant average rate.
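The Poisson probability of observing exactly k events when the average rate is λ per interval is P(X = k) = λ^k e^(-λ) / k!. A minimal sketch follows; the function name poisson_pmf and the rate of 3 calls per hour are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam per interval."""
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical help desk that receives 3 calls per hour on average.
rate = 3.0

p_zero = poisson_pmf(0, rate)                               # no calls in an hour
p_at_most_2 = sum(poisson_pmf(k, rate) for k in range(3))   # 0, 1, or 2 calls

print(round(p_zero, 4), round(p_at_most_2, 4))
```

With this rate, an hour with no calls has a probability of about 0.05, and an hour with at most two calls about 0.42.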

Estimation

Estimation infers the value of an unknown population quantity, such as a mean or a proportion, from sample data when measuring the entire population is not possible.
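A small sketch of estimating a population mean from a sample, together with the standard error, a common gauge of how far the estimate may be from the true value. The sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of five measurements drawn from a larger population.
sample = [4.8, 5.1, 5.0, 4.9, 5.2]

point_estimate = mean(sample)  # estimate of the unknown population mean
sample_sd = stdev(sample)      # sample standard deviation (n - 1 denominator)

# Standard error: how much the sample mean typically varies between samples.
standard_error = sample_sd / sqrt(len(sample))

print(round(point_estimate, 3), round(sample_sd, 3), round(standard_error, 3))
```

A larger sample shrinks the standard error, so the estimate becomes more precise as more data is collected.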

Skewness and Kurtosis

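Skewness: Measures the asymmetry of a distribution. A left (negative) skew has a longer left tail; a right (positive) skew has a longer right tail.

Kurtosis: Measures the “tailedness” of a distribution. Higher kurtosis means heavier tails and a greater tendency to produce outliers.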
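Both measures can be sketched from central moments: skewness as the third moment divided by the variance to the power 1.5, and kurtosis as the fourth moment divided by the squared variance. These are the simple population (moment-based) forms; statistical packages often apply sample-size corrections, so their results may differ slightly. The function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** k for x in data) / n

def skewness(data):
    """Moment-based skewness: 0 for symmetric data, positive for a right skew."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: a normal distribution has a value of 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]  # one large value stretches the right tail

print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric), kurtosis(right_skewed))
```

The symmetric data has a skewness of 0, while the dataset with a single large value has positive skewness and a higher kurtosis.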