Probability and Statistics: Formulas and Concepts
Counting
- Multiplication Rule: There are n_1 n_2 · · · n_r possible outcomes if an experiment has r steps and the i-th step has n_i outcomes.
- Ordered, Without Replacement: There are n(n−1)(n−2) · · · (n−k+1) = n! / (n−k)! ways to choose k items out of n without replacement if the order in which the k items are chosen matters (e.g., out of 10 people, choose a president, vice president, and treasurer).
- Unordered, Without Replacement: There are n! / [k!(n−k)!] = (n choose k) ways to choose k items out of n without replacement if the order does not matter (e.g., out of 10 people, choose a team of 3).
- Ordered, With Replacement: There are n^k ways to choose k items out of n with replacement if the order matters (e.g., roll 8 dice and place them in a line: n = 6, k = 8).
- Unordered, With Replacement: There are (n+k−1)! / [k!(n−1)!] = (n+k−1 choose k) ways to choose k items out of n with replacement if the order does not matter (e.g., shake 8 dice in a cup and pour them onto the table).
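As a quick sanity check of the four counting formulas above, here is a minimal Python sketch (Python 3.8+ for math.perm/comb); the values n = 10 and k = 3 are arbitrary illustrative choices.

```python
from math import comb, factorial, perm

n, k = 10, 3  # illustrative: choose 3 out of 10

ordered_without = perm(n, k)         # n! / (n-k)!       -> 720
unordered_without = comb(n, k)       # n! / [k!(n-k)!]   -> 120
ordered_with = n ** k                # n^k               -> 1000
unordered_with = factorial(n + k - 1) // (factorial(k) * factorial(n - 1))  # -> 220

print(ordered_without, unordered_without, ordered_with, unordered_with)
```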
Probability
- Naive Definition: P(A) = (number of outcomes favorable to A) / (total number of outcomes), valid when all outcomes are equally likely.
- Conditional Probability: P(A|B) = P(A ∩ B) / P(B) and, in general, P(A|B) ≠ P(B|A).
- Bayes’ Rule: P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|¬A)P(¬A)]
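A short worked example of Bayes' rule; the numbers (prevalence P(A) = 0.01, P(B|A) = 0.95, P(B|¬A) = 0.05) are made up purely for illustration, and the example also shows that P(A|B) and P(B|A) can be very different.

```python
# A = "has condition", B = "test is positive"; all probabilities are made-up examples
p_A = 0.01              # P(A)
p_B_given_A = 0.95      # P(B|A)
p_B_given_notA = 0.05   # P(B|not A)

# denominator via the law of total probability
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B  # Bayes' rule
print(p_A_given_B)  # ~0.161, far from P(B|A) = 0.95
```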
Random Variables
- Let X be a discrete random variable. Then, E[X] = Σ_x x P(X = x)
- Linearity: Let X, Y be random variables and let a be a fixed constant. Then, E[aX + Y] = aE[X] + E[Y]
- Let X be a discrete random variable. Then, Var(X) = E[(X − E[X])^2] = E[X^2] − (E[X])^2
- Variance is NOT linear. Let X, Y be random variables and let a be a fixed constant. Then, in general, Var(X + Y) ≠ Var(X) + Var(Y) (unless X and Y are independent), and Var(aX) = a^2 Var(X).
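A minimal simulation sketch (fair dice are an assumed example) illustrating E[X], Var(X), and the fact that variance is not linear: Var(X + Y) ≈ Var(X) + Var(Y) for independent X, Y, but Var(2X) = 4 Var(X).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 7, size=1_000_000)  # rolls of a fair die
Y = rng.integers(1, 7, size=1_000_000)  # an independent second die

print(X.mean(), X.var())               # ~3.5 and ~35/12 ~ 2.92
print(np.var(X + Y))                   # independent: ~Var(X) + Var(Y) ~ 5.83
print(np.var(2 * X), 4 * np.var(X))    # Var(2X) = 4 Var(X) ~ 11.67
```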
Bernoulli Distribution
- X ∼ Bern(p) and has probability mass function (PMF) P(X = 1) = p and P(X = 0) = 1 − p.
- Let X be the outcome of one experiment that results in either success (with probability p) or failure (with probability 1 − p). Then X ∼ Bern(p).
- Mean: E[X] = p
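A tiny simulation check that the sample mean of Bern(p) draws is close to p; p = 0.3 and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = (rng.random(1_000_000) < p).astype(int)  # Bern(p) draws
print(x.mean())  # ~0.3, matching E[X] = p
```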
Binomial Distribution
- X ∼ Bin(n, p) and has PMF P(X = k) = (n choose k) p^k (1 − p)^(n−k) for k ∈ {0, 1, . . . , n}
- Let X be the number of successes out of n independent Bernoulli experiments, each with probability p of success. Then X ∼ Bin(n, p).
- Mean: E[X] = np
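A hedged check of the Bin(n, p) PMF and mean against scipy.stats (assuming SciPy is installed); n = 10, p = 0.4, k = 3 are illustrative, and Bern(p) is just the special case n = 1.

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.4, 3
pmf_direct = comb(n, k) * p**k * (1 - p) ** (n - k)
print(pmf_direct, binom.pmf(k, n, p))  # both ~0.215
print(binom.mean(n, p), n * p)         # both 4.0
```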
Geometric Distribution
- X ∼ Geom(p) and has PMF P(X = k) = p(1 − p)^k for k ∈ {0, 1, 2, . . . }
- Let X be the number of failures before the first success in a sequence of Bernoulli experiments, each with probability p of success. Then X ∼ Geom(p).
- Mean: E[X] = (1−p) / p
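A direct numeric check of the Geom(p) PMF and mean under the "failures before the first success" convention used above (note that some libraries, e.g. scipy.stats.geom, instead count trials, which shifts the support by one); p = 0.25 is illustrative.

```python
import numpy as np

p = 0.25
k = np.arange(200)               # truncate the infinite support for a numeric check
pmf = p * (1 - p) ** k           # P(X = k) = p(1 - p)^k
print(pmf.sum())                 # ~1.0
print((k * pmf).sum(), (1 - p) / p)  # both ~3.0
```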
Hypergeometric Distribution
- X ∼ HGeom(w, b, n) and has PMF P(X = k) = [(w choose k)(b choose n−k)] / (w+b choose n) for k ∈ {0, 1, . . . , n}
- Let X be the number of “successful” units drawn out of n draws without replacement from a population of w + b units, where w is the number of “successful” units. Then X ∼ HGeom(w, b, n).
- Mean: E[X] = nw / (w+b)
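A hedged check of the HGeom(w, b, n) PMF and mean against scipy.stats.hypergeom (whose arguments are the total population size w + b, the number of success units w, and the sample size n); w = 7, b = 5, n = 4, k = 2 are illustrative.

```python
from math import comb
from scipy.stats import hypergeom

w, b, n, k = 7, 5, 4, 2
pmf_direct = comb(w, k) * comb(b, n - k) / comb(w + b, n)
print(pmf_direct, hypergeom.pmf(k, w + b, w, n))      # both ~0.424
print(hypergeom.mean(w + b, w, n), n * w / (w + b))   # both ~2.33
```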
Poisson Distribution
- X ∼ Pois(λ) and has PMF P(X = k) = (λ^k / k!) e^(−λ) for k ∈ {0, 1, 2, . . . }
- Let X be the number of occurrences of a rare event over a fixed period of time, say an hour, during which the expected number of occurrences is λ. Then X ∼ Pois(λ).
- Mean: E[X] = λ.
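A quick check of the Pois(λ) PMF and mean against scipy.stats.poisson; λ = 2.5 and k = 3 are illustrative.

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 2.5, 3
pmf_direct = lam**k / factorial(k) * exp(-lam)
print(pmf_direct, poisson.pmf(k, lam))  # both ~0.214
print(poisson.mean(lam))                # 2.5
```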
Negative Binomial Distribution
- X ∼ NBin(r, p) and has PMF P(X = n) = (n + r − 1 choose r − 1) (1 − p)^n p^r for n ∈ {0, 1, 2, . . . }
- Let X be the number of failures before the r-th success in a sequence of independent Bernoulli experiments, each with probability p of success. Then X ∼ NBin(r, p).
- Mean: E[X] = r(1−p) / p.
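A hedged check of the NBin(r, p) PMF and mean against scipy.stats.nbinom, which uses the same "failures before the r-th success" convention; r = 3, p = 0.4, n = 5 are illustrative.

```python
from math import comb
from scipy.stats import nbinom

r, p, n = 3, 0.4, 5  # n failures before the r-th success
pmf_direct = comb(n + r - 1, r - 1) * (1 - p) ** n * p**r
print(pmf_direct, nbinom.pmf(n, r, p))     # both ~0.104
print(nbinom.mean(r, p), r * (1 - p) / p)  # both 4.5
```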
Fundamental Bridge
Let I_A be the indicator of the occurrence of event A. Then the fundamental bridge is P(A) = E[I_A].
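A small simulation of the bridge P(A) = E[I_A], using the arbitrary example A = {a fair die roll is even}.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # fair die rolls
indicator = (rolls % 2 == 0).astype(int)     # I_A for A = "roll is even"
print(indicator.mean())                      # ~0.5 = P(A)
```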
Math
Σ_{n=0}^∞ a^n = 1 / (1 − a) for |a| < 1
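A numeric sanity check of the geometric series identity, truncating the infinite sum; a = 0.3 is arbitrary.

```python
a = 0.3
partial = sum(a**n for n in range(200))  # truncated sum of a^n
print(partial, 1 / (1 - a))              # both ~1.428571
```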
Inequalities
- Cauchy-Schwarz: Let X and Y be random variables with finite variance. |E[XY]| ≤ √(E[X^2] E[Y^2])
- Markov: Let X be a random variable and a > 0. P(|X| ≥ a) ≤ E[|X|] / a
- Chebyshev: Let X be a random variable with mean µ and variance σ^2, and let a > 0. P(|X − µ| ≥ a) ≤ σ^2 / a^2.
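A quick simulation comparing an actual tail probability to the Chebyshev bound σ^2 / a^2; the Exponential(1) distribution (µ = 1, σ^2 = 1) and a = 2 are assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # Exponential(1): mean 1, variance 1
a = 2.0
actual = np.mean(np.abs(x - 1.0) >= a)
print(actual, 1.0 / a**2)  # actual tail ~0.05, well under the Chebyshev bound 0.25
```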
Law of Large Numbers (LLN)
Let X_1, X_2, X_3, . . . be independent and identically distributed random variables drawn from some distribution with mean µ = E[X_i] for all i. Then, with probability 1, lim_{n→∞} (X_1 + · · · + X_n) / n = µ
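A minimal LLN sketch: running averages of i.i.d. Uniform(0, 1) draws (an arbitrary choice of distribution) drift toward µ = 0.5 as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)                                # i.i.d. Uniform(0, 1), mu = 0.5
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[9, 999, 99_999]])                  # closer and closer to 0.5
```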
Markov Chain
- A Markov chain X_0, X_1, X_2, . . . on state space {1, . . . , M} satisfies the Markov condition: P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_{n−1} = i_{n−1}, X_n = i) = P(X_{n+1} = j | X_n = i)
- Only today’s information matters for predicting tomorrow; additional information about the past is redundant given today’s information. Given the present, the past and future are conditionally independent.
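A minimal sketch of simulating a Markov chain from a transition matrix (the 2-state matrix is an arbitrary example); each step draws the next state using only the current state, which is exactly the Markov condition.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],    # row i gives P(X_{n+1} = j | X_n = i)
              [0.4, 0.6]])

state = 0
states = [state]
for _ in range(50_000):
    state = rng.choice(2, p=P[state])  # depends only on the current state
    states.append(state)

states = np.array(states)
# empirical P(X_{n+1} = 1 | X_n = 0) should be close to P[0, 1] = 0.1
mask = states[:-1] == 0
print(np.mean(states[1:][mask] == 1))
```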