Probability and Statistics: Formulas and Concepts

Counting

  • Multiplication Rule: There are n_1 n_2 · · · n_r possible outcomes if an experiment has r steps and the i-th step has n_i outcomes.
  • Ordered, Without Replacement: There are n(n−1)(n−2)…(n−k+1) = n! / (n−k)! ways to choose k items out of n without replacement if the order in which the k items are chosen matters (e.g., out of 10 people, choose a president, vice president, and treasurer).
  • Unordered, Without Replacement: There are n! / [k!(n−k)!] = (n choose k) ways to choose k items out of n without replacement if the order does not matter (e.g., out of 10 people, choose a team of 3).
  • Ordered, With Replacement: There are n^k ways to choose k items out of n with replacement if the order matters (e.g., roll 8 dice and place them in a line: n = 6, k = 8).
  • Unordered, With Replacement: There are (n+k−1)! / [k!(n−1)!] ways to choose k items out of n with replacement if the order does not matter (e.g., shake 8 dice in a cup and pour them onto the table).
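
A minimal Python sketch of these counting rules, using only the standard library (the specific n and k values are just the examples from the bullets above):

    from math import comb, factorial, perm

    n, k = 10, 3
    print(perm(n, k))   # ordered, without replacement: 10 · 9 · 8 = 720
    print(comb(n, k))   # unordered, without replacement: 120 teams of 3
    print(6 ** 8)       # ordered, with replacement: 8 dice in a line
    print(factorial(6 + 8 - 1) // (factorial(8) * factorial(6 - 1)))  # unordered, with replacement: 1287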

Probability

  • Naive Definition: P(A) = (number of outcomes favorable to A) / (total number of outcomes), assuming all outcomes are equally likely.
  • Conditional Probability: P(A|B) = P(A ∩ B) / P(B) and, in general, P(A|B) ≠ P(B|A).
  • Bayes’ Rule: P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|¬A)P(¬A)]
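
As a quick numeric illustration of Bayes' rule, here is a sketch with made-up numbers (a rare condition A and an imperfect test B); note how different P(A|B) is from P(B|A):

    p_A = 0.01             # P(A)
    p_B_given_A = 0.95     # P(B|A)
    p_B_given_notA = 0.05  # P(B|¬A)

    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # law of total probability
    p_A_given_B = p_B_given_A * p_A / p_B                 # Bayes' rule
    print(p_A_given_B)     # ≈ 0.161, far from P(B|A) = 0.95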

Random Variables

  • Let X be a discrete random variable. Then, E[X] = Σ_x x P(X = x)
  • Linearity: Let X, Y be random variables and let a be a fixed constant. Then, E[aX + Y] = aE[X] + E[Y]
  • Let X be a discrete random variable. Then, Var(X) = E[(X − E[X])^2] = E[X^2] − (E[X])^2
  • Variance is NOT linear. Let X, Y be random variables and let a be a fixed constant. Then, Var(X + Y) ≠ Var(X) + Var(Y) in general (equality holds if X and Y are independent), and Var(aX) = a^2 Var(X)
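
A small sketch of these definitions for a fair six-sided die (plain Python; the die is just an example):

    pmf = {x: 1/6 for x in range(1, 7)}

    E_X   = sum(x * p for x, p in pmf.items())      # E[X] = Σ_x x P(X = x)
    E_X2  = sum(x**2 * p for x, p in pmf.items())   # E[X^2]
    Var_X = E_X2 - E_X**2                           # Var(X) = E[X^2] − (E[X])^2

    print(E_X, Var_X)   # 3.5 and ≈ 2.9167
    print(4 * E_X)      # E[4X] = 4 E[X] by linearity
    print(16 * Var_X)   # Var(4X) = 4^2 Var(X)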

Bernoulli Distribution

  • X ∼ Bern(p) and has probability mass function (PMF) P(X = 1) = p and P(X = 0) = 1 − p.
  • Let X be the outcome of one experiment that results in either success (with probability p) or failure (with probability 1 − p). Then X ∼ Bern(p).
  • Mean: E[X] = p
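
A quick sanity check with scipy.stats (p = 0.3 is an arbitrary example value):

    from scipy.stats import bernoulli

    p = 0.3
    X = bernoulli(p)
    print(X.pmf(1), X.pmf(0))  # p and 1 − p
    print(X.mean())            # E[X] = p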

Binomial Distribution

  • X ∼ Bin(n, p) and has PMF P(X = k) = (n choose k) p^k (1 − p)^(n−k) for k ∈ {0, 1, . . . , n}
  • Let X be the number of successes out of n independent Bernoulli experiments, each with probability p of success. Then X ∼ Bin(n, p).
  • Mean: E[X] = np
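
A similar check for the Binomial PMF, comparing scipy.stats against the formula above (n = 10, p = 0.3, k = 4 are arbitrary example values):

    from math import comb
    from scipy.stats import binom

    n, p, k = 10, 0.3, 4
    print(binom(n, p).pmf(k))                    # scipy's Bin(n, p) PMF at k
    print(comb(n, k) * p**k * (1 - p)**(n - k))  # (n choose k) p^k (1 − p)^(n−k)
    print(binom(n, p).mean(), n * p)             # E[X] = np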

Geometric Distribution

  • X ∼ Geom(p) and has PMF P(X = k) = p(1 − p)^k for k ∈ {0, 1, 2, . . . }
  • Let X be the number of failures before the first success in a sequence of Bernoulli experiments, each with probability p of success. Then X ∼ Geom(p).
  • Mean: E[X] = (1−p) / p
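
One caution when checking this in scipy: scipy.stats.geom counts the number of trials up to and including the first success (support starting at 1), while the convention above counts failures before the first success; the latter matches scipy.stats.nbinom with r = 1. A small sketch (p = 0.3, k = 2 are example values):

    from scipy.stats import nbinom

    p, k = 0.3, 2
    X = nbinom(1, p)        # failures before the first success
    print(X.pmf(k))         # p (1 − p)^k
    print(p * (1 - p)**k)   # same value from the formula above
    print(X.mean())         # (1 − p) / p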

Hypergeometric Distribution

  • X ∼ HGeom(w, b, n) and has PMF P(X = k) = [(w choose k)(b choose n−k)] / (w+b choose n) for k ∈ {0, 1, . . . , n} (with k ≤ w and n − k ≤ b)
  • Let X be the number of “successful” units drawn out of n draws without replacement from a population of w + b units, where w is the number of “successful” units. Then X ∼ HGeom(w, b, n).
  • Mean: E[X] = nw / (w+b)
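
A sketch with scipy.stats.hypergeom, whose parameter order is (population size, number of "successful" units, number of draws); the values of w, b, n below are arbitrary examples:

    from scipy.stats import hypergeom

    w, b, n = 7, 13, 5
    X = hypergeom(w + b, w, n)
    print(X.pmf(2))                   # P(X = 2)
    print(X.mean(), n * w / (w + b))  # E[X] = nw / (w + b)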

Poisson Distribution

  • X ∼ Pois(λ) and has PMF P(X = k) = (λ^k / k!) e^(−λ) for k ∈ {0, 1, 2, . . . }
  • Let X be the number of occurrences of a rare event over a fixed interval (say, an hour), where the expected number of occurrences is λ. Then X ∼ Pois(λ).
  • Mean: E[X] = λ.
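
A sketch checking the Poisson PMF and mean (λ = 4 and k = 2 are arbitrary example values):

    from math import exp, factorial
    from scipy.stats import poisson

    lam, k = 4.0, 2
    print(poisson(lam).pmf(k))                # scipy
    print(lam**k / factorial(k) * exp(-lam))  # (λ^k / k!) e^(−λ)
    print(poisson(lam).mean())                # E[X] = λ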

Negative Binomial Distribution

  • X ∼ NBin(r, p) and has PMF P(X = n) = (n + r − 1 choose r − 1) p^r (1 − p)^n for n ∈ {0, 1, 2, . . . }
  • Let X be the number of failures before the r-th success in a sequence of independent Bernoulli experiments, each with probability p of success. Then X ∼ NBin(r, p).
  • Mean: E[X] = r(1−p) / p.
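
A sketch for the Negative Binomial; scipy.stats.nbinom uses the same "failures before the r-th success" convention (r = 3, p = 0.4, n = 5 are arbitrary example values):

    from math import comb
    from scipy.stats import nbinom

    r, p, n = 3, 0.4, 5
    print(nbinom(r, p).pmf(n))                         # scipy's NBin(r, p) PMF at n
    print(comb(n + r - 1, r - 1) * p**r * (1 - p)**n)  # formula above
    print(nbinom(r, p).mean(), r * (1 - p) / p)        # E[X] = r(1 − p) / p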

Fundamental Bridge

Let I_A be the indicator of the occurrence of event A (I_A = 1 if A occurs and I_A = 0 otherwise). Then the fundamental bridge is P(A) = E[I_A].
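
A Monte Carlo sketch of the bridge: averaging indicator values estimates the probability of the event (the event "a fair die shows 5 or 6" is just an example):

    import numpy as np

    rng = np.random.default_rng(0)
    rolls = rng.integers(1, 7, size=100_000)  # simulated die rolls
    I_A = (rolls >= 5).astype(float)          # indicator of A = {roll is 5 or 6}
    print(I_A.mean())                         # ≈ E[I_A] = P(A) = 1/3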

Math

Geometric series: Σ_{n=0}^∞ a^n = 1 / (1 − a) for |a| < 1
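
A tiny numeric check of the geometric series (a = 0.5 is an arbitrary value with |a| < 1):

    a = 0.5
    partial_sum = sum(a**n for n in range(200))  # Σ_{n=0}^{199} a^n
    print(partial_sum, 1 / (1 - a))              # both ≈ 2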

Inequalities

  1. Cauchy-Schwarz: Let X and Y be random variables with finite variance. |E[XY]| ≤ √(E[X^2] E[Y^2])
  2. Markov: Let X be a random variable and a > 0. P(|X| ≥ a) ≤ E[|X|] / a
  3. Chebyshev: Let X be a random variable with mean µ and variance σ^2, and let a > 0. P(|X − µ| ≥ a) ≤ σ^2 / a^2.
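
A sketch comparing the Markov and Chebyshev bounds to exact tail probabilities for an Exponential(1) random variable (mean 1, variance 1; the distribution and a = 3 are arbitrary example choices):

    from scipy.stats import expon

    X, a = expon(), 3.0                          # Exponential(1): µ = 1, σ^2 = 1
    print(X.sf(a), 1 / a)                        # Markov: P(X ≥ a) ≤ E[|X|] / a
    print(X.sf(1 + a) + X.cdf(1 - a), 1 / a**2)  # Chebyshev: P(|X − 1| ≥ a) ≤ σ^2 / a^2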

Law of Large Numbers (LLN)

Let X_1, X_2, X_3, . . . be independent and identically distributed random variables drawn from some distribution with mean µ = E[X_i] for all i. Then, with probability 1, lim_{n→∞} (X_1 + · · · + X_n) / n = µ
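
A simulation sketch of the LLN using i.i.d. Bern(0.3) draws (the distribution is an arbitrary example; any distribution with a finite mean works):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = 0.3
    for n in (10, 1_000, 100_000):
        samples = rng.binomial(1, mu, size=n)  # i.i.d. Bern(0.3), so E[X_i] = 0.3
        print(n, samples.mean())               # sample mean approaches µ as n grows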

Markov Chain

  • A Markov chain X_0, X_1, X_2, . . . on state space {1, . . . , M} satisfies the Markov condition: P(X_{n+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_{n−1} = i_{n−1}, X_n = i) = P(X_{n+1} = j | X_n = i)
  • Only today’s information matters for predicting tomorrow; additional information about the past is redundant given today’s information. Given the present, the past and future are conditionally independent.
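
A minimal sketch simulating a two-state Markov chain from a transition matrix (the matrix is made up for the example; the long-run state frequencies approach its stationary distribution, here (0.8, 0.2)):

    import numpy as np

    # Hypothetical transition matrix on states {0, 1}: P[i, j] = P(X_{n+1} = j | X_n = i)
    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])

    rng = np.random.default_rng(0)
    state, visits = 0, np.zeros(2)
    for _ in range(100_000):
        state = rng.choice(2, p=P[state])  # next state depends only on the current state
        visits[state] += 1
    print(visits / visits.sum())           # ≈ (0.8, 0.2)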