Introduction to Probability and Statistical Inference
Chapters 12-15: Probability
Definitions
– A phenomenon is random if individual outcomes are uncertain but there is a predictable pattern of outcomes in a large number of repetitions (random ≠ haphazard).
– The probability of any outcome is the proportion of times the outcome would occur in a very long (infinitely long) series of repetitions.
– Repetitions, or trials, are said to be independent if the outcome of one trial does not affect the outcome of another.
Operations on Events
– The intersection of two events A and B is the collection of outcomes that are in both A and B:
– Denoted A∩B and read as “A and B”.
– The union of two events A and B is the collection of outcomes that are in A or B (including outcomes in both):
– Denoted A∪B and read as “A or B”.
– The complement of an event A is the collection of outcomes not in A:
– Denoted Aᶜ (or A with an overbar) and read as “not A”.
The General Addition Rule
General addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
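– A quick check in R (a hypothetical die example of our own): roll a fair die, let A = {even} and B = {5, 6}, so A ∩ B = {6} and P(A ∪ B) = 3/6 + 2/6 − 1/6 = 2/3.
  # Verify the general addition rule by simulation (hypothetical die example)
  set.seed(1)
  rolls <- sample(1:6, 1e6, replace = TRUE)
  A <- rolls %% 2 == 0   # even roll
  B <- rolls >= 5        # roll of 5 or 6
  mean(A | B)                       # simulated P(A U B), about 0.667
  mean(A) + mean(B) - mean(A & B)   # the addition rule gives the same value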
Independent Events
– Events A and B are independent if knowing that one has occurred provides no information about whether the other will.
– E.g., consider a random experiment involving 2 tosses of a fair coin:
– Sample space is S = {HH, HT, TH, TT}
– All 4 outcomes in S have chance 1/4.
– Knowing H occurs on the 1st toss provides no info about whether H will occur in the 2nd toss.
– Let A = {HH, HT} be the event of a head on the 1st toss and B = {HH, TH} be the event of a head on the 2nd toss.
– The events A and B are independent.
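– A short R simulation of this experiment (our own sketch, not from the notes) confirms that P(A ∩ B) = P(A)P(B) = 1/4:
  # Simulate pairs of fair-coin tosses; A = head on 1st, B = head on 2nd
  set.seed(2)
  toss1 <- sample(c("H", "T"), 1e6, replace = TRUE)
  toss2 <- sample(c("H", "T"), 1e6, replace = TRUE)
  A <- toss1 == "H"
  B <- toss2 == "H"
  mean(A & B)         # about 0.25
  mean(A) * mean(B)   # also about 0.25, as independence requires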
General Multiplication Rule
– For 2 (not necessarily independent) events, the rule is: P(A∩B) = P(A|B)×P(B), or
P(A∩B) = P(B|A)×P(A)
– Read A|B as “A given B” and P(A|B) as “the conditional probability of A given B.”
– When A and B are independent,
– P(A|B) = P(A) and P(B|A) = P(B), so that
P(A ∩ B) = P(A) × P(B).
– The multiplication rule for independent events is thus a special case of the general multiplication rule.
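– A worked example of the general rule (ours, for illustration): draw 2 cards without replacement and let A1, A2 be “ace on draw 1” and “ace on draw 2”. The draws are dependent, and P(A1 ∩ A2) = P(A2 | A1) × P(A1):
  # General multiplication rule with dependent events (card-drawing example)
  p_A1     <- 4 / 52   # P(ace on first draw)
  p_A2_gA1 <- 3 / 51   # P(ace on second draw | ace on first draw)
  p_A1 * p_A2_gA1      # P(A1 and A2) = 1/221, about 0.0045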
Bayes’ Theorem and Partitioning the Sample Space
– Connects conditional probabilities:
P(A | B) = P(B | A)P(A) / P(B)
– Writing B as (A ∩ B) ∪ (Aᶜ ∩ B) and noticing that these two sets are disjoint (see previous diagram) means that we can write P(B) = P(A ∩ B) + P(Aᶜ ∩ B).
– Applying the general multiplication rule to both terms in the sum for P(B), we obtain
P(B) = P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ).
– This gives an alternate form of Bayes’ Theorem:
P(A | B) = P(B | A)P(A) / (P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ))
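– A worked example of the alternate form (hypothetical rates chosen only for illustration): suppose a condition has prevalence P(A) = 0.01 and a test has P(B | A) = 0.95 and false-positive rate P(B | Aᶜ) = 0.05. Then the probability of the condition given a positive test is:
  # Bayes' Theorem with the partitioned denominator (hypothetical rates)
  p_A    <- 0.01   # P(A): prevalence
  p_B_A  <- 0.95   # P(B | A): positive test given condition present
  p_B_Ac <- 0.05   # P(B | A^c): false-positive rate
  p_B    <- p_B_A * p_A + p_B_Ac * (1 - p_A)   # total probability of B
  p_B_A * p_A / p_B                            # P(A | B), about 0.161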
Chapters 20-21: Inference for a Population Mean
Review of Key Terminology
– Parameter: a number that describes a distribution, or the relationship between two variables.
– Statistic: a quantity that can be computed from data (does not depend on unknown parameters).
– Sampling Distribution: the distribution of a statistic that we’d observe if we sampled repeatedly.
– Null/Alternative Hypothesis (H0, Ha): competing claims about the value of a parameter that we wish to test.
– Probability of an Outcome: the proportion of times the outcome of a random phenomenon would occur if we sampled repeatedly.
Inference for a Population Mean
– When σ is unknown we estimate it with the sample SD s.
– The sampling distribution of X̄ is N(μ,σ/√n). Since we don’t know σ/√n we estimate it with s/√n.
– When we estimate the standard deviation of the sampling distribution of a statistic we call it the standard error. Thus s/√n is the standard error of X̄.
– When we knew σ, confidence intervals and tests for μ were based on the “pivotal quantity”
Z = (X̄ − μ) / (σ/√n) ~ N(0,1).
– The natural thing to do when we don’t know σ is to use
T = (X̄ − μ) / (s/√n).
– Question: What is the sampling distribution of T?
Sampling Distribution of T
– When we replace σ by s, we tend to get slightly more extreme values of T than of Z because of the extra variation in s.
– How much more extreme is determined by n.
– When n is large, s is very close to σ and the sampling distribution of T is very close to the standard normal.
– The distribution of T based on n observations is called the t distribution with n − 1 degrees of freedom (d.f.). The d.f. come from the n − 1 degrees of freedom in the statistic s.
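– The heavier tails can be seen by simulation (a sketch of our own): sample n = 5 observations from a normal population, form T with s in place of σ, and compare tail frequencies with N(0, 1):
  # T exceeds +/-2 more often than Z does when n is small
  set.seed(3)
  n <- 5
  Tstat <- replicate(1e5, {
    x <- rnorm(n, mean = 0, sd = 1)
    (mean(x) - 0) / (sd(x) / sqrt(n))
  })
  mean(abs(Tstat) > 2)     # about 0.12 ...
  2 * pt(-2, df = n - 1)   # ... matching the t(4) tail probability
  2 * pnorm(-2)            # much larger than the N(0, 1) value, 0.046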
t Confidence Intervals
– Now that we can characterize the sampling distribution of T we can construct a confidence interval as before:
– For a level C confidence interval (CI) for the mean based on n observations, we find the p = (1 − C)/2 upper critical value of the t distribution with n − 1 degrees of freedom, call it t* (t*_{n−1,p}).
– C% of t statistics will fall between −t* and t*; that is,
−t* ≤ (x̄ − μ) / (s/√n) ≤ t* C% of the time … a little algebra …
μ − t*(s/√n) ≤ x̄ ≤ μ + t*(s/√n) C% of the time
– So
x̄ ± t*(s/√n)
will cover μ C% of the time.
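– In R (a sketch with simulated data), the interval can be computed by hand with qt() and checked against t.test():
  # 95% t confidence interval for a mean, by hand and via t.test()
  set.seed(4)
  x <- rnorm(20, mean = 10, sd = 3)   # simulated data for illustration
  n <- length(x)
  tstar <- qt(1 - 0.05 / 2, df = n - 1)          # critical value t*
  mean(x) + c(-1, 1) * tstar * sd(x) / sqrt(n)   # xbar +/- t* s/sqrt(n)
  t.test(x, conf.level = 0.95)$conf.int          # matches the hand computation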
Conditions for Inference about Two Population Means
– The data are independent SRSs of sizes n1 and n2 from two populations.
– The two populations can be populations of individuals under two treatments. Then we assume the data are from a randomized experiment (subjects randomly assigned to treatments) with one factor having two treatments.
– Both population distributions are normal with unknown means μ1 and μ2 and unknown SDs σ1 and σ2, respectively.
– Use the sample size conditions for one-sample inference with the sample size n = n1 + n2; i.e.,
– n < 15: use t only if the data distribution is roughly symmetric and unimodal without outliers
– 15 ≤ n < 40: Use t except in the presence of outliers or strong skewness
– n ≥ 40: Use t even for skewed distributions.
Overview of Inference
– Let x1 and x2 be the variables measured from the two populations.
– We want to compare the two populations by either a confidence interval for μ1 − μ2 or a test of H0: μ1 − μ2 = 0.
– Inference is based on the difference in sample means, x̄1 − x̄2, and its sampling distribution.
Sampling Distribution
– We need the sampling distribution of X̄1 − X̄2 under our assumptions for inference.
– The x1i’s are from a population with distribution N(μ1, σ1) and the x2i’s are from a population with distribution N(μ2, σ2).
– Then X̄1 − X̄2 has variance σ1²/n1 + σ2²/n2 and hence SD √(σ1²/n1 + σ2²/n2). (Variances add, not SDs.)
– We can standardize to a standard normal:
Z = ((X̄1 − X̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
Estimated SDs
– If σ1 and σ2 are both unknown, substitute s1 and s2 to obtain the standard error (SE) √(s1²/n1 + s2²/n2) and the pivotal quantity
T = ((X̄1 − X̄2) − (μ1 − μ2)) / √(s1²/n1 + s2²/n2)
– Form is (estimated difference − true difference)/SE
– What is the sampling distribution of T?
– T does not have an exact t distribution here, but we can work around the problem by fudging the d.f.
– Option 1 (preferred): get a computer to estimate the d.f. (see the output from t.test()).
– Option 2: Take the d.f. equal to the smaller of n1 − 1 and n2 − 1.
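– A minimal sketch of Option 1 (with simulated data of our own): t.test() uses the Welch approximation by default, and the estimated d.f. appears in its output:
  # Option 1: let t.test() estimate the d.f. (Welch approximation)
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)   # simulated sample from population 1
  x2 <- rnorm(15, mean = 4, sd = 3)   # simulated sample from population 2
  t.test(x1, x2)   # default var.equal = FALSE; "df" shows the estimated d.f.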
Hypothesis Tests
– For a value μ0 of μ1 − μ2 we have the t statistic
T = ((x̄1 − x̄2) − μ0) / √(s1²/n1 + s2²/n2)
– (estimated difference – hypothesized difference)/SE
– Compare to the t distribution with estimated d.f. to obtain a p-value to summarize the evidence against the null hypothesis in favor of the alternative hypothesis.
– p < 0.001: very strong evidence
– 0.001 < p < 0.01: strong evidence
– 0.01 < p < 0.05: good evidence
– 0.05 < p < 0.1: some evidence
– p > 0.1: little evidence
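– The statistic and p-value can also be assembled by hand (same simulated samples as above, with the conservative Option 2 d.f. for illustration):
  # Two-sample t statistic for H0: mu1 - mu2 = 0, computed directly
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)
  x2 <- rnorm(15, mean = 4, sd = 3)
  se   <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  Tobs <- (mean(x1) - mean(x2) - 0) / se   # (est. diff - hyp. diff)/SE
  2 * pt(-abs(Tobs), df = min(length(x1), length(x2)) - 1)   # two-sided p-value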
Confidence Intervals
– From the pivotal quantity we can derive a confidence interval of the usual form, estimated difference ± margin of error, where
– estimated difference is x̄1 − x̄2
– margin of error is critical value × SE
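– Putting the pieces together (same simulated samples as above, with the conservative Option 2 d.f.):
  # 95% CI for mu1 - mu2: estimated difference +/- critical value x SE
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)
  x2 <- rnorm(15, mean = 4, sd = 3)
  se    <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  tstar <- qt(0.975, df = min(length(x1), length(x2)) - 1)   # Option 2 d.f.
  (mean(x1) - mean(x2)) + c(-1, 1) * tstar * se
  t.test(x1, x2)$conf.int   # Welch interval for comparison (slightly narrower)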