Introduction to Probability and Statistical Inference
Chapters 12-15: Probability
Definitions
– A phenomenon is random if individual outcomes are uncertain but there is a predictable pattern of outcomes in a large number of repetitions (random ≠ haphazard).
– The probability of any outcome is the proportion of times the outcome would occur in a very long (infinitely long) series of repetitions.
– Repetitions, or trials, are said to be independent if the outcome of one trial does not affect the outcome of another.
Operations on Events
– The intersection of two events A and B is the collection of outcomes that are in both A and B:
– Denoted A∩B and read as “A and B”.
– The union of two events A and B is the collection of outcomes that are in A or B (including outcomes in both):
– Denoted A∪B and read as “A or B”.
– The complement of an event A is the collection of outcomes not in A:
– Denoted Aᶜ (or A with an overbar) and read as “not A”.
The General Addition Rule
General addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
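– A quick check in R (a hypothetical die example of our own): roll a fair die, let A = {even} and B = {5, 6}, so A ∩ B = {6} and P(A ∪ B) = 3/6 + 2/6 − 1/6 = 2/3.
  # Verify the general addition rule by simulation (hypothetical die example)
  set.seed(1)
  rolls <- sample(1:6, 1e6, replace = TRUE)
  A <- rolls %% 2 == 0   # even roll
  B <- rolls >= 5        # roll of 5 or 6
  mean(A | B)                       # simulated P(A U B), about 0.667
  mean(A) + mean(B) - mean(A & B)   # the addition rule gives the same value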
Independent Events
– Events A and B are independent if knowing that one has occurred provides no information about whether the other will.
– E.g., consider a random experiment involving 2 tosses of a fair coin:
– Sample space is S = {HH, HT, TH, TT}
– All 4 outcomes in S have chance 1/4.
– Knowing H occurs on the 1st toss provides no info about whether H will occur in the 2nd toss.
– Let A = {HH, HT} be the event of a head on the 1st toss and B = {HH, TH} be the event of a head on the 2nd toss.
– The events A and B are independent.
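– A short R simulation of this experiment (our own sketch, not from the notes) confirms that P(A ∩ B) = P(A)P(B) = 1/4:
  # Simulate pairs of fair-coin tosses; A = head on 1st, B = head on 2nd
  set.seed(2)
  toss1 <- sample(c("H", "T"), 1e6, replace = TRUE)
  toss2 <- sample(c("H", "T"), 1e6, replace = TRUE)
  A <- toss1 == "H"
  B <- toss2 == "H"
  mean(A & B)         # about 0.25
  mean(A) * mean(B)   # also about 0.25, as independence requires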
General Multiplication Rule
– For 2 (not necessarily independent) events, the rule is: P(A∩B) = P(A|B)×P(B), or
P(A∩B) = P(B|A)×P(A)
– Read A|B as “A given B” and P(A|B) as “the conditional probability of A given B.”
– When A and B are independent,
– P(A|B) = P(A) and P(B|A) = P(B), so that
P(A ∩ B) = P(A) × P(B).
– The multiplication rule for independent events is thus a special case of the general multiplication rule.
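– A worked example of the general rule (ours, for illustration): draw 2 cards without replacement and let A1, A2 be “ace on draw 1” and “ace on draw 2”. The draws are dependent, and P(A1 ∩ A2) = P(A2 | A1) × P(A1):
  # General multiplication rule with dependent events (card-drawing example)
  p_A1     <- 4 / 52   # P(ace on first draw)
  p_A2_gA1 <- 3 / 51   # P(ace on second draw | ace on first draw)
  p_A1 * p_A2_gA1      # P(A1 and A2) = 1/221, about 0.0045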
Bayes’ Theorem and Partitioning the Sample Space
– Connects conditional probabilities:
P(A | B) = P(B | A)P(A) / P(B)
– Writing B as (A ∩ B) ∪ (Aᶜ ∩ B) and noticing that these two sets are disjoint (see previous diagram) means that we can write P(B) = P(A ∩ B) + P(Aᶜ ∩ B).
– Applying the general multiplication rule to both terms in the sum for P(B), we obtain
P(B) = P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ).
– This gives an alternate form of Bayes’ Theorem:
P(A | B) = P(B | A)P(A) / (P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ))
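– A worked example of the alternate form (hypothetical rates chosen only for illustration): suppose a condition has prevalence P(A) = 0.01 and a test has P(B | A) = 0.95 and false-positive rate P(B | Aᶜ) = 0.05. Then the probability of the condition given a positive test is:
  # Bayes' Theorem with the partitioned denominator (hypothetical rates)
  p_A    <- 0.01   # P(A): prevalence
  p_B_A  <- 0.95   # P(B | A): positive test given condition present
  p_B_Ac <- 0.05   # P(B | A^c): false-positive rate
  p_B    <- p_B_A * p_A + p_B_Ac * (1 - p_A)   # total probability of B
  p_B_A * p_A / p_B                            # P(A | B), about 0.161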
Chapters 20-21: Inference for a Population Mean
Review of Key Terminology
– Parameter: a number that describes a distribution, or the relationship between two variables.
– Statistic: a quantity that can be computed from data (does not depend on unknown parameters).
– Sampling Distribution: the distribution of a statistic that we’d observe if we sampled repeatedly.
– Null/Alternative Hypothesis (H0, Ha): competing claims about the value of a parameter that we wish to test.
– Probability of an Outcome: the proportion of times the outcome of a random phenomenon would occur if we sampled repeatedly.
Inference for a Population Mean
– When σ is unknown we estimate it with the sample SD s.
– The sampling distribution of X̄ is N(μ,σ/√n). Since we don’t know σ/√n we estimate it with s/√n.
– When we estimate the standard deviation of the sampling distribution of a statistic we call it the standard error. Thus s/√n is the standard error of X̄.
– When we knew σ, confidence intervals and tests for μ were based on the “pivotal quantity”
Z = (X̄ − μ) / (σ/√n) ~ N(0,1).
– The natural thing to do when we don’t know σ is to use
T = (X̄ − μ) / (s/√n).
– Question: What is the sampling distribution of T?
Sampling Distribution of T
– When we replace σ by s, we tend to get slightly more extreme values of T than of Z because of the extra variation in s.
– How much more extreme is determined by n.
– When n is large, s is very close to σ and the sampling distribution of T is very close to the standard normal.
– The distribution of T based on n observations is called the t distribution with n − 1 degrees of freedom (d.f.). The d.f. come from the n − 1 degrees of freedom in the statistic s.
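– The heavier tails can be seen by simulation (a sketch of our own): sample n = 5 observations from a normal population, form T with s in place of σ, and compare tail frequencies with N(0, 1):
  # T exceeds +/-2 more often than Z does when n is small
  set.seed(3)
  n <- 5
  Tstat <- replicate(1e5, {
    x <- rnorm(n, mean = 0, sd = 1)
    (mean(x) - 0) / (sd(x) / sqrt(n))
  })
  mean(abs(Tstat) > 2)     # about 0.12 ...
  2 * pt(-2, df = n - 1)   # ... matching the t(4) tail probability
  2 * pnorm(-2)            # much larger than the N(0, 1) value, 0.046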
t Confidence Intervals
– Now that we can characterize the sampling distribution of T we can construct a confidence interval as before:
– For a level C confidence interval (CI) for the mean based on n observations, we find the p = (1 − C)/2 upper critical value of the t distribution with n − 1 degrees of freedom, call it t* (t*_{n−1,p}).
– C% of t statistics will fall between −t* and t*; that is,
−t* ≤ (x̄ − μ) / (s/√n) ≤ t* C% of the time … a little algebra …
μ − t*(s/√n) ≤ x̄ ≤ μ + t*(s/√n) C% of the time
– So
x̄ ± t*(s/√n)
will cover μ C% of the time.
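– In R (a sketch with simulated data), the interval can be computed by hand with qt() and checked against t.test():
  # 95% t confidence interval for a mean, by hand and via t.test()
  set.seed(4)
  x <- rnorm(20, mean = 10, sd = 3)   # simulated data for illustration
  n <- length(x)
  tstar <- qt(1 - 0.05 / 2, df = n - 1)          # critical value t*
  mean(x) + c(-1, 1) * tstar * sd(x) / sqrt(n)   # xbar +/- t* s/sqrt(n)
  t.test(x, conf.level = 0.95)$conf.int          # matches the hand computation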
Conditions for Inference about Two Population Means
– The data are independent SRSs of sizes n1 and n2 from two populations.
– The two populations can be populations of individuals under two treatments. Then we assume the data are from a randomized experiment (subjects randomly assigned to treatments) with one factor having two treatments.
– Both population distributions are normal with unknown means μ1 and μ2 and unknown SDs σ1 and σ2, respectively.
– Use the sample size conditions for one-sample inference with the sample size n = n1 + n2; i.e.,
– n < 15: use t only if the data distribution is roughly symmetric and unimodal without outliers
– 15 ≤ n < 40: Use t except in the presence of outliers or strong skewness
– n ≥ 40: Use t even for skewed distributions.
Overview of Inference
– Let x1 and x2 be the variables measured from the two populations.
– We want to compare the two populations by either a confidence interval for μ1 − μ2 or a test of H0: μ1 − μ2 = 0.
– Inference is based on the difference in sample means, x̄1 − x̄2, and its sampling distribution.
Sampling Distribution
– We need the sampling distribution of X̄1 − X̄2 under our assumptions for inference.
– The x1i’s are from a population with distribution N(μ1, σ1) and the x2i’s are from a population with distribution N(μ2, σ2).
– Then X̄1 − X̄2 has variance σ1²/n1 + σ2²/n2 and hence SD √(σ1²/n1 + σ2²/n2). (Variances add, not SDs.)
– We can standardize to a standard normal:
Z = ((X̄1 − X̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
Estimated SDs
– If σ1 and σ2 are both unknown, substitute s1 and s2 to obtain the standard error (SE) √(s1²/n1 + s2²/n2) and the pivotal quantity
T = ((X̄1 − X̄2) − (μ1 − μ2)) / √(s1²/n1 + s2²/n2)
– Form is (estimated difference − true difference)/SE
– What is the sampling distribution of T?
– T does not have an exact t distribution here, but we can work around the problem by fudging the d.f.
– Option 1 (preferred): get a computer to estimate the d.f. (see the output from t.test()).
– Option 2: Take the d.f. equal to the smaller of n1 − 1 and n2 − 1.
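– A minimal sketch of Option 1 (with simulated data of our own): t.test() uses the Welch approximation by default, and the estimated d.f. appears in its output:
  # Option 1: let t.test() estimate the d.f. (Welch approximation)
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)   # simulated sample from population 1
  x2 <- rnorm(15, mean = 4, sd = 3)   # simulated sample from population 2
  t.test(x1, x2)   # default var.equal = FALSE; "df" shows the estimated d.f.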
Hypothesis Tests
– For a value μ0 of μ1 − μ2 we have the t statistic
T = ((x̄1 − x̄2) − μ0) / √(s1²/n1 + s2²/n2)
– (estimated difference – hypothesized difference)/SE
– Compare to the t distribution with estimated d.f. to obtain a p-value to summarize the evidence against the null hypothesis in favor of the alternative hypothesis.
– p < 0.001: very strong evidence
– 0.001 < p < 0.01: strong evidence
– 0.01 < p < 0.05: good evidence
– 0.05 < p < 0.1: some evidence
– p > 0.1: little evidence
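– The statistic and p-value can also be assembled by hand (same simulated samples as above, with the conservative Option 2 d.f. for illustration):
  # Two-sample t statistic for H0: mu1 - mu2 = 0, computed directly
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)
  x2 <- rnorm(15, mean = 4, sd = 3)
  se   <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  Tobs <- (mean(x1) - mean(x2) - 0) / se   # (est. diff - hyp. diff)/SE
  2 * pt(-abs(Tobs), df = min(length(x1), length(x2)) - 1)   # two-sided p-value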
Confidence Intervals
– From the pivotal quantity we can derive a confidence interval of the usual form, estimated difference ± margin of error, where
– estimated difference is x̄1 − x̄2
– margin of error is critical value × SE
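– Putting the pieces together (same simulated samples as above, with the conservative Option 2 d.f.):
  # 95% CI for mu1 - mu2: estimated difference +/- critical value x SE
  set.seed(5)
  x1 <- rnorm(12, mean = 5, sd = 2)
  x2 <- rnorm(15, mean = 4, sd = 3)
  se    <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  tstar <- qt(0.975, df = min(length(x1), length(x2)) - 1)   # Option 2 d.f.
  (mean(x1) - mean(x2)) + c(-1, 1) * tstar * se
  t.test(x1, x2)$conf.int   # Welch interval for comparison (slightly narrower)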