Introduction to Probability and Statistical Inference

Chapters 12–15: Probability

Definitions

– A phenomenon is random if individual outcomes are uncertain but there is a predictable pattern in a large number of repetitions (random ≠ haphazard).

– The probability of any outcome is the proportion of times the outcome would occur in a very long (infinitely long) series of repetitions.

– Repetitions, or trials, are said to be independent if the outcome of one trial does not affect the outcome of another.
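Since later sections use R's t.test(), here is a minimal R sketch of the long-run-frequency idea: simulate many tosses of a fair coin and watch the proportion of heads settle near 1/2 (the seed and number of tosses are made up for illustration).

    # Simulate tosses of a fair coin; the proportion of heads
    # approaches 0.5 as the number of tosses grows.
    set.seed(1)                       # for reproducibility
    tosses <- sample(c("H", "T"), size = 1e5, replace = TRUE)
    mean(tosses == "H")               # overall proportion of heads, close to 0.5
    n <- c(10, 100, 1000, 100000)
    cumsum(tosses == "H")[n] / n      # running proportion after n tosses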

Operations on Events

– The intersection of two events A and B is the collection of outcomes that are in both A and B:

    – Denoted A ∩ B and read as “A and B”.

– The union of two events A and B is the collection of outcomes that are in A or B (including outcomes in both):

    – Denoted A ∪ B and read as “A or B”.

– The complement of an event A is the collection of outcomes not in A:

    – Denoted Aᶜ and read as “not A”.

The General Addition Rule

General addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
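As a quick numerical check (a hypothetical die example, not from the notes): for one roll of a fair die, let A = “even” and B = “5 or 6”. Then P(A) = 3/6, P(B) = 2/6, P(A ∩ B) = P({6}) = 1/6, and the rule gives P(A ∪ B) = 3/6 + 2/6 − 1/6 = 4/6, which matches direct enumeration in R:

    # Verify the general addition rule by enumerating one roll of a fair die.
    S <- 1:6
    A <- c(2, 4, 6)                   # even
    B <- c(5, 6)                      # five or six
    p <- function(E) length(E) / length(S)
    p(union(A, B))                    # 4/6 directly
    p(A) + p(B) - p(intersect(A, B))  # 4/6 by the addition rule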

Independent Events

– Events A and B are independent if knowing that one has occurred provides no information about whether the other will.

– E.g., consider a random experiment involving two tosses of a fair coin:

    – Sample space is S = {HH, HT, TH, TT}

    – All 4 outcomes in S have chance 1/4.

– Knowing that H occurs on the 1st toss provides no information about whether H will occur on the 2nd toss.

– Let A = {HH, HT} be the event of a head on the 1st toss and B = {HH, TH} be the event of a head on the 2nd toss.

– The events A and B are independent.
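Enumerating the four equally likely outcomes confirms the claim; a minimal R sketch:

    # Two tosses of a fair coin: check P(A and B) = P(A) * P(B).
    S <- c("HH", "HT", "TH", "TT")    # equally likely outcomes
    A <- c("HH", "HT")                # head on 1st toss
    B <- c("HH", "TH")                # head on 2nd toss
    p <- function(E) length(E) / length(S)
    p(intersect(A, B))                # P(A and B) = 1/4
    p(A) * p(B)                       # (1/2)(1/2) = 1/4, so A and B are independent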

General Multiplication Rule

– For two (not necessarily independent) events, the rule is: P(A ∩ B) = P(A | B)P(B), or

P(A ∩ B) = P(B | A)P(A)

– Read A|B as “A given B” and P(A|B) as “the conditional probability of A given B.”

– When A and B are independent,

    – P(A|B) = P(A) and P(B|A) = P(B), so that

P(A ∩ B) = P(A) × P(B).

– The multiplication rule for independent events is thus a special case of the general multiplication rule.
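As a hypothetical illustration of the general rule with dependent events (not from the notes): drawing two cards without replacement, the chance both are aces is P(ace 1st) × P(ace 2nd | ace 1st) = (4/52)(3/51). A simulation sketch in R:

    # P(both aces) when drawing 2 cards without replacement:
    # general multiplication rule vs. simulation.
    (4 / 52) * (3 / 51)               # exact: about 0.00452
    set.seed(1)
    deck <- rep(c("ace", "other"), times = c(4, 48))
    both_aces <- replicate(1e5, all(sample(deck, 2) == "ace"))
    mean(both_aces)                   # close to the exact value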

Bayes’ Theorem and Partitioning the Sample Space

– Connects conditional probabilities:

P(A | B) = P(B | A)P(A) / P(B)

– Writing B as (A ∩ B) ∪ (Aᶜ ∩ B) and noticing that these two sets are disjoint (see previous diagram) means that we can write P(B) = P(A ∩ B) + P(Aᶜ ∩ B).

– Applying the general multiplication rule to both terms in the sum for P(B), we obtain

P(B) = P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ).

– This means an alternate form of Bayes’ Theorem is

P(A | B) = P(B | A)P(A) / (P(B | A)P(A) + P(B | Aᶜ)P(Aᶜ))
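A standard illustration (the numbers are hypothetical, not from the notes): a disease with prevalence P(A) = 0.01, a test with sensitivity P(B | A) = 0.95 and false-positive rate P(B | Aᶜ) = 0.05. The alternate form gives the chance of disease given a positive test:

    # Bayes' theorem with the partitioned denominator.
    p_A      <- 0.01                  # prevalence, P(A)
    p_B_A    <- 0.95                  # sensitivity, P(B | A)
    p_B_notA <- 0.05                  # false-positive rate, P(B | A^c)
    p_B <- p_B_A * p_A + p_B_notA * (1 - p_A)  # total probability, P(B)
    p_B_A * p_A / p_B                 # P(A | B), about 0.161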

Chapters 20–21: Inference for a Population Mean

Review of Key Terminology

Parameter: a number that describes a distribution, or the relationship between two variables.

Statistic: a quantity that can be computed from data (does not depend on unknown parameters).

Sampling Distribution: the distribution of a statistic that we’d observe if we sampled repeatedly.

Null/Alternative Hypothesis (H0, Ha): Hypothesized values of a parameter that we are interested in testing.

Probability of an Outcome: the proportion of times the outcome of a random phenomenon would occur if we sampled repeatedly.

Inference for a Population Mean

– When σ is unknown we estimate it with the sample SD s.

– The sampling distribution of x̄ is N(μ, σ/√n). Since we don’t know σ/√n we estimate it with s/√n.

– When we estimate the standard deviation of the sampling distribution of a statistic we call it the standard error. Thus s/√n is the standard error of x̄.

– When we knew σ, confidence intervals and tests for μ were based on the “pivotal quantity”

Z = (x̄ − μ) / (σ/√n) ~ N(0,1).

– The natural thing to do when we don’t know σ is use

T = (x̄ − μ) / (s/√n).

Question: What is the sampling distribution of T?

Sampling Distribution of T

– When we replace σ by s, we tend to get slightly more extreme observations in T than Z because of the extra variation in s.

– How much more extreme is determined by n.

– When n is large, s is very close to σ and the sampling distribution of T is very close to the standard normal.

– The distribution of T based on n observations is called the t distribution with n − 1 degrees of freedom (d.f.). The d.f. is from the degrees of freedom in the statistic s.
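One way to see the effect of n (a sketch comparing critical values): the 97.5% point of the t distribution shrinks toward the standard normal’s 1.96 as the d.f. grow.

    # t critical values approach the normal critical value as d.f. grow.
    qt(0.975, df = c(4, 14, 29, 99))  # 2.776, 2.145, 2.045, 1.984
    qnorm(0.975)                      # 1.960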

t Confidence Intervals

– Now that we can characterize the sampling distribution of T we can construct a confidence interval as before:

– For a level C confidence interval (CI) for the mean based on n observations we find the p = (1 − C)/2 critical value of the t distribution with n − 1 degrees of freedom, call it t* (t*n−1,p).

C% of t statistics will fall between −t* and t*; that is,

−t* ≤ (x̄ − μ) / (s/√n) ≤ t* C% of the time … a little algebra …

μ − t*(s/√n) ≤ x̄ ≤ μ + t*(s/√n) C% of the time

– So

x̄ ± t*(s/√n)

will cover μ C% of the time.
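A sketch of the computation on made-up data (the ten measurements and the C = 0.95 level are assumptions for illustration), both by the formula and with R’s t.test():

    # 95% t confidence interval for a mean, by hand and via t.test().
    x <- c(9.8, 10.2, 10.4, 9.7, 10.1, 10.3, 9.9, 10.0, 10.5, 9.6)
    n <- length(x)
    t_star <- qt(0.975, df = n - 1)           # critical value for C = 0.95
    mean(x) + c(-1, 1) * t_star * sd(x) / sqrt(n)
    t.test(x)$conf.int                        # same interval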

Conditions for Inference about Two Population Means

– The data are independent SRSs of size n1 and n2 from two populations.

– The two populations can be populations of individuals under two treatments. Then we assume the data are from a randomized experiment (subjects randomly assigned to treatments) with one factor having two treatments.

– Both population distributions are normal with unknown means μ1 and μ2 and unknown SDs σ1 and σ2, respectively.

– Use the sample size conditions for one-sample inference with the sample size n = n1 + n2; i.e.,

    – n < 15: use t only if the data distribution is roughly symmetric and unimodal without outliers

    – 15 ≤ n < 40: Use t except in the presence of outliers or strong skewness

    – n ≥ 40: Use t even for skewed distributions.

Overview of Inference

– Let x1 and x2 be the variables measured from the two populations.

– We want to compare the two populations by either a confidence interval for μ1 − μ2 or a test of H0: μ1 − μ2 = 0.

– Inference is based on the difference in sample means, x̄1 − x̄2, and its sampling distribution.

Sampling Distribution

– We need the sampling distribution of x̄1 − x̄2 under our assumptions for inference.

– The x1i’s are from a population with distribution N(μ1, σ1) and the x2i’s are from a population with distribution N(μ2, σ2).

– Then x̄1 − x̄2 has mean μ1 − μ2 and variance σ1²/n1 + σ2²/n2, and hence SD √(σ1²/n1 + σ2²/n2). (Variances add, not SDs.)

– We can standardize to a standard normal:

Z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)

Estimated SDs

– If σ1 and σ2 are both unknown, substitute s1 and s2 to obtain the standard error (SE) √(s1²/n1 + s2²/n2) and the pivotal quantity

T = ((x̄1 − x̄2) − (μ1 − μ2)) / √(s1²/n1 + s2²/n2)

– The form is (estimated difference − true difference)/SE

– What is the sampling distribution of T?

– We can work around the problem by fudging the d.f.

Option 1: (preferred) Get a computer to estimate the d.f. – see output from t.test().

Option 2: Take the d.f. equal to the smaller of n1 − 1 and n2 − 1.
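For Option 1, R’s t.test() applies the Welch approximation to the d.f. by default; a minimal sketch on hypothetical normal samples (the sizes, means, and SDs are made up):

    # Welch two-sample t: t.test() estimates the d.f. (Option 1).
    set.seed(1)
    x1 <- rnorm(20, mean = 10, sd = 2)        # hypothetical sample 1
    x2 <- rnorm(15, mean = 9,  sd = 3)        # hypothetical sample 2
    t.test(x1, x2)                            # printed output reports the estimated d.f.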

Hypothesis Tests

– For a value μ0 of μ1 − μ2 we have the t statistic

T = ((x̄1 − x̄2) − μ0) / √(s1²/n1 + s2²/n2)

– (estimated difference − hypothesized difference)/SE

– Compare to the t distribution with estimated d.f. to obtain a p-value to summarize the evidence against the null hypothesis in favor of the alternative hypothesis.

– p < 0.001: very strong evidence

– 0.001 < p < 0.01: strong evidence

– 0.01 < p < 0.05: good evidence

– 0.05 < p < 0.1: some evidence

– p > 0.1: little evidence
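A sketch of the test statistic and two-sided p-value computed by the formula on the hypothetical samples from the earlier sketch, using the conservative Option 2 d.f., and checked against t.test():

    # Two-sample t statistic and p-value by hand.
    set.seed(1)
    x1 <- rnorm(20, mean = 10, sd = 2)        # hypothetical sample 1
    x2 <- rnorm(15, mean = 9,  sd = 3)        # hypothetical sample 2
    se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
    t_stat <- (mean(x1) - mean(x2) - 0) / se  # H0: mu1 - mu2 = 0
    df <- min(length(x1), length(x2)) - 1     # conservative d.f. (Option 2)
    2 * pt(-abs(t_stat), df = df)             # two-sided p-value
    t.test(x1, x2)$p.value                    # Welch d.f. (Option 1), similar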

Confidence Intervals

– From the pivotal quantity we can derive a confidence interval of the usual form estimated difference ± margin of error, where

    – estimated difference is x̄1 − x̄2

    – margin of error is critical value × SE
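A sketch of the interval on the same hypothetical samples, by the formula with Option 2 d.f. and via t.test():

    # 95% CI for mu1 - mu2: by hand (Option 2 d.f.) and via t.test().
    set.seed(1)
    x1 <- rnorm(20, mean = 10, sd = 2)        # hypothetical sample 1
    x2 <- rnorm(15, mean = 9,  sd = 3)        # hypothetical sample 2
    se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
    t_star <- qt(0.975, df = min(length(x1), length(x2)) - 1)
    (mean(x1) - mean(x2)) + c(-1, 1) * t_star * se  # estimate +/- margin of error
    t.test(x1, x2)$conf.int                   # Welch d.f. (Option 1); no wider than Option 2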