Statistics Problems and Solutions

1. Blood Cholesterol Analysis

Blood cholesterol was measured on a random sample of 24 people:

226, 189, 240, 192, 216, 207, 221, 255, 190, 237, 219, 182, 209, 198, 243, 202, 234, 194, 213, 185, 193, 223, 204, 191

i) Stem-and-Leaf Plots

The first 18 observations have been entered in the stem-and-leaf plot on the left. The completed plot is shown on the right.

Left Plot (Entry Order)              Right Plot (Ordered)

25 | 5                               25 | 5
24 | 0 3                             24 | 0 3
23 | 7 4                             23 | 4 7
22 | 6 1 3                           22 | 1 3 6
21 | 6 9 3                           21 | 3 6 9
20 | 7 9 2 4                         20 | 2 4 7 9
19 | 2 0 8 4 3 1                     19 | 0 1 2 3 4 8
18 | 9 2 5                           18 | 2 5 9
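The ordered plot can be reproduced with a few lines of Python. This is only a sketch: it assumes a stem is the value with its last digit dropped and a leaf is that last digit, and the variable names are illustrative.

```python
from collections import defaultdict

cholesterol = [226, 189, 240, 192, 216, 207, 221, 255, 190, 237, 219, 182,
               209, 198, 243, 202, 234, 194, 213, 185, 193, 223, 204, 191]

# Group leaves (last digit) under stems (remaining digits).
# Sorting the data first puts the leaves in ascending order within each stem.
stems = defaultdict(list)
for value in sorted(cholesterol):
    stems[value // 10].append(value % 10)

# Print the largest stem first, matching the layout used above.
for stem in sorted(stems, reverse=True):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```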

ii) Pie Chart of Cholesterol Categories

Cholesterol values are categorized as follows:

  • Desirable: Less than 200
  • Borderline Risk: 200-239 (inclusive)
  • High Risk: 240 and above

Angle calculations for the pie chart:

  • Desirable: (9/24)(360°) = 135°
  • Borderline Risk: (12/24)(360°) = 180°
  • High Risk: (3/24)(360°) = 45°
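The category counts and angles can be checked with a short script. This is a minimal sketch; the helper name `category` is hypothetical, and the cut-offs are the ones listed above.

```python
from collections import Counter

cholesterol = [226, 189, 240, 192, 216, 207, 221, 255, 190, 237, 219, 182,
               209, 198, 243, 202, 234, 194, 213, 185, 193, 223, 204, 191]

def category(value):
    """Classify one cholesterol reading using the cut-offs listed above."""
    if value < 200:
        return "Desirable"
    if value <= 239:
        return "Borderline Risk"
    return "High Risk"

counts = Counter(category(v) for v in cholesterol)

# Each slice's angle is its share of the sample times 360 degrees.
angles = {name: 360 * n / len(cholesterol) for name, n in counts.items()}

for name in ("Desirable", "Borderline Risk", "High Risk"):
    print(f"{name}: {counts[name]} of {len(cholesterol)}, {angles[name]:.0f} degrees")
```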

2. Emergency Department Visits

The number of emergency department (ED) visits on a weekend day is recorded for 25 randomly selected weekend days. The data are displayed in the stem-and-leaf plot below.

1 | 0 1 2 3 4 6 6 7 8
2 | 0 2 3 4 8
3 | 1 2 3 5 7
4 | 1 8
5 | 2 7
6 | 4
7 |
8 | 2

i) Positions of Median and Lower Quartile
  • Median Position: (25+1)/2 = 13 => 13th ordered observation
  • Lower Quartile Position: (25+1)/4 = 6.5 => Average of 6th & 7th ordered observation
ii) Five-Number Summary
  • Minimum: 10
  • Lower Quartile: (16+16)/2 = 16
  • Median: 24
  • Upper Quartile: (37+41)/2 = 39
  • Maximum: 82
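The position rule from part (i) can be sketched in Python as follows. The helper `value_at` is hypothetical, and the upper quartile position 3(n+1)/4 is an assumption chosen to mirror the lower quartile rule; it agrees with the 19th and 20th observations used above.

```python
# Weekend ED visits read off the stem-and-leaf plot (stem = tens, leaf = units).
visits = [10, 11, 12, 13, 14, 16, 16, 17, 18,
          20, 22, 23, 24, 28,
          31, 32, 33, 35, 37,
          41, 48, 52, 57, 64, 82]

def value_at(ordered, position):
    """Value at a 1-based position; a fractional position averages its two neighbours."""
    lower = int(position)
    if position == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2

ordered = sorted(visits)
n = len(ordered)

five_number = {
    "min": ordered[0],
    "q1": value_at(ordered, (n + 1) / 4),        # position 6.5
    "median": value_at(ordered, (n + 1) / 2),    # position 13
    "q3": value_at(ordered, 3 * (n + 1) / 4),    # position 19.5 (assumed rule)
    "max": ordered[-1],
}
print(five_number)
```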
iii) SAS Procedure

The statistics in part (ii) could have been obtained using proc univariate.

iv) Box Plot Comparison

Below is a box plot for weekday ED visits. The box plot for weekend ED visits is drawn below it using the same scale and the results from part (ii).

v) Comparison of Weekday and Weekend ED Visits
  1. The median number of visits is higher on weekend days than on weekdays.
  2. There is more variability in the number of ED visits on weekend days than on weekdays.
  3. Weekday ED visits have a symmetric distribution, while weekend ED visits have a positively skewed distribution.

3. Maze Solving Times

The time it takes to solve a maze is recorded for 4 “smart” rats. The results (in seconds) are: 146, 113, 31, 134

i) Sample Statistics
  • Mean: (146 + 113 + 31 + 134)/4 = 424/4 = 106
  • Median: (113 + 134)/2 = 123.5 (Data ordered: 31, 113, 134, 146)
  • Variance (s²): [(146-106)² + (113-106)² + (31-106)² + (134-106)²]/(4-1) = [40² + 7² + (-75)² + 28²]/3 = (1600 + 49 + 5625 + 784)/3 = 8058/3 = 2686
  • Standard Deviation (s): √2686 ≈ 51.83
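These sample statistics can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. A minimal sketch:

```python
import statistics

times = [146, 113, 31, 134]  # maze-solving times in seconds

mean = statistics.mean(times)
median = statistics.median(times)
variance = statistics.variance(times)  # sample variance, divides by n - 1
std_dev = statistics.stdev(times)      # square root of the sample variance

print(f"mean={mean}, median={median}, variance={variance}, sd={std_dev:.2f}")
```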

ii) Impact of Corrected Observation

If the time of 31 seconds was a recording error and the actual time was 131, the sample mean would be larger, but the sample standard deviation would be smaller.
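One way to confirm the direction of both changes is to recompute the statistics with the corrected value. The list names below are illustrative.

```python
import statistics

recorded = [146, 113, 31, 134]
corrected = [146, 113, 131, 134]  # 31 replaced by the actual time of 131

for label, data in (("recorded", recorded), ("corrected", corrected)):
    print(f"{label}: mean={statistics.mean(data)}, sd={statistics.stdev(data):.2f}")
```

Replacing the low outlier pulls the mean up and shrinks the spread, because the corrected value sits close to the other three observations.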

4. Post-Debate Poll

After the first presidential debate, 50 people were randomly selected and asked two questions: their political affiliation (Democrat or Republican) and who they thought won the debate (Bush or Kerry). The results are displayed in the table below.

          Democrat   Republican
Bush          4          21
Kerry        16           9

i) Probability Calculations
  • P(Democrat & Thought Kerry Won): 16/50 = 0.32
  • P(Democrat or Thought Kerry Won): (20/50) + (25/50) – (16/50) = 29/50 = 0.58 OR 1 – P(Republican & Thought Bush Won) = 1 – 21/50 = 29/50 = 0.58
  • P(Thought Kerry Won | Democrat): 16/20 = 0.8
  • P(Republican | Thought Bush Won): 21/25 = 0.84
ii) Independence of Political Party and Debate Winner

No, who a person thought won does not appear to be independent of their political party: P(Thought Kerry Won | Democrat) = 0.8, whereas P(Thought Kerry Won) = 25/50 = 0.5.
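The probabilities in part (i) and the independence check in part (ii) can be reproduced from the four cell counts. The sketch below uses exact fractions; the helper `prob` and the predicate functions are hypothetical names introduced for illustration.

```python
from fractions import Fraction

# Cell counts keyed by (perceived winner, party).
counts = {
    ("Bush", "Democrat"): 4, ("Bush", "Republican"): 21,
    ("Kerry", "Democrat"): 16, ("Kerry", "Republican"): 9,
}

def prob(event, given=lambda key: True):
    """P(event | given), where both arguments are predicates on (winner, party)."""
    denominator = sum(n for key, n in counts.items() if given(key))
    numerator = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(numerator, denominator)

def is_democrat(key):
    return key[1] == "Democrat"

def is_republican(key):
    return key[1] == "Republican"

def kerry_won(key):
    return key[0] == "Kerry"

def bush_won(key):
    return key[0] == "Bush"

p_and = prob(lambda k: is_democrat(k) and kerry_won(k))
p_or = prob(lambda k: is_democrat(k) or kerry_won(k))
p_kerry_given_dem = prob(kerry_won, given=is_democrat)
p_rep_given_bush = prob(is_republican, given=bush_won)

# Under independence, P(Kerry | Democrat) would equal P(Kerry).
p_kerry = prob(kerry_won)
print(p_and, p_or, p_kerry_given_dem, p_rep_given_bush, p_kerry)
```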

iii) Probability of Two People Thinking Bush Won

(25/50)(24/49) = 600/2450 = 0.2449
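The same answer follows from counting pairs instead of multiplying sequential probabilities. This sketch assumes the two people are drawn without replacement from the 50 respondents, 25 of whom thought Bush won.

```python
from fractions import Fraction
from math import comb

bush_count, total = 25, 50

# Sequential draws without replacement.
sequential = Fraction(bush_count, total) * Fraction(bush_count - 1, total - 1)

# Equivalent counting argument: favourable pairs over all possible pairs.
by_counting = Fraction(comb(bush_count, 2), comb(total, 2))

print(sequential, float(sequential))
```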

5. Salad Bar Salmonella Exposure

A diner at a restaurant makes two trips to a salad bar contaminated with Salmonella bacteria. The probability of exposure on the first trip is 0.6. If exposed on the first trip, the probability of exposure on the second trip is 0.8. If not exposed on the first trip, the probability of exposure on the second trip is 0.3.

i) Tree Diagram

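First trip exposed (0.6)
  • Second trip exposed (0.8): (0.6)(0.8) = 0.48
  • Second trip not exposed (0.2): (0.6)(0.2) = 0.12

First trip not exposed (0.4)
  • Second trip exposed (0.3): (0.4)(0.3) = 0.12
  • Second trip not exposed (0.7): (0.4)(0.7) = 0.28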

ii) Probability of at Least One Exposure

P(Exposed at least once) = 1 – P(Not exposed on either trip) = 1 – (0.4)(0.7) = 1 – 0.28 = 0.72
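The four paths through the tree can be enumerated directly from the probabilities given in the problem. A minimal sketch, with illustrative variable names:

```python
p_first = 0.6                    # P(exposed on trip 1)
p_second_given_first = 0.8       # P(exposed on trip 2 | exposed on trip 1)
p_second_given_not_first = 0.3   # P(exposed on trip 2 | not exposed on trip 1)

# Multiply along each branch of the tree to get the path probabilities.
paths = {
    ("exposed", "exposed"): p_first * p_second_given_first,
    ("exposed", "not exposed"): p_first * (1 - p_second_given_first),
    ("not exposed", "exposed"): (1 - p_first) * p_second_given_not_first,
    ("not exposed", "not exposed"): (1 - p_first) * (1 - p_second_given_not_first),
}

p_at_least_once = 1 - paths[("not exposed", "not exposed")]
print(round(p_at_least_once, 2))
```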

6. Screening Test Results

Results of a screening test on 100 subjects are given in the table below.

                   Disease   No Disease
Positive Screen       48          4
Negative Screen       12         36
Total                 60         40

i) Sensitivity and Specificity
  • Sensitivity: 48/60 = 0.8
  • Specificity: 36/40 = 0.9
ii) Positive Predictive Value (PPV)

Using Bayes' theorem with a prevalence of 0.2, sensitivity 0.8, and specificity 0.9:

PPV = (0.2)(0.8) / [(0.2)(0.8) + (0.8)(0.1)] = 0.16 / (0.16 + 0.08) = 0.16 / 0.24 = 0.6667

Of the people who test positive, two-thirds (66.67%) actually have the disease.

iii) PPV with 10% Prevalence

If only 10% of the population has the disease, the PPV would be smaller.

iv) Increasing PPV

Increasing the specificity to 0.95 would increase the PPV more than increasing the sensitivity to 0.9.
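Parts (ii) to (iv) apply the same Bayes' theorem calculation with different inputs, which can be sketched as a single function. The function name `ppv` and the scenario labels are illustrative, not part of the original solution.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value, P(Disease | +), via Bayes' theorem."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)

scenarios = {
    "baseline (prevalence 0.2)": ppv(0.2, 0.8, 0.9),
    "prevalence 0.1": ppv(0.1, 0.8, 0.9),
    "specificity 0.95": ppv(0.2, 0.8, 0.95),
    "sensitivity 0.9": ppv(0.2, 0.9, 0.9),
}

for label, value in scenarios.items():
    print(f"{label}: PPV = {value:.4f}")
```

Raising the specificity shrinks the false-positive term in the denominator, which is why it moves the PPV more than a comparable gain in sensitivity here.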

v) Probability of Positive Screen Given No Disease

P(+ | No Disease) = 1 – P(- | No Disease) = 1 – Specificity = 1 – 0.9 = 0.1

vi) Probability of No Disease Given Positive Screen

P(No Disease | +) = 1 – P(Disease | +) = 1 – PPV = 1 – 0.6667 = 0.3333

7. Case-Control Study of Emphysema and Smoking

A case-control study uses 179 patients with emphysema and 338 controls without the disease. Smoking history is obtained for all subjects. The results are summarized below.

Table of Smoking by Emphysema

Smoking   Emphysema (Yes)   Emphysema (No)   Total
Yes       101 (56.42%)      130 (38.46%)     231 (44.68%)
No         78 (43.58%)      208 (61.54%)     286 (55.32%)
Total     179 (34.62%)      338 (65.38%)     517 (100%)

i) Estimable Probabilities

This study can estimate P(Smoking | Emphysema), but not P(Emphysema | Smoking).

ii) Measure of Association

Odds Ratio = (101/78) / (130/208) = 1.295 / 0.625 = 2.072

Strictly speaking, the odds of having smoked among people with emphysema are about twice those among people without emphysema. If emphysema is rare in the population, the odds ratio approximates the relative risk, so we could estimate that the probability of emphysema is about twice as high among smokers as among non-smokers.
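The odds ratio can be computed either as a ratio of two odds or as the cross-product of the 2×2 table; both give the same value. A short sketch with illustrative variable names:

```python
# Counts from the smoking-by-emphysema table.
cases_smokers, cases_nonsmokers = 101, 78         # emphysema: yes
controls_smokers, controls_nonsmokers = 130, 208  # emphysema: no

odds_cases = cases_smokers / cases_nonsmokers           # odds of smoking among cases
odds_controls = controls_smokers / controls_nonsmokers  # odds of smoking among controls
odds_ratio = odds_cases / odds_controls

# Equivalent cross-product form: (a * d) / (b * c).
cross_product = (cases_smokers * controls_nonsmokers) / (cases_nonsmokers * controls_smokers)

print(f"{odds_cases:.3f} / {odds_controls:.3f} = {odds_ratio:.3f}")
```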

iii) SAS Procedure

The output above was obtained using proc freq.

8. Binomial Distribution Analysis

A random variable, X, can take on values 0, 1, 2, and 3. A sample of 200 observations is taken.

i) Frequency Table
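
x    f     rf     cf    rcf
0    2     0.01   2     0.01
1    16    0.08   18    0.09
2    80    0.40   98    0.49
3    102   0.51   200   1.00

(f = frequency, rf = relative frequency, cf = cumulative frequency, rcf = relative cumulative frequency)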

A researcher thinks this random variable follows a binomial distribution with parameters n = 3 and p = 0.8.

ii) P(X = 3) using B(3, 0.8)

If we let Y = 3 – X (the number of failures), then Y ~ B(3, 0.2). P(X = 3) = P(Y = 0) = 0.512 (from a binomial probability table, or calculated as (3C3)(0.8)^3(0.2)^0 = 0.512).

iii) Estimated P(X = 3) from Sample Data

0.51 (the relative frequency for 3)

iv) P(X ≥ 2) using B(3, 0.8)

P(X ≥ 2) = P(Y ≤ 1) = 0.896 (from a binomial probability table or calculated as P(X=2) + P(X=3))

v) Estimated P(X ≥ 2) from Sample Data

0.40 + 0.51 = 0.91 (the sum of the relative frequencies for 2 and 3) OR 1 – 0.09 = 0.91 (1 – the relative cumulative frequency for 1)

vi) Researcher’s Hypothesis

Yes, it appears the researcher is correct, as the theoretical probabilities and observed relative frequencies are very close.
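The comparison between the B(3, 0.8) probabilities and the observed relative frequencies can be laid out side by side. This is a sketch using the standard binomial formula; `binom_pmf` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

observed = {0: 2, 1: 16, 2: 80, 3: 102}  # frequencies from the table
sample_size = sum(observed.values())

theoretical = {x: binom_pmf(x, 3, 0.8) for x in observed}
relative = {x: f / sample_size for x, f in observed.items()}

print("x  observed  B(3, 0.8)")
for x in observed:
    print(f"{x}  {relative[x]:.3f}     {theoretical[x]:.3f}")

p_at_least_2_theory = theoretical[2] + theoretical[3]
p_at_least_2_sample = relative[2] + relative[3]
```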

9. Toxin in Lakes

A state claims that only 10% of its lakes contain a toxin. Suppose 20 lakes in the state are randomly selected.

i) Conditions for Binomial Distribution
  1. The probability of a lake containing the toxin is 0.1 for any lake selected (constant probability).
  2. The presence of the toxin in one lake is independent of the presence of the toxin in any other lake (independent trials).
ii) Expected Number of Lakes with Toxin

E(X) = np = (20)(0.1) = 2

iii) P(X ≤ 1)

P(X ≤ 1) = 0.3917 (from a binomial probability table)

iv) P(X ≥ 7)

P(X ≥ 7) = 1 – P(X ≤ 6) = 1 – 0.9976 = 0.0024 (from a binomial probability table)

v) Validity of the Claim

No. The probability of observing 7 or more lakes with the toxin, assuming the claim is true, is very small (0.0024). This small p-value suggests that the claim of 10% is not true.
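The table values used in parts (ii) to (iv) can be reproduced by summing the binomial formula directly. A minimal sketch, assuming X ~ B(20, 0.1) as the claim implies; `binom_cdf` is a hypothetical helper name.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 20, 0.1

expected = n * p
p_at_most_1 = binom_cdf(1, n, p)
p_at_least_7 = 1 - binom_cdf(6, n, p)

print(f"E(X)={expected}, P(X<=1)={p_at_most_1:.4f}, P(X>=7)={p_at_least_7:.4f}")
```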

10. Stupid Driving Moves

The number of stupid driving moves (SDMs) observed per day, on average, is 2.

i) Appropriate Distribution

The Poisson distribution is appropriate because we are counting the number of SDMs in an interval of time.

ii) P(2 ≤ X ≤ 4)

P(2 ≤ X ≤ 4) = P(X ≤ 4) – P(X ≤ 1) = 0.9473 – 0.4060 = 0.5413 (from a Poisson probability table)

iii) P(X = 0) in Four Days

Method 1: The probability of no SDMs in four days is the probability of no SDMs in one day raised to the fourth power: (0.1353)^4 = 0.0003

Method 2: For four days, the Poisson distribution has λ = 4 * 2 = 8. P(X=0) = 0.0003 (from a Poisson probability table)
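Both parts can be checked without a table by evaluating the Poisson formula. The sketch below also confirms that the two methods in part (iii) agree; the helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Part (ii): one day, lambda = 2.
p_2_to_4 = poisson_cdf(4, 2) - poisson_cdf(1, 2)

# Part (iii): no SDMs in four days, computed both ways.
method_1 = poisson_pmf(0, 2) ** 4   # four independent days with lambda = 2
method_2 = poisson_pmf(0, 8)        # one four-day interval with lambda = 8

print(f"{p_2_to_4:.4f} {method_1:.4f} {method_2:.4f}")
```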

11. Commute Time

The commute time from home to the parking lot is normally distributed with a mean of 24 minutes and a standard deviation of 2 minutes.

i) P(X ≤ 22)

P(X ≤ 22) = P(Z ≤ (22-24)/2) = P(Z ≤ -1) = 0.1587 (from a standard normal table)

ii) P(X ≥ 40)

P(X ≥ 40) = P(Z ≥ (40-24)/2) = P(Z ≥ 8) ≈ 0

iii) 95th Percentile

0.95 = P(X ≤ c) = P(Z ≤ (c-24)/2)

From a standard normal table, 0.95 ≈ P(Z ≤ 1.645)

Therefore, (c-24)/2 = 1.645, so c = 24 + (1.645)(2) = 27.29
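All three parts can be checked with `statistics.NormalDist` from the Python standard library, which avoids the table lookup. A minimal sketch:

```python
from statistics import NormalDist

commute = NormalDist(mu=24, sigma=2)  # commute time in minutes

p_at_most_22 = commute.cdf(22)        # part (i)
p_at_least_40 = 1 - commute.cdf(40)   # part (ii), 8 standard deviations above the mean
percentile_95 = commute.inv_cdf(0.95) # part (iii)

print(f"{p_at_most_22:.4f} {p_at_least_40:.4f} {percentile_95:.2f}")
```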

12. Multiple Choice

i) Negatively Skewed Population

The mean will be smaller than the median.

ii) Robust Statistics

Use the median rather than the mean, and the IQR rather than the standard deviation.

iii) Selecting and Ordering Paintings

6P4 = 6!/(6-4)! = 6*5*4*3 = 360
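As a quick check, `math.perm` (available in Python 3.8 and later) counts ordered selections directly:

```python
from math import factorial, perm

# Ordered selections of 4 paintings from 6.
arrangements = perm(6, 4)

# Same count from the factorial definition n! / (n - k)!.
from_definition = factorial(6) // factorial(6 - 4)

print(arrangements)
```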

iv) Range of a Probability

0 ≤ p ≤ 1

v) Coin Flips

The probability of getting heads on the next three flips, given heads on the previous two flips, is (1/2)^3 = 0.125 (due to independence).

vi) Events A and B

Both P(A & B) ≤ P(A) and P(A or B) ≥ P(A) must be true.

vii) Selecting Women from a Room

The probability of selecting all women is 0, as there are only 2 women and we are selecting 3 people.

viii) Relative Risk of 5

Exposed people are 5 times as likely to get the disease as non-exposed people.

ix) Odds Ratio as an Estimate of Relative Risk

An odds ratio is a good estimate of a relative risk if the prevalence of the disease is small.

x) American League Pennant Winner
