Introduction to Statistics
1. Data Types
The number of pages in a daily newspaper: Discrete Numerical
The colours of medals won at an Olympic Games: Ordinal Categorical
The maximum daily temperature in a city: Continuous Numerical
The brands of televisions in a store: Nominal Categorical
2. Survey Bias
In order to determine which brand of car is the most popular in Australia, a telephone survey conducted on behalf of a motor car company rings 100 households in Adelaide between the hours of 2 and 5 o’clock in the afternoon to ask what brand of car they drive.
a) Is this survey biased? Why/why not?
Yes, it is biased. The sample includes only households in Adelaide, so it does not represent the rest of Australia. Calling between 2 and 5 in the afternoon reaches only people who are at home at that time, which excludes most people who work during the day.
b) Describe a better survey technique.
The survey should use a larger, randomly selected sample of households from all over the country: every state, both city and country. The survey hours need to be longer, perhaps between 6 am and 9 pm, so that people who work during the day can also be reached.
3. Box & Whisker Plot
The box & whisker plot below shows how much time was spent per night on homework for a class at a certain high school during September.
[Box & whisker plot: Average Minutes Per Night Spent On Homework. Minimum 0, lower quartile 20, median 48, upper quartile 60, maximum 190.]
a) What is the maximum amount of time spent on homework?
190 minutes
b) What percentage of the class spends more than 60 minutes on homework per night?
25%
c) What is the range of times that the middle 50% of the class spend on homework per night?
20-60 minutes
d) What percentage of the class spends less than 20 minutes per night on homework?
25%
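The answers above follow from reading the five marked values on the plot as a five-number summary, where each of the four sections between consecutive values holds about a quarter of the class. A minimal Python sketch of that reading is given here; it assumes the values 0, 20, 48, 60 and 190 are the minimum, lower quartile, median, upper quartile and maximum, and the variable names are illustrative only.

```python
# Five-number summary as read from the plot (minutes per night).
# Assumption: the five axis values are min, Q1, median, Q3 and max.
minimum, q1, median, q3, maximum = 0, 20, 48, 60, 190

iqr = q3 - q1                    # spread of the middle 50% of the class
data_range = maximum - minimum   # spread of the whole class

# Each section between consecutive summary values holds about 25% of the class.
share_above_q3 = 25              # more than 60 minutes
share_below_q1 = 25              # less than 20 minutes
share_between_quartiles = 100 - share_above_q3 - share_below_q1

print(iqr, data_range, share_between_quartiles)
```

The middle 50% of the class therefore spans 40 minutes (from 20 to 60), even though the full range is 190 minutes.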
4. Scatterplot and Economical Speed
Refer to the table on the right:
a) Indicate which variable is dependent and which variable is independent.
L/100km is the dependent variable, and km/h is the independent variable.
b) Draw a scatterplot.
c) Can you give advice about the most economical speed?
It is more economical to drive more slowly, as less fuel is used. As the speed increases, so does the amount of fuel consumed per 100 km.
5. Parallel Box Plots
The data below shows the times, in minutes, spent at a gym by a selection of males and females.
Males | Females | |||||||||||
22 | 35 | 31 | 40 | 36 | 28 | 41 | 37 | 20 | 41 | 48 | 56 | |
31 | 24 | 15 | 27 | 39 | 45 | 32 | 46 | 57 | 40 | 52 | 45 | |
52 | 36 | 43 | 21 | 16 | 37 | 27 | 39 | 55 | 47 | 66 | 50 | |
44 | 15 | 55 | 34 | 42 | 31 | 35 | 48 |
Draw parallel box plots to represent the data above.
6. Standard Deviation and Consistency
The points scored by Andrew and Brad in a selection of basketball matches are given in the table below:
Points by Andrew | 23 | 17 | 31 | 25 | 25 | 19 | 28 | 32 |
Points by Brad | 9 | 29 | 41 | 26 | 14 | 44 | 38 | 43 |
a) For each set of data:
i. Determine if there are any outliers
ii. Calculate the standard deviation of the number of points scored
Andrew
Outliers:
Q1: (19+23)/2 = 21
Q3: (28+31)/2 = 29.5
IQR = Q3 – Q1 = 29.5 – 21 = 8.5
Lower Boundary = Q1 – (1.5 x IQR) = 21 – (1.5 x 8.5) = 8.25
Upper Boundary = Q3 + (1.5 x IQR) = 29.5 + (1.5 x 8.5) = 42.25
There are no outliers as all values are between the Lower and Upper values (i.e., all between 8.25 and 42.25).
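The outlier check above can be sketched in Python as follows. This is only an illustration: the function name `outlier_boundaries` is made up for this example, and it assumes the same quartile method as the working, namely the median of the lower half and the median of the upper half of the sorted data.

```python
from statistics import median

def outlier_boundaries(values):
    """Return (q1, q3, lower, upper) using the median-of-halves quartile method."""
    data = sorted(values)
    half = len(data) // 2        # size of each half (the middle value is left out if n is odd)
    q1 = median(data[:half])     # median of the lower half
    q3 = median(data[-half:])    # median of the upper half
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr

andrew = [23, 17, 31, 25, 25, 19, 28, 32]
q1, q3, lower, upper = outlier_boundaries(andrew)

# Any score outside the two boundaries counts as an outlier.
outliers = [x for x in andrew if x < lower or x > upper]
print(q1, q3, lower, upper, outliers)
```

For Andrew's scores this gives Q1 = 21, Q3 = 29.5 and boundaries of 8.25 and 42.25, with an empty list of outliers. The same function can be applied to Brad's scores.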
Standard Deviation:
Mean = (23 + 17 + 31 + 25 + 25 + 19 + 28 + 32)/8 = 25
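Sum of squared deviations from the mean = 4 + 64 + 36 + 0 + 0 + 36 + 9 + 49 = 198
Standard Deviation = √(198/7) ≈ 5.32 (sample standard deviation: divide by n - 1 = 7, as the matches are only a selection)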
Brad
Outliers:
Q1: (14+26)/2 = 20
Q3: (41+43)/2 = 42
IQR = Q3 – Q1 = 42 – 20 = 22
Lower Boundary = Q1 – (1.5 x IQR) = 20 – (1.5 x 22) = -13
Upper Boundary = Q3 + (1.5 x IQR) = 42 + (1.5 x 22) = 75
There are no outliers as all values are between the Lower and Upper values (i.e., all between -13 and 75).
Standard Deviation:
Mean = (9 + 29 + 41 + 26 + 14 + 44 + 38 + 43)/8 = 30.5
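Sum of squared deviations from the mean = 462.25 + 2.25 + 110.25 + 20.25 + 272.25 + 182.25 + 56.25 + 156.25 = 1262
Standard Deviation = √(1262/7) ≈ 13.43 (sample standard deviation: divide by n - 1 = 7, as the matches are only a selection)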
b) Which of the players is more consistent?
Andrew is the more consistent player as his standard deviation is smaller.
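The comparison can be checked with Python's standard `statistics` module. The sketch below is a rough check rather than a required method. It prints both the sample standard deviation (`stdev`, which divides by n - 1) and the population standard deviation (`pstdev`, which divides by n), because which divisor the question intends is an assumption.

```python
from statistics import pstdev, stdev

andrew = [23, 17, 31, 25, 25, 19, 28, 32]
brad = [9, 29, 41, 26, 14, 44, 38, 43]

for name, points in (("Andrew", andrew), ("Brad", brad)):
    # stdev divides the squared deviations by n - 1 (sample),
    # pstdev divides them by n (population).
    print(name, round(stdev(points), 2), round(pstdev(points), 2))

# The player with the smaller standard deviation is the more consistent one.
more_consistent = "Andrew" if stdev(andrew) < stdev(brad) else "Brad"
print(more_consistent)
```

With either divisor, Andrew's standard deviation is well under half of Brad's, so the conclusion about consistency does not depend on that choice.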