Introduction to Statistics

1. Data Types

The number of pages in a daily newspaper: Discrete Numerical

The colours of medals won at an Olympic Games: Ordinal Categorical

The maximum daily temperature in a city: Continuous Numerical

The brands of televisions in a store: Nominal Categorical

2. Survey Bias

In order to determine which brand of car is the most popular in Australia, a telephone survey conducted on behalf of a motor car company rings 100 households in Adelaide between the hours of 2 and 5 o’clock in the afternoon to ask what brand of car they drive.

a) Is this survey biased? Why/why not?

Yes, it is biased. The sample only includes households in Adelaide, so it cannot represent Australia as a whole. Calling between 2 and 5 o’clock in the afternoon also means that mostly people who are not at work at that time would have been asked.

b) Describe a better survey technique.

The survey needs a much larger sample than 100 households, chosen at random from all over the country: every state, both city and country. The survey hours need to be longer, perhaps between 6 am and 9 pm, so that people who work in the afternoon are also reached.
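One common way to draw a sample that covers every state and both city and country areas is stratified random sampling. The answer above does not name this technique, so treat the following as a minimal sketch under that assumption. The function `stratified_sample`, the region list, and the made-up household identifiers are all hypothetical.

```python
import random

def stratified_sample(frame, per_stratum, seed=0):
    """Pick `per_stratum` households at random from each stratum.

    `frame` maps a stratum label, e.g. ("SA", "city"), to a list of
    household identifiers. All names and data here are hypothetical.
    """
    rng = random.Random(seed)
    return {stratum: rng.sample(households, per_stratum)
            for stratum, households in frame.items()}

# Hypothetical sampling frame: 1000 made-up households per stratum.
regions = ["NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT"]
frame = {(region, area): [f"{region}-{area}-{i}" for i in range(1000)]
         for region in regions for area in ("city", "country")}

sample = stratified_sample(frame, per_stratum=50)
print(len(sample), sum(len(v) for v in sample.values()))  # 16 800
```

Every stratum contributes the same number of households here. A real survey would more likely weight each stratum by its population.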

3. Box & Whisker Plot

The box & whisker plot below shows how much time was spent per night on homework for a class at a certain high school during September.

Average Minutes Per Night Spent On Homework

[Box & whisker plot: minimum 0, lower quartile 20, median 48, upper quartile 60, maximum 190 (minutes)]

a) What is the maximum amount of time spent on homework?

190 minutes

b) What percentage of the class spends more than 60 minutes on homework per night?

25%

c) What is the range of times that the middle 50% of the class spend on homework per night?

20-60 minutes

d) What percentage of the class spends less than 20 minutes per night on homework?

25%
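The percentages in parts b) to d) all follow from one rule: each of the four segments of a box & whisker plot holds about a quarter of the data. The sketch below assumes the five values shown under the plot are the minimum, lower quartile, median, upper quartile and maximum. The helper `percent_between` is illustrative only.

```python
# Five-number summary read off the plot, in minutes per night.
ORDER = ["min", "q1", "median", "q3", "max"]
summary = {"min": 0, "q1": 20, "median": 48, "q3": 60, "max": 190}

def percent_between(low, high):
    """Approximate percentage of the data lying between two summary points.

    Each gap between neighbouring points of the five-number summary
    holds about 25% of the data.
    """
    return 25 * (ORDER.index(high) - ORDER.index(low))

print(percent_between("q3", "max"))   # 25: more than 60 minutes
print(percent_between("q1", "q3"))    # 50: the box, 20 to 60 minutes
print(percent_between("min", "q1"))   # 25: less than 20 minutes
print(summary["q3"] - summary["q1"])  # 40: interquartile range in minutes
```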

4. Scatterplot and Economical Speed

Refer to the table on the right:


a) Indicate which variable is dependent and which variable is independent.

Fuel consumption (L/100 km) is the dependent variable, and speed (km/h) is the independent variable.

b) Draw a scatterplot.

[Scatterplot: fuel consumption (L/100 km) against speed (km/h)]

c) Can you give advice about the most economical speed?

It is more economical to drive at lower speeds, as less fuel is used. As the speed increases, so does the amount of fuel consumed per 100 km.

5. Parallel Box Plots

The data below shows the times, in minutes, spent at a gym by a selection of males and females.

Males

Females

22

35

31

40

36

28

41

37

20

41

48

56

31

24

15

27

39

45

32

46

57

40

52

45

52

36

43

21

16

37

27

39

55

47

66

50

44

15

55

34

42

31

35

48

Draw parallel box plots to represent the data above.

[Parallel box plots: time spent at the gym, in minutes, for males and females]

6. Standard Deviation and Consistency

The points scored by Andrew and Brad in a selection of basketball matches are given in the table below:

Points by Andrew: 23, 17, 31, 25, 25, 19, 28, 32

Points by Brad: 9, 29, 41, 26, 14, 44, 38, 43

a) For each set of data:

i. Determine if there are any outliers

ii. Calculate the standard deviation of the number of points scored

Andrew

Outliers:

Sorted data: 17, 19, 23, 25, 25, 28, 31, 32

Q1: (19+23)/2 = 21

Q3: (28+31)/2 = 29.5

IQR = Q3 – Q1 = 29.5 – 21 = 8.5

Lower Boundary = Q1 – (1.5 x IQR) = 21 – (1.5 x 8.5) = 8.25

Upper Boundary = Q3 + (1.5 x IQR) = 29.5 + (1.5 x 8.5) = 42.25

There are no outliers, as all values lie between the lower and upper boundaries (i.e., all between 8.25 and 42.25).
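The outlier check above can be sketched in Python. This is a minimal sketch, assuming the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is what the working above does. The function names `quartiles` and `outlier_fences` are illustrative and do not come from any library.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is left
    out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def outlier_fences(values):
    """Return the (lower, upper) boundaries Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

andrew = [23, 17, 31, 25, 25, 19, 28, 32]
low, high = outlier_fences(andrew)
outliers = [x for x in andrew if x < low or x > high]
print(low, high, outliers)  # 8.25 42.25 []
```

Other quartile conventions exist (many calculators and libraries interpolate), so boundaries computed elsewhere may differ slightly from these.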

Standard Deviation:

Mean = (23 + 17 + 31 + 25 + 25 + 19 + 28 + 32)/8 = 25

Standard Deviation (sample, dividing by n – 1 = 7) = √(198/7) ≈ 5.32
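The sum of squared deviations behind this result can be written out in full, taking Andrew's scores in the order given. The working assumes the sample standard deviation (divisor n – 1 = 7), which is the version that reproduces a value of about 5.3. Dividing by n = 8 instead would give about 4.97.

```latex
\begin{aligned}
\sum (x - \bar{x})^2 &= (-2)^2 + (-8)^2 + 6^2 + 0^2 + 0^2 + (-6)^2 + 3^2 + 7^2 \\
&= 4 + 64 + 36 + 0 + 0 + 36 + 9 + 49 = 198 \\
s &= \sqrt{\frac{198}{8 - 1}} \approx 5.32
\end{aligned}
```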

Brad

Outliers:

Sorted data: 9, 14, 26, 29, 38, 41, 43, 44

Q1: (14+26)/2 = 20

Q3: (41+43)/2 = 42

IQR = Q3 – Q1 = 42 – 20 = 22

Lower Boundary = Q1 – (1.5 x IQR) = 20 – (1.5 x 22) = -13

Upper Boundary = Q3 + (1.5 x IQR) = 42 + (1.5 x 22) = 75

There are no outliers, as all values lie between the lower and upper boundaries (i.e., all between -13 and 75).

Standard Deviation:

Mean = (9 + 29 + 41 + 26 + 14 + 44 + 38 + 43)/8 = 30.5

Standard Deviation (sample, dividing by n – 1 = 7) = √(1262/7) ≈ 13.43

b) Which of the players is more consistent?

Andrew is the more consistent player, as his standard deviation (about 5.3) is much smaller than Brad’s (about 13.4).
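The comparison can be checked with Python's standard `statistics` module. This is a minimal sketch: `stdev` divides by n – 1 (sample standard deviation) and `pstdev` divides by n (population standard deviation). Which one a course expects is an assumption to confirm, but Andrew comes out as the more consistent player either way.

```python
from statistics import mean, pstdev, stdev

scores = {
    "Andrew": [23, 17, 31, 25, 25, 19, 28, 32],
    "Brad": [9, 29, 41, 26, 14, 44, 38, 43],
}

for name, points in scores.items():
    # mean, sample SD (n - 1), population SD (n)
    print(name, mean(points), round(stdev(points), 2), round(pstdev(points), 2))

# The smaller standard deviation marks the more consistent player.
more_consistent = min(scores, key=lambda name: stdev(scores[name]))
print(more_consistent)  # Andrew
```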