Understanding Statistical Studies: Definitions and Methods
Individuals and Variables
Individuals are objects described by a set of data (people, objects, or animals).
Variable – any characteristic of an individual. Can take different values for different individuals.
Observational Study – observes individuals and measures variables of interest but does not intervene to influence the responses (describes some group or situation).
Sample Survey – studies a population by examining some of its members, selected not because they are of special interest but because they represent the larger group.
Population – the entire group of individuals.
Response – measures an outcome or result of the study.
Experiment – deliberately imposes some treatment on individuals in order to observe their responses (to study whether the treatment causes a change in response).
Sample – the part of the population from which we actually collect information; it is used to draw conclusions about the whole.
Census – sample survey that attempts to include the entire population in the sample.
Block – a group of experimental subjects that are known before the experiment to be similar in some ways that are expected to affect the response to treatments.
Block Design – the random assignment of subjects to treatments is carried out separately within each block.
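As a rough illustration of the block design entry above, the following sketch randomizes subjects to treatments one block at a time. The function name block_assign, the subject labels, and the "drug"/"placebo" treatments are hypothetical, chosen only for this example.

```python
import random

def block_assign(blocks, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # the randomization happens inside the block
        # Deal the shuffled subjects to treatments in rotation, so group
        # sizes within a block differ by at most one.
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: subjects grouped by a trait expected to affect response.
blocks = {
    "women": ["W1", "W2", "W3", "W4"],
    "men": ["M1", "M2", "M3", "M4"],
}
result = block_assign(blocks, ["drug", "placebo"], seed=1)
for subject, (block, treatment) in sorted(result.items()):
    print(subject, block, treatment)
```

Because each block is shuffled on its own, every block contributes equally to each treatment group, so the blocking variable cannot be confounded with the treatment.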
Matched Pairs Design – combines matching with randomization. Compares just two treatments: subjects are paired so that the two members of a pair are as similar as possible, and within each pair one subject is randomly assigned the first treatment while the other gets the second.
Dropouts – subjects who begin a study but do not complete it.
Non-adherers – study participants who do not follow the experimental treatment.
Completely Randomized Design – an experimental design in which experimental subjects are allocated at random among all the treatments.
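A minimal sketch of a completely randomized design, assuming twelve hypothetical subjects and three hypothetical treatments labelled A, B, and C. The helper name completely_randomized is illustrative only.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among all the treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # one shuffle over the whole pool, no blocking
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i:02d}" for i in range(1, 13)]  # hypothetical labels S01..S12
groups = completely_randomized(subjects, ["A", "B", "C"], seed=42)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Unlike a block design, the shuffle here ignores every characteristic of the subjects; chance alone decides who receives which treatment.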
Statistically Significant – an observed effect of a size that would rarely occur by chance.
Randomized Comparative Experiment – an experiment that compares two or more treatments, with subjects assigned to the treatments at random.
Double-Blind Experiment – neither the subjects nor the people who work with them (for example, the physicians recording the symptoms) know which treatment each subject received.
Treatment – any specific experimental condition applied to the subjects.
Clinical Trials – experiments that study the effectiveness of medical treatments on actual patients.
Subjects – the individuals studied in an experiment.
Probability Sample – a sample chosen by chance.
Response Variable – measures an outcome or result of a study.
Explanatory Variable – variable that we think explains or causes changes in the response variable.
Statistical Study is Biased If – it systematically favors certain outcomes.
Convenience Sample – selection of whichever individuals are easiest to reach (often biased).
Voluntary Response Sample – chooses itself by responding to a general appeal (often biased).
Simple Random Sample (SRS) – of size n consists of individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
Table of Random Digits – is a long string of digits 0-9 with these two properties: a) each entry in the table is equally likely to be any of the 10 digits 0-9, b) the entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.
How to Choose an SRS: a) Label – assign a numerical label to every individual in the population. Be sure that all labels have the same number of digits if you plan to use a table of random digits. b) Software or table – use random digits to select labels at random.
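The two steps above can be sketched in software. This example assumes a small, made-up population of ten names and lets Python's standard random module play the role of the table of random digits.

```python
import random

# Hypothetical population; in practice this comes from a sampling frame.
population = ["Ana", "Ben", "Cal", "Dee", "Eli",
              "Fay", "Gus", "Hal", "Ivy", "Jon"]

# Step a) Label: give every individual a label with the same number of digits.
labels = {f"{i:02d}": name for i, name in enumerate(population, start=1)}

# Step b) Software: select labels at random, without replacement.
rng = random.Random(2024)  # fixed seed only so the example is repeatable
chosen_labels = rng.sample(sorted(labels), k=4)
srs = [labels[label] for label in chosen_labels]

print(chosen_labels)
print(srs)
```

random.sample draws without replacement and gives every set of 4 labels the same chance of being chosen, which is the defining property of an SRS of size 4.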
Bias – consistent, repeated deviation of the sample statistic from the population parameter in the same direction when we take many samples.
Parameter – a number that describes the population. A parameter is a fixed number, but in practice we do not know its actual value.
Statistic – a number that describes a sample. We often use a statistic to estimate an unknown parameter.
To Reduce Bias – use random sampling. When we start with a list of the entire population, simple random sampling produces unbiased estimates: the values of a statistic computed from an SRS neither consistently overestimate nor consistently underestimate the value of the population parameter.
Variability – describes how spread out the values of the sample statistics are when we take many samples. Large variability means that the result of sampling is not repeatable (a good sampling method has both small bias and small variability).
To Reduce Variability – of an SRS, use a larger sample. You can make the variability as small as you want by taking a large enough sample.
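A small simulation can make the effect of sample size on variability concrete. The sketch below assumes a very large population in which 60% of individuals would answer "yes" (a made-up value), draws many samples of two different sizes, and compares how spread out the sample proportions are.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, repeats, rng):
    """Standard deviation of the sample proportion over many samples of size n.

    Assumes the population is large enough that each sampled individual
    independently says "yes" with probability p.
    """
    proportions = []
    for _ in range(repeats):
        yes_count = sum(rng.random() < p for _ in range(n))
        proportions.append(yes_count / n)
    return statistics.pstdev(proportions)

rng = random.Random(0)
small = spread_of_sample_proportions(p=0.6, n=100, repeats=500, rng=rng)
large = spread_of_sample_proportions(p=0.6, n=1600, repeats=500, rng=rng)

print(f"spread with n=100:  {small:.3f}")
print(f"spread with n=1600: {large:.3f}")
```

The spread for n = 1600 should come out at roughly a quarter of the spread for n = 100, since variability shrinks in proportion to the square root of the sample size.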
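95% Confidence – the result comes from a method that lands within the margin of error of the truth about the population in 95% of all possible samples.
Confidence Statement – says how accurate our conclusions about the population are. It has two parts: a margin of error and a level of confidence.
Margin of Error – how close the sample statistic is expected to lie to the truth about the population; at 95% confidence, the sample result is within the margin of error of the truth for 95% of all samples.
Sampling Error – errors caused by the act of taking a sample. They cause sample results to be different from the results of a census.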
Can Estimate the Margin of Error – for 95% confidence, based on a simple random sample of size n, by the quick formula 1/√n. The margin of error depends only on the size of the sample, not on the size of the population (provided the population is much larger than the sample).
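As a worked example of the 1/√n formula, the following sketch computes the approximate 95% margin of error for a few sample sizes. The function name quick_margin_of_error and the sample sizes are chosen only for illustration.

```python
import math

def quick_margin_of_error(n):
    """Approximate 95% margin of error for an SRS of size n, using 1/sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

for n in (100, 400, 1600):
    print(n, round(quick_margin_of_error(n), 3))
# 100 0.1
# 400 0.05
# 1600 0.025
```

Note that the population size never appears in the calculation, and that cutting the margin of error in half requires a sample four times as large.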
Choose a Stratified Random Sample: a) divide the sampling frame into distinct groups of individuals, called strata. Choose the strata according to any special interest you have in certain groups, or because the individuals in each stratum resemble each other. b) take a separate SRS in each stratum and combine these to make up the complete sample (stratum – a layer).
Clusters – groups of individuals, often defined by location, that are selected as whole groups rather than one individual at a time.
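The two steps for a stratified random sample might look like this in code. The sampling frame of 30 undergraduates and 10 graduate students, the per-stratum sample sizes, and the helper name stratified_sample are all assumptions made for the example.

```python
import random

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Take a separate SRS within each stratum and combine the results."""
    rng = random.Random(seed)
    # Step a) divide the sampling frame into strata.
    strata = {}
    for individual in frame:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Step b) draw an SRS of the requested size from each stratum.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical frame: (label, stratum) pairs.
frame = [(f"U{i:02d}", "undergrad") for i in range(30)]
frame += [(f"G{i:02d}", "grad") for i in range(10)]

sample = stratified_sample(
    frame,
    stratum_of=lambda individual: individual[1],
    sizes={"undergrad": 6, "grad": 2},
    seed=7,
)
print(sample)
```

Fixing the number drawn from each stratum guarantees that small groups, such as the graduate students here, are represented in every sample.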
Strata – distinct groups of individuals.
Weight the Responses – attempt to correct sources of bias.
To Deal with Nonsampling Errors, Especially Nonresponse – substitute other households in the same neighborhood to reduce bias.
The U.S. Takes a Census of the Population Every 10 Years.
Use Counts or Percentages to – display statistics.
Statistics – science of data.
Nonresponse – the failure to obtain data from an individual selected for a sample. Most nonresponse happens because subjects who are contacted refuse to cooperate.
Response Error – when a subject gives the incorrect response.
Undercoverage – occurs when some groups in the population are left out of the process of choosing the sample.
Sampling Frame – list of individuals from which we draw our sample.
Processing Errors – mistakes in mechanical tasks such as doing arithmetic or entering responses into a computer (nonsampling error).
Lurking Variable – has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied (for example, genetic inheritance or gender).
Placebo Effect – the response of patients to a placebo, that is, to a dummy treatment with no active ingredient.
3 Things to Declare an Experiment Ethical – institutional review board, informed consent, and confidentiality.
Improve Reliability – measure multiple times and use the average of the measurements as the final result.
Errors in Measurement – measured value = true value + bias + random error.
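The measurement model above, together with the advice to average repeated measurements, can be explored with a short simulation. The true value, bias, and size of the random error below are invented numbers, and the random error is assumed to follow a normal distribution.

```python
import random
import statistics

TRUE_VALUE = 50.0   # hypothetical true value
BIAS = 0.5          # hypothetical systematic error of the instrument
ERROR_SD = 2.0      # hypothetical spread of the random error

rng = random.Random(3)

def measure():
    """One measurement: true value + bias + random error."""
    return TRUE_VALUE + BIAS + rng.gauss(0, ERROR_SD)

# Compare single measurements with averages of 10 repeated measurements.
single = [measure() for _ in range(2000)]
averaged = [statistics.mean(measure() for _ in range(10)) for _ in range(2000)]

print(f"spread of single measurements: {statistics.pstdev(single):.2f}")
print(f"spread of 10-measurement averages: {statistics.pstdev(averaged):.2f}")
print(f"mean of the averages: {statistics.mean(averaged):.2f}")
```

Averaging shrinks the random error, which improves reliability, but the averages still center near 50.5 rather than 50: repeating a measurement does nothing to remove bias.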
Rate – fraction, proportion, or percent.
Advantages of Using a Block Design – can reduce confounding: a potential lurking variable is built into the design as the blocking variable, so its effects can be accounted for.
Refusals – subjects that we want in a study but refuse to participate.
Nonadherers – participate but do not follow the experimental treatment.
Single Blind Experiment – subjects are unaware of the exact treatment being imposed on them (controls bias).
Reduce Variability of an SRS – use a larger sample.