Research Methods: Validity, Reliability, and Experimental Design
*Valid logic: True premises always lead to a true conclusion. Sound logic: The argument is valid and all of the premises are true
Hypotheses:
Directional alternative hypothesis:
-Group 1 will perform better than Group 2
-As X goes up, so does Y
-As X goes up, Y goes down
Causal or Associative Hypotheses:
-Some attempt to define a causal relationship
-The experimenter must be in control of the cause
-“If I change X, then I think that Y will also change”
-Co-Occurrence: The presumed cause and presumed effect have to either both occur or neither occur
-Time Sequence: The presumed cause must precede the presumed effect
-Alternative causes must be ruled out
-Some just describe associations and predict outcomes
-The experimenter is not in control of either variable
-“These groups respond differently”
-“I wonder if X is related to Y”
-X is the predictor variable (participant variable): a pre-existing characteristic used to classify subjects
-Y is the outcome variable (dependent): the outcome or response that is measured
Participant Variables:
-Preexisting characteristics
-Individual differences of participants
-Not under the experimenter’s control
-Participants are grouped according to their level or value
Correlational Research:
-Two measured variables
-Searching for an association
-ex. Behavioral correlation: fidgeting in class based on seat location
-Frequency claim: How many students fidget?
-Association claim: Do students who sit in the front of the room fidget less than those who choose to sit in the back of the room?
*Cannot assess a causal claim
-We did not start with equal groups
-We did not manipulate anything
Experimental Research:
-One manipulated variable; observe one or more measured variables
-Searching for causation
-Start with two equal groups
-Random assignment to groups
*Hold all other variables constant
Validity: Accuracy or correctness
-Soundly based on facts or evidence
-Genuine, credible, true
-Can be very narrow: applied to a particular variable
-Can be very broad: applied to an entire study or program of research
-Not directly measurable
-Evaluated based on logic and reasoning
Reliability: Consistency
-Repeat, replicate
-If the characteristic being measured is stable, a reliable test yields consistent results
-The extent to which the test is consistent in its evaluation of the same individual (or their physiology) over repeated administrations
-Can the variables be measured reliably?
Measuring Reliability:
-Calculate correlations to assess reliability:
-Test-retest
-Split-half
-Parallel or alternate forms
-Item-total correlation
-Coefficient alpha (Cronbach’s)
Construct:
-Hypothetical variable that is not directly observable
ex. Intelligence, happiness, introversion/extroversion
-Estimate the participant’s level of the construct by asking multiple questions
-Operational definition is VERY important
Construct Validity:
-How well are the variables constructed?
-How well did the researchers measure the variables?
*Is the operational definition accurate?
-Are you measuring what you think you’re measuring?
Face Validity:
-Does it appear to measure the construct?
Procedure:
-Was the procedure carried out well?
Method:
-Is this an appropriate way of measuring the construct?
ex. Intelligence measured by head size
-Reliable (head size can be measured consistently)
*Not valid (head size does not reflect intelligence)
Internal and External Validity:
*Generally experiments have high internal validity but low external validity
Internal Validity:
-Do the conclusions follow from the study?
-Are they logical based on the data?
-Was the study conducted well?
-Were alternate explanations ruled out?
-Were all potential confounds eliminated?
External Validity: Generalizability
-Does this study reflect what typically happens in the world?
*Do the results generalize to other situations?
-Do the results generalize to all people or across related species?
**Internal up = External down
Statistical Validity:
*Type I Error: Seeing a relationship that isn’t there (false positive)
*Type II Error: Missing a relationship that is there (false negative)
**If a measure is valid it must also be reliable
-Reliability is necessary for validity
-Reliability is not sufficient for validity
Interrogating Frequency Claims:
-Construct validity of the variable
-How well was the variable measured?
-Quality of measurements
*External validity is essential
-Can we generalize from sample to population?
Scales of Measurement:
-Nominal (CATEGORICAL): naming scale for categorical variables; property of identity only; no inherent order, so any number coding is arbitrary. ex. Male/Female, Cat/Dog
-Ordinal (CATEGORICAL): properties of identity and magnitude; levels of the variable are arranged in order, and higher levels represent more of the variable than lower levels. ex. 1st, 2nd, 3rd
-Interval (CONTINUOUS): properties of identity, magnitude, and equal intervals; *no true zero. ex. IQ tests
-Ratio (CONTINUOUS): properties of identity, magnitude, equal intervals, and a true zero; *allows ratios to be made. ex. Scores on an exam, weight
-Identity: each number has a particular meaning. Magnitude: numbers have an inherent order. Equal intervals: the difference between units is the same anywhere on the scale. True zero: zero means a complete absence of the variable.
Surveys
-Open-ended:
-Gives the information in the participant’s own words
-Can be more open to different (wrong) interpretations
*Can be very hard to compare across participants
-Closed (Forced-choice):
-More common
-Forces a choice between limited options
*Easier to compare answers between participants
ex. Multiple choice. Rules: NO OVERLAP between options, DON’T SKIP GROUPS, ALL CATEGORIES INCLUDED
-Forced-alternative questions: two extreme options; forces a choice
Construct Validity
How well was the variable measured?
-What (else) influenced the answers given?
-Were the questions worded clearly?
-Avoid response sets such as acquiescence (switch it up with reverse-worded items and reverse coding)
-Clear layout (easy to answer)
-How was the survey distributed?
-Were other people around?
-Avoid leading questions, double-barreled questions, questions prone to social desirability bias, long questions, negations, and big words; keep it simple
*The best way to know that your survey is effective is through pretesting
*Limitations of self-report data:
-Surveys are appropriate for measuring opinions and attitudes, not behaviors
-Awareness of the information
-Access to the information (forgetting)
*General rule: don’t ask if you can access the data some other way
-Cannot determine causality
-We are unaware of why we do things
Observations:
-Construct validity
*Threats to construct validity
-Observers may see what they expect to see
*Solution: Make observations more objective/accurate by:
-Recording video and audio whenever possible
-Using a checklist or ethogram
-Time sampling
-Checking inter- and intra-rater reliability
-Blind observer
-Observers can affect what they see
-Self-fulfilling prophecy
*Solution: Blind observers
-Unaware of the hypothesis of the study
-Unaware of which group each participant is in
-Observer effects
-Being observed influences behavior
-Solution:
-Be unobtrusive: Hide
-Allow time to acclimate
Key Points:
The Design of surveys or questionnaires
-Can access internal thoughts, feelings, beliefs
-Types of questions
-What makes a ‘good’ question
-Threats to construct validity
The design of observational studies
-Can observe current actions/performance or access archival records
-How we control observational studies
-How we control participants’ reactions to observers
-Types of observations
-Observing actions
-Naturalistic (natural setting)
-Systematic-with a specific research purpose
-Observing performance
-Performance on tests or tasks
-Observing archives
-Information that was documented for other purposes
-Content analysis-analyzing archives
Lecture 4-2:
Sampling and Generalizing Large Populations (2/4/14)
Topics:
-Introduction and rationale
-Generalizability (external validity)
-Two sources of error
-Chance and bias
-Probability samples (representative)
-Simple random, cluster/multistage, and stratified
-Nonprobability samples
-Quota, purposive, and convenience
External Validity:
-Generalizability
-Do the results apply to:
-The entire population
-Other demographics
-Other settings
-Other situations
*Research conducted with a representative sample has much higher external validity
Sampling:
-You are interested in a particular population
→ Not able to talk to everyone (a census)
-Take sample from population
*Findings only relate to entire population if sample is representative
ex. Literary Digest (1936)
-Ran polls before presidential elections
-Sent out 10 million → 2.4 million returned
-Sample drawn from:
-Magazine subscribers
-Telephone directories
-Car registration
-Literary Digest results: Landon winning (57%)
*Actual results: Landon losing (38%)
*Sample was biased towards wealthier/right-leaning population
Error in Sampling:
*Error is the extent to which the sample differs from the population
Sampling error:
-No two samples will be the same; there will be variation
*Random
ex. Mixed colors population
*Lessen sample error by increasing number of sampled
*Dealing with sample error:
-Increase sample size
-The larger your sample size, the closer you will be to your population
-Margin of error decreases as the sample size increases (1,000-2,000)
-Estimate using statistics
-Standard error: estimates how much random sampling error there is, i.e., how much a statistic would vary from sample to sample
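A rough way to see why larger samples help is the textbook margin-of-error formula for a sample proportion, z·√(p(1−p)/n). The sketch below rests on stated assumptions (a simple random sample, a 95% confidence level so z = 1.96, and p = 0.5 as the most conservative case); it is an illustration, not the lecture's own calculation.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample, a large n, and 95% confidence
    (z = 1.96). p = 0.5 gives the widest (most conservative) margin.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error

for n in (100, 1000, 2000, 4000):
    print(f"n = {n:>5}: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

Because n sits under a square root, quadrupling the sample only halves the margin (about ±3.1 points at n = 1,000 versus about ±2.2 at n = 2,000), so the gains flatten out. This formula covers random sampling error only; it says nothing about sampling bias, which is why the Literary Digest's 2.4 million returns still missed badly.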
Sampling Bias:
-Sample is selected systematically
*Non-random
*Constant error
-evil, wicked, wrong
-Caused by a non-random selection procedure
-Selected based on ease or any other systematic difference
-Self-selection, convenience samples
-Results in a sample that is not representative of the population
*Reduces external validity
-Not able to draw conclusions about the population
Probability Samples (Representative):
-Simple Random Sampling
-Cluster/multistage
-Stratified sampling
Simple Random Sampling:
1. Identify all members of the population
2. Every member has an equal likelihood of being selected
-Sample will be representative of the population
*You will be able to draw conclusions about the entire population
*Very high external validity
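The two steps above map directly onto Python's standard library: step 1 is the complete list (the sampling frame), and step 2 is a draw without replacement in which every listed member is equally likely to be chosen. The roster and the function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member of the listed
    population has the same chance of being selected."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(population, n)

# Hypothetical sampling frame: a complete roster of 500 students (step 1)
roster = [f"student_{i:03d}" for i in range(500)]
sample = simple_random_sample(roster, 25, seed=2014)  # step 2
print(sample[:5])
```

The code can only be as good as the roster: anyone missing from the list has zero chance of selection, which is the practical problem of identifying all members.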
Cluster/Multistage Sampling:
-Advantage: Can accomplish representativeness with a smaller random sample
-WILL be able to draw conclusions about the rest of the population
Cluster:
-Randomly select subpopulations and include all the individuals within each in the study
-Difficult to accomplish
ex. Interview all voters in certain districts
Multistage:
-Randomly select subpopulations and include a random sample of individuals in the subpopulation in the study
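The difference between cluster and multistage sampling is only in the second step, which a short sketch makes visible. The district names, voter IDs, and function names are hypothetical; subpopulations are assumed to be stored as a dict mapping each district to its list of voters.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole subpopulations, keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Multistage: randomly pick subpopulations, then randomly sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen
            for person in rng.sample(clusters[name], n_per_cluster)]

# Hypothetical voting districts, 40 voters each
districts = {f"district_{d}": [f"d{d}_voter_{v}" for v in range(40)] for d in range(10)}
rng = random.Random(7)
print(len(cluster_sample(districts, 3, rng)))         # 3 districts x 40 voters = 120
print(len(multistage_sample(districts, 3, 10, rng)))  # 3 districts x 10 voters = 30
```

The cluster version has to reach every voter in each chosen district, which is why it is described as difficult to accomplish.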
Stratified Sampling
-Divide population into subpopulations
ex. Demographic groups, education level
-Randomly select proportional samples from each subpopulation
Advantages:
-Can accomplish representativeness with a smaller random sample
-Able to draw conclusions about the rest of the population
*Ratios remain constant between whole population and sample population
ex. Student Fee Increases
-Attitudes regarding a three year phase-in of a fee increase
-Population: current University students
-Sample size of 50, allocated according to each year’s share of enrollment
-17 freshmen, 13 sophomores, 11 juniors, 9 seniors
-Maintain ratios
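The fee-increase example can be reproduced as proportional allocation followed by a random draw within each year. The notes give only the final split, so the enrollment shares below (34%, 26%, 22%, 18%) are back-calculated from 17/13/11/9 out of 50 and are an assumption. Rounding leftovers to the largest remainders is one common choice, not necessarily the one used in lecture; the function names are hypothetical.

```python
import random

def proportional_allocation(weights, n):
    """Split n across strata in proportion to `weights` (headcounts or
    whole-number percentages). Integer math avoids float rounding; any
    seats left after flooring go to the strata with the largest remainders."""
    total = sum(weights.values())
    alloc = {k: n * w // total for k, w in weights.items()}
    remainder = {k: n * w % total for k, w in weights.items()}
    leftover = n - sum(alloc.values())
    for k in sorted(weights, key=lambda k: remainder[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

def stratified_sample(strata, alloc, seed=None):
    """Draw a simple random sample of alloc[k] members from each stratum k."""
    rng = random.Random(seed)
    return {k: rng.sample(members, alloc[k]) for k, members in strata.items()}

# Enrollment shares back-calculated from the 17/13/11/9 split in the notes
shares = {"freshman": 34, "sophomore": 26, "junior": 22, "senior": 18}
print(proportional_allocation(shares, 50))
# {'freshman': 17, 'sophomore': 13, 'junior': 11, 'senior': 9}
```

Because each year's share of the sample equals its share of enrollment, the ratios between the population and the sample stay constant.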
Practical Considerations:
-Identifying all the members of the population can be difficult
-Selection doesn’t ensure participation
-Self-selection will introduce bias
-Try to gather some information about the non-respondents so that changes can be made in future recruitments
-Incentives vs. Motivations
Non-Probability Sampling:
-Quota
-Purposive
-Snowball
-Haphazard or convenience
-Not representative but more realistic
**Low external validity
-Cannot draw conclusions about the entire population
Quota:
1. Researchers set proportions in order to compare groups that may differ greatly in frequency
ex. CEO Percentages between men and women
-Set minimum number required to distinguish between groups
2. Researcher sets proportions to improve representativeness in a nonrandom sample
Purposive Sampling:
-Select specific subsets to compare
-Not attempting to make generalizations
-What makes a successful PSC41 student?
-Not trying to generalize results to all UCD students or all college-age students
-Select a (non-representative) sample that will exaggerate the effects
ex. Comparing the characteristics of the top 10% VS. Bottom 10%
Snowball Sample:
-Interested in a hard-to access population
ex. People who hoard, people coping with Crohn’s disease
*Ask each member you find to recommend additional members
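Snowball recruitment behaves like following a referral network outward from a few starting contacts. The sketch below assumes referrals are followed in the order they arrive (breadth-first) and that nobody is recruited twice; the names and the referral map are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from known members (seeds) and follow referrals outward.
    `referrals` maps each participant to the people they recommend."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # don't recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(["A"], referrals, max_size=10))  # ['A', 'B', 'C', 'D', 'E']
```

Who ends up in the sample depends entirely on who the starting members know, which is why a snowball sample is not representative.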
Haphazard or Convenience Sample
-Using what is convenient and available
-Most research (Large universities)
-Must fulfill participation requirements
Improving Representativeness of Non-Probabilistic Samples
-Use systematic selection procedures to minimize selection bias
*Quota, purposive and snowball will be better than convenience
*Go get your data, don’t let it come to you
-Sample from broad variety of times/places
-Even if your sample is not random, you can lessen that bias if times and places are varied
Clickers:
-Interviewing every third person who enters the main door:
Convenience
-Recruiting undergraduates who are enrolled in lower-division psychology classes likely produces a Convenience sample of the human race
-Captive subject pool
Module 5-1: Lecture 1-Correlational Research Tools (1/6/14)
Topics:
Bivariate correlational research
-Predictor and outcome variables are measured
Correlation Coefficient r
-Direction of relationship
-Strength of relationship
Linking hypotheses to r values
-What does the hypothesis tell us about the expected r value?
-Determining significance
Bivariate Correlational Research
-Describe a relationship between two measured variables
-Neither variable is manipulated
-Variables can be continuous or categorical
-X axis usually predictor
-Y axis usually Outcome
*Correlation is not causation (need to perform an experiment)
ex. Stroop Test vs. Food Eaten
-Prefrontal cortex uses high levels of glucose
-How does blood glucose level affect Stroop performance scores?
Continuous:
-Predictor: Calories consumed at lunch → x axis
-Outcome: Time to accomplish Stroop → y axis
*Scatterplot
-Results: Negative relationship (more calories consumed → less time needed to accomplish the test)
Categorical:
-Predictor: Did you eat lunch today? → x axis
-Outcome: Time to accomplish Stroop
*Bar graph
-Results: Longer time in “No” column
Continuous Variables:
-If predictor and outcome variables are measured continuously
-Can plot the relative position of each participant on these two continuous scales
*Can calculate a correlation coefficient
-A number r that quantifies the nature and strength of that relationship
Categorical Variables:
-When one of the variables is categorical, it is better to display the data as a bar graph or histogram
*The values displayed in a bar graph are mean or average of all individual data points
The Correlation Coefficient:
-Positive linear association
-Increase in one variable=Increase in another
-Negative linear association
-Increase in one variable= decrease in another
-Curvilinear Association
-Increases in one variable relate to both increases and decreases in another
-No association
Correlation Coefficient (r)
-Nature of the relationship
-Positive, negative, no relationship
-Strength of relationship
-Correlation number between -1 and +1
-Determined by absolute value of r
*Graphically: Higher |r| = data points clustered more tightly around a straight line
-Higher number: Greater ability to predict one variable based on another
0 = no correlation
.10 (-.10) = Small or weak correlation
.30 (-.30) = Medium or moderate correlation
.50 (-.50) = Large or strong correlation
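The direction and strength rules above can be sketched as a Pearson r calculation plus a labeling step. The data are hypothetical (loosely modeled on the calories/Stroop example), and how values between the benchmarks are binned (for instance, .20 counted as "small") is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Direction from the sign; rough strength from the .10 / .30 / .50 benchmarks."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    if size >= 0.50:
        strength = "large"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "small"
    else:
        strength = "negligible"
    return direction, strength

# Hypothetical data: calories at lunch (x) and seconds to finish the Stroop task (y)
calories = [200, 350, 500, 650, 800, 950]
stroop_seconds = [58, 60, 51, 49, 47, 44]
r = pearson_r(calories, stroop_seconds)
print(round(r, 2), describe(r))
```

Strength depends only on |r|, so r = -.35 and r = +.35 are equally strong; the sign only gives the direction.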
Outliers:
Extreme Score
-One or a few that stand away from the pack
-Can have a strong effect on the r value
-Matter most when the sample size is small
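A quick numerical check shows how much a single extreme score can move r when n is small. The data are made up for illustration: five points with a clear positive trend give r = .80, and adding one outlier flips the sign to roughly -.48.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical small sample (n = 5) with a clear positive trend
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 2))  # 0.8

# Add one extreme score and recompute (n = 6)
x_out = x + [6]
y_out = y + [-10]
print(round(pearson_r(x_out, y_out), 2))  # about -0.48
```

With hundreds of participants the same single point would barely register, which is why outliers matter most in small samples.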
Linking Hypotheses to r values:
-What does the hypothesis tell us about the expected r value?
-Determining significance
Non-directional (two-tailed) Hypotheses:
There is a relationship between X and Y
-Null Hypothesis:
Ho: r = 0
HA: r ≠ 0
Directional (one-tailed) hypotheses:
Positive relationship:
-Higher levels of X are associated with higher levels of Y
Ho: r = 0
HA: r > 0
Negative Relationship:
-Higher levels of X are associated with lower levels of Y
Ho: r = 0
HA: r < 0
**Is it Significant?
-Statistical significance tells you how likely a result at least this strong would be if chance alone were operating (i.e., if Ho were true)
-Decision rule for rejecting or failing to reject Ho
*We are willing to accept a 5% chance of concluding there is a real relationship when none really exists → p < .05
-Compare calculated r to table of critical values
-Degrees of freedom based on number of participants (n-2)
-If observed r larger than critical value, conclude significance
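The critical-value comparison can be sketched by converting r to a t statistic with df = n - 2. Python's standard library has no t distribution, so the critical t values here are taken from a standard t table and are inputs, not computed: 2.228 (df = 10, two-tailed, α = .05) and 1.812 (df = 10, one-tailed, α = .05). The function names are hypothetical.

```python
import math

def r_to_t(r, n):
    """Convert r to a t statistic with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

def critical_r(t_crit, df):
    """Smallest |r| that reaches significance for a given critical t.
    Obtained by solving t = r * sqrt(df) / sqrt(1 - r^2) for r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, n, t_crit, direction=None):
    """Two-tailed test if direction is None; otherwise '+' or '-' for a
    one-tailed test (t_crit must then be the one-tailed table value)."""
    if direction == "+" and r <= 0:
        return False
    if direction == "-" and r >= 0:
        return False
    return abs(r) > critical_r(t_crit, n - 2)

# n = 12 participants -> df = 10; two-tailed critical t at alpha = .05 is 2.228
print(round(critical_r(2.228, 10), 3))  # 0.576
print(is_significant(0.60, 12, 2.228))  # True: .60 exceeds the critical value
```

For a directional hypothesis the smaller one-tailed critical value applies, but only if r falls in the predicted direction; an r of -.60 cannot support HA: r > 0.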
*Effect Size:
-How much of the variance does this explain?
-r^2 = coefficient of determination
-Proportion of variance shared by the two variables
ex. Spanking and child behavior problems: r = .46, r^2 = .21
-21% of variation in spanking was predicted by the child’s behavior problems
Interpreting a correlation
-Correlation does not imply causation
*If two variables are correlated we can conclude:
-X is related to Y
-Y is related to X
-r^2 = % of variance in X that is predicted by Y
-It is impossible to tell if X caused Y, Y caused X, or if they are both related to a third variable (spurious correlation)
Module 5-Lecture 2:
Interrogating Correlational Studies
Topics:
Important validities to consider when assessing correlational studies:
-Construct
-Statistical
-External
-Internal
*Construct Validity:
Measures what it’s supposed to measure
-Face Validity:
-How well do the operational definitions measure each variable/construct
-Procedural Validity:
-Was the procedure carried out well?
-What was their population and sampling method?
-Was the number of subjects/observations high or low?
-Were all subjects approached/treated equally?
-Method Validity:
-Is this an appropriate way of measuring the construct?
-Is the study using self-report when observations or physiological measurements would be better?
*Could any of the measurements be biased? Are they reliable (test-retest, intra- and inter-observer)?
*Statistical Validity:
Were the differences in groups analyzed well?
-Drawing valid conclusions based on statistical analysis
-Are researchers using appropriate statistical tests?
-What is the r value? What is the effect size? P value?
*p < .05 → if the null were true, a result this strong would occur less than 5% of the time, so we reject the null while accepting a 5% risk of a Type I error
-Are they drawing conclusions based on significant/non-significant results?
-Are there any outliers?
-Subgroups in population?
-No relationship or curvilinear relationship?
*External and Internal Validity:
Usually as one goes up-other goes down
External:
Results generalize to real-world setting
-Do the results generalize to other people/species, times and places?
-Depends on how the population was defined and how subjects were sampled
Internal:
Changes in DV were caused by IV
-Do the conclusions follow from the study?
-Even if A and B correlate and A comes before B, we cannot conclude that A caused B
**Can’t eliminate other factors (third variables) in bivariate correlational studies because variables are measured not manipulated
*If there is a plausible third variable: cannot infer causation
Module 6-1: Experimental Methods
-*Essential qualities of an experiment
*Cause and effect criteria
-Co-Occurrence:
-The presumed cause and the presumed effect have to either both occur or neither occur
-Time Sequence:
-The presumed cause must precede the presumed effect
-Alternative causes must be ruled out
*Definition of an experiment
-Start with equal groups
-Systematically manipulate the level of independent variable
-Hold everything else constant
-Construct, statistical, internal, and external validity
-Random assignment
*True Experiments:
-At least two groups compared
-There is a control and comparison condition
-Groups should be equal in every way possible
1. Large Number of Participants:
Random assignment (Not the same as random selection)
-Every participant has equal chance of being placed in each of the treatment groups
2. Small Number of Participants:
**Matching procedure
-Measure the potentially confounding difference
-Pair similar participants together on that trait/variable
-One member of each pair is randomly assigned to each group
ex. Ensure each group has equal levels of IQ
-Aim to eliminate individual differences
-Independent variable is systematically manipulated
-Dependent variable is measured and the group averages are compared statistically
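Both ways of creating equal groups can be sketched briefly. Random assignment shuffles participants and deals them into groups; the matching procedure ranks participants on the matching variable, pairs neighbors, and randomly splits each pair. The participant IDs, IQ scores, and function names are hypothetical, and this sketch simply leaves the last-ranked participant unpaired when the count is odd.

```python
import random

def random_assignment(participants, groups, seed=None):
    """Shuffle, then deal participants into groups round-robin so every
    participant has an equal chance of ending up in each group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

def matched_assignment(scores, seed=None):
    """Matched pairs: rank by the matching variable (e.g., IQ), pair
    neighbors, then randomly send one member of each pair to each group."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    treatment, control = [], []
    for pair in zip(ranked[0::2], ranked[1::2]):
        first, second = rng.sample(pair, 2)  # random order within the pair
        treatment.append(first)
        control.append(second)
    return treatment, control

# Hypothetical small study: six participants matched on IQ
iq = {"P1": 100, "P2": 101, "P3": 120, "P4": 119, "P5": 90, "P6": 91}
treatment, control = matched_assignment(iq, seed=3)
print(treatment, control)
```

With only six people, pure random assignment could easily put both high-IQ participants in one group; matching guarantees each group gets one member of every IQ pair.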
-Types of experiments
Between Subject Group Design (Independent Groups)
-Different groups receive the different levels of independent variable
-Compare difference between groups
-Types:
-Pre-test/Post-test
-Post-test only
-Must use when:
-Participating in one condition makes it impossible to participate in another: Lasting effects of treatment, knowledge of research
-Participant variables are preexisting characteristics of participants
ex. Gender, mental health diagnosis
Within Subject Group Design (Repeated Measures)
-Each participant receives all levels of independent variable
-Each participant acts as their own control
-Types:
-Concurrent Measurement
ex. Taste Coke and Pepsi: choose favorite
-Repeated measures
Advantages:
-Minimize number of subjects
-Minimizes the effect of individual differences
Disadvantages:
-Order effects: Previous experience or conditions could affect subsequent performance
-Solution: Counterbalance
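Full counterbalancing means using every possible order of the conditions equally often, with participants randomly assigned to orders. A minimal sketch using the Coke/Pepsi example; participant IDs and function names are hypothetical.

```python
import random
from itertools import permutations

def counterbalanced_orders(conditions):
    """Full counterbalancing: every possible order of the conditions."""
    return list(permutations(conditions))

def assign_orders(participants, conditions, seed=None):
    """Randomly assign participants to orders; each order is used equally
    often when the number of participants is a multiple of the number of orders."""
    rng = random.Random(seed)
    orders = counterbalanced_orders(conditions)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}

plan = assign_orders([f"P{i}" for i in range(1, 9)], ["Coke", "Pepsi"], seed=5)
for participant, order in sorted(plan.items()):
    print(participant, " then ".join(order))
```

With k conditions there are k! possible orders (2 conditions give 2 orders, 3 give 6, 4 give 24), so full counterbalancing quickly needs many participants.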
-Ruling out “everything else”
*Extraneous Variables:
-Not related to treatment but could influence the outcome
*Uncontrolled factors which are not of interest to the researcher
-Might just add unsystematic variability (noise)
*Might provide alternate explanation of results
**Confounding Variable:
-Systematically co-varies with independent variable
-Changes at the same time as independent variable
-Not just noise
-Has effect that cannot be separated from the effects of independent variable
*Provides alternate explanation for the observed change in dependent variable
*Experimenter Effects
-Demand characteristics
-Solutions:
-Audio/video recording of instructions
*Double-blind procedure: Neither the participant nor the experimenter knows which group the participant is in
*Participant Effects:
-Aim to reduce demand characteristics
-Use deception so that participants are unaware of the research hypothesis
-Placebo control group
-To control for participant’s expectations
-Conduct manipulation check
-Ask each participant what he/she thought the hypothesis was at the end
**Threats to internal validity
Maturation effect: Changes due to time
-Growing, fatigue, hunger
Testing effect: Changes due to pre-test
-practice, expectations
History effect: Changes due to events in the outside world
Mortality effect: Differences due to some participants dropping out
Selection effect: Differences due to unequal groups to begin with
Between Subjects Designs:
-Ensure that the groups are treated as similarly as possible
-Maturation, testing, mortality
-Gather data rapidly/simultaneously
-History effect
-Random assignment
-Selection effect
Within Subject Designs:
-Counterbalance:
-Maturation effect
-Testing effect
-Gather data rapidly/simultaneously
-History effect
-Random assignment to counterbalance order
-Selection effect
-If they drop out, they drop out of both conditions
-Mortality effect
-What if independent variable does not make a difference?
Not enough between-group difference
-Ineffective manipulation
-Insensitive measures
-Ceiling or floor effects (DV or IV)
Too much variability within group members
-Measurement error
-Individual differences
-Situation variability/noise
*True and quasi-experiments
Quasi-Experiment:
-Researchers cannot randomly assign the participants to the conditions
-“looks like” an experiment but lacks equal groups
-Participants are selected for the groups depending on the level of predictor variable
*There are often quasi-experimental results included in true experiments
True Experiments:
-Start with equal groups
-Vary one factor
-Measure and compare outcome variable
-Differences in DV must be caused by IV
-“Different levels of IV caused different performance on DV”
Quasi-Experiments:
-Divide groups based on one factor
-Groups not equal
-Measure and compare outcome variable
-Cannot conclude causality
-“The groups performed differently on the outcome measure”
Module 6-2: Factorial Designs
Factorial Designs:
*Experiments that have more than one independent variable
-Allow one to investigate effects of more than one independent variable on outcomes
-Behavior is multi-caused
-Considering multiple IV’s makes our research more similar to real life
-Sometimes effect is obscured when only looking at one variable
-Sometimes effect is contingent upon the presence of both variables
Examining for:
-Main effects of each independent variable
-Interactions between independent variables
Factor:
-Cause
-Independent variable
-Source of systematic (non-random) variance
*ex. Diagnosis (mild depression, severe depression) x Treatment group (Group 1, Group 2, Group 3) x Treatment duration (short, moderate, long)
-2x3x3 design
-3 main effects
-3 two-way interactions + 1 three-way interaction
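The counts for the 2x3x3 example follow from two rules: the number of conditions is the product of the numbers of levels, and the effects to examine are every combination of factors (each factor alone is a main effect; each pair, triple, and so on is an interaction). A short sketch using the factors from the example; the variable names are hypothetical.

```python
from itertools import combinations, product

# Factors and levels from the depression example
factors = {
    "diagnosis": ["mild depression", "severe depression"],
    "treatment group": ["group 1", "group 2", "group 3"],
    "treatment duration": ["short", "moderate", "long"],
}

names = list(factors)
design = " x ".join(str(len(levels)) for levels in factors.values())  # "2 x 3 x 3"
cells = list(product(*factors.values()))  # every combination of levels

main_effects = names  # one main effect per factor
interactions = {
    order: list(combinations(names, order))  # all 2-way, 3-way, ... factor combinations
    for order in range(2, len(names) + 1)
}

print(design, "design:", len(cells), "conditions")
print("main effects:", len(main_effects))
for order, terms in interactions.items():
    print(f"{order}-way interactions:", len(terms))
```

This gives 18 conditions and 3 + 3 + 1 = 7 effects in total; in general k factors produce 2^k - 1 effects, so each added factor roughly doubles what has to be interpreted.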