Research Methods: Validity, Reliability, and Experimental Design

*Valid logic: If the premises are true, the conclusion must be true. Sound logic: The argument is valid and all of the premises are true.



Directional alternative hypothesis:

-Group 1 will perform better than Group 2
-As X goes up, so does Y
-As X goes up, Y goes down

Causal or Associative Hypotheses:

-Some attempt to define a causal relationship
-The experimenter must be in control of the cause
-“If I change X, then I think that Y will also change”
-Co-Occurrence: The presumed cause and presumed effect have to either both occur or neither occur
-Time Sequence: The presumed cause must precede the presumed effect
-Alternative causes must be ruled out

-Some just describe associations and predict outcomes -The experimenter is not in control of either variable
“These groups respond differently”

-I wonder if x is related to y

-X is the predictor variable (participant variable): a pre-existing characteristic used to classify subjects
-Y is the outcome variable (dependent): the outcome or response that is measured

Participant Variables:

-Preexisting characteristics
-Individual differences of participants
-Not under the experimenter’s control
-Participants are grouped according to their level or value

Correlational Research:

-Two measured variables
-Searching for an association
-ex. Behavioral correlation: Fidgeting in class based on location
-Frequency Claim: How many students fidget?
-Association Claim: Do students who sit in the front of the room fidget less than those who choose to sit in the back of the room?
*Cannot assess a causal claim
-We did not start with equal groups
-Did not manipulate anything

Experimental Research:

-One manipulated variable; observe one or more measured variables
-Searching for causation
-Start with two equal groups
-Random assignment to groups
*Hold all other variables constant

Validity: Accuracy or correctness

-Soundly based on facts or evidence
-Genuine, credible, true
-Can be very narrow: applied to a particular variable
-Can be very broad: applied to an entire study or program of research
-Not directly measurable
-Evaluated based on logic and reasoning

Reliability: Consistency

-Repeat, replicate
-If the characteristic being measured is stable, a reliable test yields consistent results
-The extent to which the test is consistent in its evaluation of the same individual (or their physiology) over repeated administrations
-Can the variables be measured reliably?

Measuring Reliability: -Calculate correlations to assess reliability:
-Test-retest -Split-half -Parallel or alternate forms -Item-total correlation -Coefficient alpha (Cronbach’s)
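As a rough illustration of the last measure in the list, coefficient alpha can be computed from a table of participants by items. The sketch below uses the usual textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the scores are made up for illustration and do not come from the lecture.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of participants, each a list of item scores."""
    k = len(scores[0])                      # number of items on the scale
    items = list(zip(*scores))              # transpose: one tuple of scores per item
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)


# Hypothetical data: 5 participants answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))  # about 0.97 for this made-up data
```

Values near 1 mean the items rise and fall together across participants, which is what internal consistency is meant to capture.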

Construct -Hypothetical variable, which is not directly observable ex. Intelligence, happiness, introversion/extroversion -Estimate the participant’s level of the construct by asking multiple questions -Operational definition is VERY important

Construct Validity:

-How well are the variables constructed? -How well did the researchers measure the variables? *Is the operational definition accurate? -Are you measuring what you think you’re measuring?

Face Validity: -Does it appear to measure the construct? Procedure: -Was the procedure carried out well? Method: -Is this an appropriate way of measuring the construct? Ex. Intelligence: Measuring size of head -Reliable (everyone has a head) *Not valid       

Internal and External Validity:

*Generally experiments have high internal validity but low external validity
Internal Validity: -Do the conclusions follow from the study? -Logical based on data? -Was the study conducted well? -Were alternate explanations ruled out? -Were all potential confounds eliminated?
External Validity: Generalizability -Does this study reflect what typically happens in the world? *Do the results generalize to other situations? -Do the results generalize to all people or across related species?

**Internal up= External down

Statistical Validity: *Type I Error: Seeing a relationship that isn’t there (False positive) *Type II Error: Missing a relationship that is there (False negative)

**If a measure is valid it must also be reliable
-Reliability is necessary for validity
-Reliability is not sufficient for validity

 Interrogating Frequency Claims:
Construct validity of the variable -How well was the variable measured -quality of measurements *External validity is essential -Can we generalize from sample to population?

-Nominal: -Naming scales -Categorical variables -Property of identity -Not measured with inherent order -Any number coding is arbitrary ex. Male/Female, Cat/Dog CATEGORICAL

-Ordinal: -Property of identity -Property of magnitude -Levels of variable are arranged in an order -Higher levels represent more of the variable than do lower levels ex. 1st, 2nd, 3rd CATEGORICAL

-Interval: -Property of identity -Property of magnitude -Property of equal intervals *No true zero ex. IQ tests (CONTINUOUS)

-Ratio: -Property of identity -Property of magnitude -Property of Equal intervals -Property of True zero *Allows ratios to be made ex. Scores on an exam, weight  (CONTINUOUS)

-Identity: Each number has particular meaning -Magnitude: Numbers have an inherent order -Equal intervals: Difference between units is the same anywhere on the scale -True zero: A score of zero means none of the variable is present


                        -Open-ended -Gives the information in the participant’s own words -Can be more open to different (wrong) interpretations *Can be very hard to compare across participants

                        -Closed (Forced-choice) -More common -Forces a choice between limited options *Easier to compare answers between participants ex. Multiple choice… Problems to avoid: NO OVERLAP, DON’T SKIP GROUPS, ALL CATEGORIES INCLUDED -Forced-alternative questions… Two extreme options, force choice…

Construct Validity

How well was the variable measured? -What (else) influenced the answers given? -Were the questions worded clearly? -Avoid response sets (switch it up, acquiescence, reverse-wording/coding) -Clear layout (easy to answer) -How was the survey distributed?   -Were other people around?                              

                        -Avoid leading questions, double-barreled questions, social desirability bias, long questions, negation, big words, keep it simple….. *The best way to know that your survey is effective is through pretesting


Limitations of self-report data:

                                    -Surveys are appropriate for measuring opinions and attitudes, not behaviors

                                                -Awareness of the information

                                                -Access to the information-forgetting

                                                *General rule: don’t ask if you can access the data some other way


Cannot determine causality

                                                -We are unaware of why we do things


            -Construct validity

                        *Threats to construct validity

                                    -Observers may see what they expect to see



Make observations more objective/accurate by:

                                                            -Recording video and audio whenever possible

                                                            -Using checklist or ethogram

                                                            -Time sampling

                                                            -Inter and intra-rater

                                                            -Blind observer

                                    -Observers can affect what they see

                                                -Self-fulfilling prophecy



Blind Observers

                                                            -Unaware of the hypothesis of the study

                                                            -Unaware of which group each participant is in

                                    -Observer effects

                                                -Being observed influences behavior



                                                            -Be unobtrusive: Hide

                                                            -Allow time to acclimate

Key Points:

The Design of surveys or questionnaires

-Can access internal thoughts, feelings, beliefs

                        -Types of questions

                        -What makes a ‘good’ question

                        -Threats to construct validity

The design of observational studies


Can observe current actions/performance or access archival records

                        -How we control observational studies

                        -How we control participants’ reactions to observers

-Types of observations

            -Observing actions

                        -Naturalistic (natural setting)

                        -Systematic-with a specific research purpose

            -Observing performance

                        -Performance on tests or tasks

            -Observing archives

                        -Information that was documented for other purposes

                        -Content analysis-analyzing archives 

Lecture 4-2:

Sampling and Generalizing Large Populations (2/4/14)



Introduction and rationale

                                    -Generalizability (external validity)


Two sources of error

                                    -Chance and bias


-Probability samples (representative)

                                    -Simple random, cluster/multistage, and stratified


Nonprobability samples

                                    -Quota, purposive, and convenience

External Validity:


                        -Do the results apply to:

                                    -The entire population

                                    -Other demographics

                                    -Other settings

                                    -Other situations

                        *Research conducted with a representative sample has much higher external validity


                        -You are interested in a particular population

                                    -> not able to talk to everyone (census)

                        -Take sample from population

                        *Findings only relate to entire population if sample is representative

                        ex. Literary Digest (1936)

                                    -Ran polls before presidential elections

                                    -Sent out 10 million -> 2.4 million returned

                                    -Sample Digest drawn from

                                                -Magazine subscribers

                                                -Telephone directories

                                                -Car registration

                                    -Literary Digest results: Landon winning (57%)

                                    *Actual results: Landon losing (38%)

                                    *Sample was biased towards wealthier/right-leaning population

Error in Sampling:

                                   *Error is the extent to which the sample differs from the population

Sampling error:

                                                -No two samples will be the same; there will be variation


                                                ex. Mixed colors population 

                                                       *Lessen sampling error by increasing the number sampled


Dealing with sample error:

                                                            -Increase sample size

                                                                        -The larger your sample size, the closer you will be to your population

                                                                        -Margin of error decreases as the sample gets larger (e.g. samples of 1,000-2,000)

                                                            -Estimate using statistics

                                                                          -Standard error: how much random error there is in the measurement
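To make the link between sample size and margin of error concrete, here is a minimal sketch. It assumes a simple random sample, a yes/no survey question, and the common approximation that the 95% margin of error is 1.96 standard errors, with standard error = sqrt(p(1-p)/n). The function name and the default p = 0.5 (the worst case) are illustrative choices, not part of the lecture.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    standard_error = math.sqrt(p * (1 - p) / n)   # random error expected at this n
    return z * standard_error


for n in (100, 500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Under these assumptions the margin is roughly 3 points at n = 1,000 and roughly 2 points at n = 2,000. The sample has to be four times larger to cut the margin in half, so gains become expensive beyond that range.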

Sampling Bias:

                                                -Sample is selected systematically


                                                *Constant error

                                                            -evil, wicked, wrong

                                                            -Caused by a non-random selection procedure

                                                                        -Selected based on ease or any other systematic difference

                                                                        -Self-selection, convenience samples

                                                            -Results in a sample that is not representative of the population

                                                            *Reduces external validity

                                                                        -Not able to draw conclusions about the population

Probability Samples (Representative):

                        -Simple Random Sampling

                        -Cluster/Multistage sampling

                        -Stratified sampling


Simple Random Sampling:

1. Identify all members of the population

2. Every member has an equal likelihood of being selected

                                                -Sample will be representative of the population

                                                *You will be able to draw conclusions about the entire population

                                    *Very high external validity
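The two steps above can be sketched in a few lines. The population here is just a hypothetical list of ID numbers standing in for a complete sampling frame, and the sample size of 200 is arbitrary.

```python
import random

# Hypothetical sampling frame: ID numbers for every member of the population.
population = list(range(1, 10001))

rng = random.Random(41)                 # fixed seed only so the sketch is repeatable
sample = rng.sample(population, k=200)  # every member equally likely, no repeats

print(len(sample), len(set(sample)))
```

In practice the hard part is step 1 (listing every member), not the random draw itself.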

Cluster/Multistage Sampling:

-Advantage: Can accomplish representativeness with a smaller random sample

                                    -WILL be able to draw conclusions about the rest of the population


                                    -Cluster: Randomly select subpopulations and include all the individuals within each in the study

                                    -Difficult to accomplish

                                                ex. Interview all voters in certain districts


                                    -Multistage: Randomly select subpopulations and include a random sample of individuals in the subpopulation in the study
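The difference between the two designs is easiest to see side by side. This is a sketch under made-up assumptions: 20 voting districts of 50 voters each, 4 districts chosen, and 10 voters drawn per district in the multistage version.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 20 voting districts with 50 voters each.
districts = {d: [f"district{d}-voter{v}" for v in range(50)] for d in range(20)}

# Stage 1 (both designs): randomly select a few subpopulations (clusters).
chosen = rng.sample(sorted(districts), k=4)

# Cluster sampling: include every individual in each chosen cluster.
cluster_sample = [person for d in chosen for person in districts[d]]

# Multistage sampling: take a random sample within each chosen cluster.
multistage_sample = [p for d in chosen for p in rng.sample(districts[d], k=10)]

print(len(cluster_sample), len(multistage_sample))
```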

Stratified Sampling

                                    -Divide population into subpopulations

                                                ex. Demographic groups, education level

                                    -Randomly select proportional samples from each subpopulation



                                                -Can accomplish representativeness with a smaller random sample

                                    -Able to draw conclusions about the rest of the population

                                    *Ratios remain constant between whole population and sample

                                    ex. Student Fee Increases

                                                -Attitudes regarding a three year phase-in of a fee increase

                                                -Population: current University students

                                                -Enrollment percentages based on year -> sample size of 50

                                                            -17 freshmen, 13 sophomores, 11 juniors, 9 seniors

                                                            -Maintain ratios
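The fee-increase example can be worked through as a proportional allocation. The notes give only the resulting counts, so the enrollment figures below are hypothetical, picked to match the 34/26/22/18 percent split that a 17/13/11/9 sample of 50 implies.

```python
import random

# Hypothetical enrollment counts, chosen to match the 34/26/22/18 percent split
# implied by the 17/13/11/9 sample in the example.
enrollment = {"freshmen": 3400, "sophomores": 2600, "juniors": 2200, "seniors": 1800}
sample_size = 50
total = sum(enrollment.values())

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
allocation = {}
sample = {}
for year, count in enrollment.items():
    # Each stratum gets the same share of the sample as it has of the population.
    allocation[year] = round(sample_size * count / total)
    student_ids = range(count)  # stand-in for that year's roster
    sample[year] = rng.sample(student_ids, k=allocation[year])

print(allocation)
```

With less tidy enrollment numbers the rounded counts may not add up to exactly 50, so one stratum would need a manual adjustment.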

Practical Considerations:

                                    -identifying all the members

                                    -Selection doesn’t ensure participation

                                                -Self-selection will introduce bias

                                    -Try to gather some information about the non-respondents so that changes can be made in future recruitments

                                                -Incentives vs. Motivations

Non-Probability Sampling:




                                    -Quota

                                    -Purposive

                                    -Snowball

                                    -Haphazard or convenience

                        -Not representative but more realistic

                        **Low external validity

Cannot draw conclusions about the entire population


Quota Sampling:

                                    1. Researchers set proportions in order to compare groups that may differ greatly in frequency

                                    ex. CEO percentages between men and women

                                            -Set minimum number required to distinguish between groups

                                    2. Researcher sets proportions to improve representativeness in a nonrandom sample

Purposive Sampling:

                                    -Select specific subsets to compare

                                                -Not attempting to make generalizations

                                    -What makes a successful PSC41 student?

                                                -Not trying to generalize results to all UCD students or all college-age students

                                  -Select a (non-representative) sample that will exaggerate the effects

                                      ex. Comparing the characteristics of the top 10% VS. Bottom 10%

Snowball Sample:

                                    -Interested in a hard-to access population

                                                ex. People who hoard, people coping with Crohn’s disease

                                    *Ask each member you find to recommend additional members

Haphazard or Convenience Sample

                                    -Using what is convenient and available

                                    -Most research (Large universities)

                                                -Must fulfill participation requirements

Improving Representativeness of Non-Probabilistic Samples

                                    -Use systematic selection procedures to minimize selection bias

                                                *Quota, purposive and snowball will be better than convenience

                                                *Go get your data, don’t let it come to you

                                    -Sample from broad variety of times/places

                                                -Even if your sample is not random you can lessen that bias if time and place are varied


                        -Interviewing every third person who enters the main door:


                        -Recruiting undergraduates who are enrolled in lower-division psychology classes likely produces a convenience sample of the human race

                                    -Captive subject pool


Module 5-1: Lecture 1-Correlational Research Tools (1/6/14)


            Bivariate correlational research

                        -Predictor and outcome variables are measured

            Correlation Coefficient r

                        -Direction of relationship

                        -Strength of relationship

            Linking hypotheses to r values

                        -What does the hypothesis tell us about the expected r value?

                        -Determining significance

Bivariate Correlational Research

            -Describe a relationship between two measured variables

                        -Neither variable is manipulated

            -Variables can be continuous or categorical

                        -X axis usually predictor

                        -Y axis usually Outcome

            *Correlation is not causation (need to perform an experiment)

            ex. Stroop Test vs. Food Eaten

                        -Prefrontal cortex uses high levels of glucose

                        -How does blood glucose level affect Stroop performance scores?



                        -Predictor: Calories consumed at lunch -> x axis

                        -Outcome: Time to accomplish Stroop -> y axis


                        -Results: Negative relationship (more calories consumed -> less time needed to accomplish test)


                        -Predictor: Did you eat lunch today? -> x axis

                        -Outcome: Time to accomplish Stroop

                        *Bar graph

                        -Results: Longer time in “No” column

Continuous Variables:

                        -If predictor and outcome variables are measured continuously

                                    -Can plot the relative position of each participant on these two continuous scales

                        *Can calculate a correlation coefficient

                                    -A number r that quantifies the nature and strength of that relationship

Categorical Variables:

                        -When one of the variables is categorical, it is better to display the data as a bar graph or histogram

                        *The values displayed in a bar graph are the mean or average of all individual data points

The Correlation Coefficient:

            -Positive linear association

                        -Increase in one variable=Increase in another

            -Negative linear association

                        -Increase in one variable= decrease in another

            -Curvilinear Association

                        -Increases in one variable relate to both increases and decreases in another

            -No association

Correlation Coefficient (r)

Nature of the relationship

                                    -Positive, negative, no relationship

Strength of relationship

                                    -Correlation number between -1 and +1

                                    -Determined by absolute value of r

                                    *Graphically: Higher |r| = more densely clustered data

                                    -Higher |r|: Greater ability to predict one variable based on another

                                    0 = no correlation

                                    .10 (-.10) = Small or weak correlation

                                    .30 (-.30) = Medium or moderate correlation

                                    .50 (-.50) = Large or strong correlation


Extreme Score

                                                -One or a few that stand away from the pack

                                                -Can have a strong effect on the r value

                                                -Matter most when the sample size is small
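A short sketch can show both how r is computed and how much one extreme score can move it in a small sample. The function is a plain implementation of the Pearson formula, and the data points are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data with no linear trend in the first five points.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without = pearson_r(x, y)

# Add one extreme score that stands far away from the pack.
r_with = pearson_r(x + [20], y + [20])

print(round(r_without, 2), round(r_with, 2))
```

With these made-up numbers r goes from essentially zero to above .9 after adding a single point, which is why scatterplots should be checked for extreme scores before a small-sample r is interpreted.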

Linking Hypotheses to r values:

                        -What does the hypothesis tell us about the expected r value?

                        -Determining significance

Non-directional (two-tailed) Hypotheses:

                                    There is a relationship between X and Y

                                                -Null Hypothesis:

                                                            Ho: r = 0

                                                            HA: r ≠ 0

Directional (one-tailed hypotheses)

                                    Positive relationship:

                                                -Higher levels of X are associated with higher levels of Y

                                                            Ho: r = 0

                                                            HA: r > 0

                                    Negative Relationship:

                                                -Higher levels of X are associated with lower levels of Y

                                                            Ho: r = 0

                                                            HA: r < 0


Is it Significant?

                                    -Statistical significance tells you how likely a result at least this strong would be if chance alone were at work (i.e. if Ho were true)

                                    -Decision rule for rejecting or failing to reject Ho

                                                *We are willing to accept a 5% chance of thinking we see a real relationship when none really exists -> p < .05

                                    -Compare calculated r to table of critical values

                                                -Degrees of freedom based on number of participants (n-2)

                                                -If the absolute value of observed r is larger than the critical value, conclude significance
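The decision rule can be sketched as a table lookup. The sketch assumes a two-tailed test at alpha = .05 and includes critical t values for only three degrees of freedom, copied from a standard t table. It converts each one to a critical r using r = t / sqrt(t^2 + df), which follows from the usual t test for a correlation. The function names and the example r value are illustrative.

```python
import math

# Two-tailed critical t values at alpha = .05, copied from a standard t table
# for a few degrees of freedom only.
CRITICAL_T = {10: 2.228, 20: 2.086, 30: 2.042}


def critical_r(df):
    """Critical |r| implied by the critical t value for the given df."""
    t = CRITICAL_T[df]
    return t / math.sqrt(t ** 2 + df)


def is_significant(r, n):
    """Decision rule: reject Ho when |r| exceeds the critical value for df = n - 2."""
    return abs(r) > critical_r(n - 2)


print(round(critical_r(20), 3))   # critical r for n = 22 participants
print(is_significant(0.46, 22))   # hypothetical r = .46 with 22 participants
print(is_significant(0.46, 12))   # the same r with only 12 participants
```

The same r can be significant with 22 participants and non-significant with 12, because the critical value rises as the degrees of freedom shrink.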


Effect Size:

                                    -How much of the variance does this explain?

                                                -r^2 = coefficient of determination

                                                            -Proportion of variance shared by the two variables

                                    ex. Child has behavior problems: r=.46, r^2 = .21

                                                -21% of variation in spanking was predicted by the child’s behavior problems
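The arithmetic behind the example is just squaring r. The short sketch below (function name chosen for illustration) applies it to the example value and to the small, medium, and large benchmarks listed earlier.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2


for r in (0.10, 0.30, 0.46, 0.50):
    print(r, round(coefficient_of_determination(r), 2))
```

Even a "large" correlation of .50 accounts for only a quarter of the variance, and the sign of r does not matter once it is squared.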

Interpreting a correlation

                                    -Correlation does not imply causation

                                    *If two variables are correlated we can conclude:

                                                -X is related to Y

                                                -Y is related to X

                                                -r^2 = % of variance in X that is predicted by Y

                                    -It is impossible to tell if X caused Y, Y caused X, or if they are both related to a third variable (spurious correlation)

Module 5-Lecture 2:

Interrogating Correlational Studies


                        Important validities to consider when assessing correlational studies:






*Construct Validity:

Measures what it’s supposed to measure

                        -Face Validity:

                                    -How well do the operational definitions measure each variable/construct?

                        -Procedural Validity:

                                    -Was the procedure carried out well?

                                                -What was their population and sampling method?  

                                                -Was the number of subjects/observations high or low?

                                                -Were all subjects approached/treated equally?

                        -Method Validity:

                                    -Is this an appropriate way of measuring the construct?

                                                -Is the study using self-report when observations or physiological measurements would be better?

                        *Could any of the measurements be biased? Are they reliable (test-retest, intra- and inter-observer)?


Statistical Validity:

Were the differences in groups analyzed well?

                        -Drawing valid conclusions based on statistical analysis

                                    -Are researchers using appropriate statistical tests?

                                    -What is the r value? What is the effect size? P value?

                                                *p < .05 -> less than 5% chance of error if we reject the null

                                    -Are they drawing conclusions based on significant/non-significant results?

                                    -Are there any outliers?

                                    -Subgroups in population?

                                    -No relationship or curvilinear relationship?


*External and Internal Validity:

Usually as one goes up, the other goes down


External Validity: Results generalize to real-world setting

                                    -Do the results generalize to other people/species, times and places?

                                    -Depends on how the population was defined and how subjects were sampled


Internal Validity: Changes in DV were caused by IV

                                    -Do the conclusions follow from the study?

                                    -Even if A and B correlate and A comes before B, we cannot conclude that A caused B

                                    **Can’t eliminate other factors (third variables) in bivariate correlational studies because variables are measured, not manipulated

                                    *If there is a plausible third variable: cannot infer causation

Module 6-1: Experimental Methods

-*Essential qualities of an experiment

            *Cause and effect criteria


                        -Co-Occurrence:

                                    -The presumed cause and the presumed effect have to either both occur or neither occur

                        -Time Sequence:

                                    -The presumed cause must precede the presumed effect

                        -Alternative causes must be ruled out

            *Definition of an experiment

                        -Start with equal groups

                        -Systematically manipulate the level of independent variable

                                    -Hold everything else constant

            -Construct, statistical, internal, and external validity

            -Random assignment

*True Experiments:

            -At least two groups compared

                        -There is a control and comparison condition

            -Groups should be equal in every way possible

1. Large Number of Participants:

                                    Random assignment (Not the same as random selection)

                                                -Every participant has an equal chance of being placed in each of the treatment groups

2. Small Number of Participants:


Matching procedure

                                                -Measure the potentially confounding difference

                                                -Pair similar participants together on that trait/variable

                                                -One member of each pair is randomly assigned to each group

                                                ex. Match on IQ so that each tested group has equal IQ levels

                        -Aim to eliminate individual differences

            -Independent variable is systematically manipulated

           -Dependent variable is measured and the group averages are compared statistically
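The two ways of forming equal groups described above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function names, the participant IDs, the IQ scores, and the choice of exactly two groups for matching are all assumptions made for the example.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Shuffle participants, then deal them into the conditions in turn,
    so every participant has an equal chance of landing in each group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

def matched_pairs_assignment(scores, seed=None):
    """Pair participants with adjacent scores on the potential confound,
    then randomly send one member of each pair to each group.

    `scores` maps participant -> measured value (hypothetical IQ here).
    Assumes two groups; with an odd count the last participant is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # most similar scores end up adjacent
    groups = {"treatment": [], "control": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        groups["treatment"].append(pair[0])
        groups["control"].append(pair[1])
    return groups

# Hypothetical participants and IQ scores, for illustration only
iq_scores = {"P1": 98, "P2": 121, "P3": 100, "P4": 119, "P5": 110, "P6": 108}

random_groups = random_assignment(list(iq_scores), ["treatment", "control"], seed=1)
matched_groups = matched_pairs_assignment(iq_scores, seed=1)
print(random_groups)
print(matched_groups)
```

With only six participants, plain random assignment can by chance put the highest scorers in one group. Matching guards against that because each pair contributes one member to each group.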

-Types of experiments

Between Subject Group Design (Independent Groups)

                        -Different groups receive the different levels of independent variable

                        -Compare difference between groups



                                    -Post-test only

                        -Must use when:

                                    -Participating in one condition makes it impossible to participate in another: lasting effects of treatment, knowledge of research

                                    -The independent variable is a participant variable (a preexisting characteristic of participants)

                                                ex. Gender, mental health diagnosis

Within Subject Group Design (Repeated Measures)

                        -Each participant receives all levels of independent variable

                        -Each participant acts as their own control


                                    -Concurrent Measurement

                                                ex. Taste Coke and Pepsi: choose favorite

                                    -Repeated measures


                                    -Minimizes number of subjects

                                    -Minimizes the effect of individual differences


                                    -Order effects: Previous experience or condition could affect subsequent performance

                                                -Solution: Counterbalance
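Counterbalancing can be sketched as follows. This is a minimal example of full counterbalancing, in which every possible order of conditions is used equally often. The function name, participant IDs, and seed are assumptions for illustration; the Coke/Pepsi conditions echo the example above.

```python
import random
from itertools import permutations

def counterbalance(participants, conditions, seed=None):
    """Assign each participant one order of the conditions.

    All possible orders are generated, participants are shuffled, and the
    orders are dealt out in turn, so each order is used about equally often.
    """
    rng = random.Random(seed)
    orders = list(permutations(conditions))  # every possible condition order
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment of participants to orders
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}

# Hypothetical participants tasting two drinks
schedule = counterbalance(["P1", "P2", "P3", "P4"], ["Coke", "Pepsi"], seed=7)
for person, order in sorted(schedule.items()):
    print(person, "->", " then ".join(order))
```

Because half the participants taste each drink first, any order effect is spread evenly across both conditions instead of favoring one of them. The number of orders grows factorially with the number of conditions, so designs with many conditions often use only a subset of orders.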

-Ruling out “everything else”

            *Extraneous Variables:

                        -Not related to treatment but could influence the outcome

                                    *Uncontrolled factors which are not of interest to the researcher

                        -Might just add unsystematic variability (noise)

                        *Might provide alternate explanation of results


Confounding Variable:

                                                -Systematically co-varies with independent variable

                                                            -Changes at the same time as independent variable

                                                            -Not just noise

                                                -Has effect that cannot be separated from the effects of independent variable

                                                            *Provides alternate explanation for the observed change in dependent variable

            *Experimenter Effects

                        -Demand characteristics


                                    -Audio/video recording of instructions

                                    *Double-blind procedure: Neither the participant nor the experimenter knows which group the participant is in

            *Participant Effects:

                        -Aim to reduce demand characteristics

                        -Use deception so that participants are unaware of the research hypothesis

                        -Placebo control group

                                    -To control for participant’s expectations

                        -Conduct manipulation check

                                    -At the end, ask each participant what he/she thought the hypothesis was

**Threats to internal validity

            Maturation effect: Changes due to time

                        -Growing, fatigue, hunger

            Testing effect: Changes due to pre-test

                        -practice, expectations

            History effect: Changes due to events in the outside world

            Mortality effect: Differences due to some participants dropping out

            Selection effect: Differences due to unequal groups to begin with

Between Subjects Designs:

                        -Ensure that the groups are treated as similarly as possible

                                    -Maturation, testing, mortality

                        -Gather data rapidly/simultaneously

                                    -History effect

                        -Random assignment

                                    -Selection effect

Within Subject Designs:


                                    -Maturation effect

                                    -Testing effect

                        -Gather data rapidly/simultaneously

                                    -History effect

                        -Random assignment to counterbalance order

                                    -Selection effect

                        -If they drop out, they drop out of both conditions

                                    -Mortality effect

-What if independent variable does not make a difference?

            Not enough between-group difference

                        -Ineffective manipulation

                        -Insensitive measures

                        -Ceiling or floor effects (DV or IV)

            Too much variability within group members

                        -Measurement error

                        -Individual differences

                        -Situation variability/noise
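The two reasons above can be made concrete with a standardized effect size. The sketch below uses Cohen's d, which these notes do not name; it is brought in here only as one common way to express a between-group difference relative to within-group variability. All scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Between-group mean difference divided by the pooled within-group SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Same 2-point difference between group means in both cases (hypothetical scores)
low_noise_d = cohens_d([10, 11, 12], [8, 9, 10])   # little within-group variability
high_noise_d = cohens_d([5, 11, 17], [3, 9, 15])   # much within-group variability

print(round(low_noise_d, 2), round(high_noise_d, 2))
```

The mean difference is identical in both comparisons, but the standardized effect is far smaller when scores within each group are spread out. A stronger manipulation or a more sensitive measure raises the numerator; reducing measurement error, individual differences, and situation noise shrinks the denominator.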

*True and quasi-experiments


                        -Researchers cannot randomly assign the participants to the conditions

                                    -“looks like” an experiment but lacks equal groups

                                    -Participants are selected for the groups depending on the level of predictor variable

                                    *There are often quasi-experimental results included in true experiments

True Experiments:

                        -Start with equal groups

                        -Vary one factor

                        -Measure and compare outcome variable

                        -Differences in DV must be caused by IV

                        -“Different levels of IV caused different performance on DV”

Quasi-Experiments:

                        -Divide groups based on one factor

                                    -Groups not equal

                        -Measure and compare outcome variable

                        -Cannot conclude causality

                        -“The groups performed differently on the outcome measure”

Module 6-2: Factorial Designs

Factorial Designs:

*Experiments that have more than one independent variable

            -Allow one to investigate effects of more than one independent variable on outcomes

            -Behavior is multi-caused

                        -Considering multiple IVs makes our research more similar to real life

            -Sometimes effect is obscured when only looking at one variable

            -Sometimes effect is contingent upon the presence of both variables

            Examining for:

                        -Main effects of each independent variable

                        -Interactions between independent variables




                        -Independent variable

                        -Source of systematic (non-random) variance


*ex. Diagnosis (mild depression, severe depression) x Treatment group (Group 1, Group 2, Group 3) x Treatment duration (short, moderate, long)


            -3 main effects

            -3 two-way interactions + 1 three-way interaction
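The counts in this example can be checked by enumeration. The short sketch below lists every cell of the 2 x 3 x 3 design and every main effect and interaction; the dictionary keys and variable names are chosen only for illustration.

```python
from itertools import combinations, product

# Factors and levels from the example above
factors = {
    "diagnosis": ["mild depression", "severe depression"],
    "treatment group": ["Group 1", "Group 2", "Group 3"],
    "treatment duration": ["short", "moderate", "long"],
}

# Each cell is one combination of levels, one from every factor
cells = list(product(*factors.values()))

# Effects of order 1 are main effects; order 2 and above are interactions
effects = {
    order: list(combinations(factors, order))
    for order in range(1, len(factors) + 1)
}

print(len(cells), "cells")
for order, terms in effects.items():
    label = "main effects" if order == 1 else f"{order}-way interactions"
    print(len(terms), label, [" x ".join(term) for term in terms])
```

The design has 2 x 3 x 3 = 18 cells. In general, a design with k factors has k main effects and one possible interaction for every subset of two or more factors.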