Research Methods: Validity, Reliability, and Experimental Design

*Valid logic: If the premises are true, the conclusion must be true. Sound logic: The argument is valid and all of its premises are true.

Hypotheses:

Directional alternative hypothesis:

-Group 1 will perform better than Group 2
-As X goes up, so does Y
-As X goes up, Y goes down

Causal or Associative Hypotheses:

-Some attempt to define a causal relationship
   -The experimenter must be in control of the cause
   -“If I change X, then I think that Y will also change”
   -Co-Occurrence: The presumed cause and presumed effect have to either both occur or neither occur
   -Time Sequence: The presumed cause must precede the presumed effect
   -Alternative causes must be ruled out

-Some just describe associations and predict outcomes
   -The experimenter is not in control of either variable
   -“These groups respond differently”
   -“I wonder if X is related to Y”
   -X is the predictor variable (participant variable): a pre-existing characteristic used to classify subjects
   -Y is the outcome variable (dependent): the outcome or response that is measured

Participant Variables:

-Preexisting characteristics
-Individual differences of participants
-Not under the experimenter’s control
-Participants are grouped according to their level or value

Correlational Research:

-Two measured variables
-Searching for an association
-ex. Behavioral correlation: Fidgeting in class based on seating location
   -Frequency Claim: How many students fidget?
   -Association Claim: Do students who sit in the front of the room fidget less than those who choose to sit in the back of the room?
*Cannot assess a Causal claim
   -We did not start with equal groups
   -We did not manipulate anything

Experimental Research:

-One manipulated variable; observe one or more measured variables
-Searching for causation
-Start with two equal groups
-Random assignment to groups
*Hold all other variables constant

Validity: Accuracy or correctness

-Soundly based on facts or evidence
-Genuine, credible, true
-Can be very narrow
   -Applied to a particular variable
-Can be very broad
   -Applied to an entire study or program of research
-Not directly measurable
   -Evaluated based on logic and reasoning

Reliability: Consistency

-Repeat, replicate
-If the characteristic being measured is stable, a reliable test yields consistent results
-The extent to which the test is consistent in its evaluation of the same individual (or their physiology) over repeated administrations
-Can the variables be measured reliably?

Measuring Reliability:
-Calculate correlations to assess reliability:
   -Test-retest
   -Split-half
   -Parallel or alternate forms
   -Item-total correlation
   -Coefficient alpha (Cronbach’s)
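
The correlation-based reliability checks listed above can be sketched in code. This is a minimal illustration, not a standard implementation: all scores are made up, `pearson_r` and `cronbach_alpha` are hypothetical helper names, and alpha is computed with the usual formula k/(k-1) * (1 - sum of item variances / variance of total scores), which the notes do not spell out.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of scores per test item,
    with participants in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical data: five participants take the same test twice (test-retest).
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
test_retest_r = pearson_r(time1, time2)

# Hypothetical data: three questionnaire items answered by four participants.
items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
alpha = cronbach_alpha(items)

print(round(test_retest_r, 2), round(alpha, 2))
```

Values near 1 on either statistic suggest a consistent measure; neither says anything about whether the measure is valid.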

Construct:
-Hypothetical variable, which is not directly observable
   ex. Intelligence, happiness, introversion/extroversion
-Estimate the participant’s level of the construct by asking multiple questions
-Operational definition is VERY important

Construct Validity:

-How well are the variables constructed?
-How well did the researchers measure the variables?
*Is the operational definition accurate?
-Are you measuring what you think you’re measuring?

Face Validity:
-Does it appear to measure the construct?
Procedure:
-Was the procedure carried out well?
Method:
-Is this an appropriate way of measuring the construct?
ex. Intelligence: Measuring size of head
-Reliable (repeated measurements of head size agree)
*Not valid (head size does not reflect intelligence)

Internal and External Validity:

*Generally experiments have high internal validity but low external validity
Internal Validity:
-Do the conclusions follow from the study?
-Are they logical based on the data?
-Was the study conducted well?
-Were alternate explanations ruled out?
-Were all potential confounds eliminated?
External Validity: Generalizability
-Does this study reflect what typically happens in the world?
*Do the results generalize to other situations?
-Do the results generalize to all people or across related species?

**As internal validity goes up, external validity tends to go down

Statistical Validity:
*Type I Error: Seeing a relationship that isn’t there (False positive)
*Type II Error: Missing a relationship that is there (False negative)

**If a measure is valid it must also be reliable
-Reliability is necessary for validity
-Reliability is not sufficient for validity

Interrogating Frequency Claims:
-Construct validity of the variable
   -How well was the variable measured?
   -Quality of measurements
*External validity is essential
   -Can we generalize from sample to population?

-Nominal (CATEGORICAL):
   -Naming scales
   -Categorical variables
   -Property of identity
   -Not measured with inherent order
   -Any number coding is arbitrary
   ex. Male/Female, Cat/Dog

-Ordinal (CATEGORICAL):
   -Property of identity
   -Property of magnitude
   -Levels of variable are arranged in an order
   -Higher levels represent more of the variable than do lower levels
   ex. 1st, 2nd, 3rd

-Interval (CONTINUOUS):
   -Property of identity
   -Property of magnitude
   -Property of equal intervals
   *No true zero
   ex. IQ tests

-Ratio (CONTINUOUS):
   -Property of identity
   -Property of magnitude
   -Property of equal intervals
   -Property of true zero
   *Allows ratios to be made
   ex. Scores on an exam, weight

-Identity: Each number has a particular meaning
-Magnitude: Numbers have an inherent order
-Equal intervals: Difference between units is the same anywhere on the scale
-True zero: Zero means none of the quantity is present

Surveys

                        -Open-ended
                                    -Gives the information in the participant’s own words
                                    -Can be more open to different (wrong) interpretations
                                    *Can be very hard to compare across participants

                        -Closed (Forced-choice)
                                    -More common
                                    -Forces a choice between limited options
                                    *Easier to compare answers between participants
                                    ex. Multiple choice
                                    -Problems to avoid: NO OVERLAP, DON’T SKIP GROUPS, ALL CATEGORIES INCLUDED
                                    -Forced-alternative questions: Two extreme options, force a choice

Construct Validity

-How well was the variable measured?
-What (else) influenced the answers given?
-Were the questions worded clearly?
-Avoid response sets such as acquiescence (switch it up with reverse-wording/coding)
-Clear layout (easy to answer)
-How was the survey distributed?
-Were other people around?

                        -Avoid leading questions, double-barreled questions, social desirability bias, long questions, negation, and big words; keep it simple

                        *The best way to know that your survey is effective is through pretesting

Limitations of self-report data:

                                    -Surveys are appropriate for measuring opinions and attitudes, not behaviors

                                                -Awareness of the information

                                                -Access to the information (forgetting)

                                                *General rule: don’t ask if you can access the data some other way

                                    -Cannot determine causality

                                                -We are unaware of why we do things

Observations:

            -Construct validity

                        *Threats to construct validity

                                    -Observers may see what they expect to see

                                                *Solution: Make observations more objective/accurate by:

                                                            -Recording video and audio whenever possible

                                                            -Using a checklist or ethogram

                                                            -Time sampling

                                                            -Checking inter- and intra-rater reliability

                                                            -Using a blind observer

                                    -Observers can affect what they see

                                                -Self-fulfilling prophecy

                                                *Solution: Blind observers

                                                            -Unaware of the hypothesis of the study

                                                            -Unaware of which group each participant is in

                                    -Observer effects

                                                -Being observed influences behavior

                                                *Solution:

                                                            -Be unobtrusive: Hide

                                                            -Allow time to acclimate

Key Points:

The design of surveys or questionnaires

                        -Can access internal thoughts, feelings, beliefs

                        -Types of questions

                        -What makes a ‘good’ question

                        -Threats to construct validity

The design of observational studies

                        -Can observe current actions/performance or access archival records

                        -How we control observational studies

                        -How we control participants’ reactions to observers

-Types of observations

            -Observing actions

                        -Naturalistic (natural setting)

                        -Systematic-with a specific research purpose

            -Observing performance

                        -Performance on tests or tasks

            -Observing archives

                        -Information that was documented for other purposes

                        -Content analysis-analyzing archives 

Lecture 4-2: Sampling and Generalizing Large Populations (2/4/14)

            Topics:

                        -Introduction and rationale

                                    -Generalizability (external validity)

                        -Two sources of error

                                    -Chance and bias

                        -Probability samples (representative)

                                    -Simple random, cluster/multistage, and stratified

                        -Nonprobability samples

                                    -Quota, purposive, and convenience

External Validity:


                        -Generalizability

                        -Do the results apply to:

                                    -The entire population

                                    -Other demographics

                                    -Other settings

                                    -Other situations

                        *Research conducted with a representative sample has much higher external validity

Sampling:


                        -You are interested in a particular population

                                    → Not able to talk to everyone (that would be a census)

                        -Take sample from population

                        *Findings only relate to entire population if sample is representative

                        ex. Literary Digest (1936)

                                    -Ran polls before presidential elections

                                    -Sent out 10 million ballots → 2.4 million returned

                                    -Sources the Digest drew its sample from:

                                                -Magazine subscribers

                                                -Telephone directories

                                                -Car registration

                                    -Literary Digest results: Landon winning (57%)

                                    *Actual results: Landon losing (38%)

                                    *Sample was biased towards wealthier/right-leaning population

Error in Sampling:


                                   *Error is the extent to which the sample differs from the population

Sampling error:


                                                -No two samples will be the same; there will be variation

                                                *Random

                                                ex. Samples drawn from a mixed-colors population

                                                       *Lessen sampling error by increasing the number sampled

                                                *Dealing with sampling error:

                                                            -Increase sample size

                                                                        -The larger your sample size, the closer your sample will be to your population

                                                                        -Margin of error decreases as the sample size increases (e.g., samples of 1,000-2,000)

                                                            -Estimate using statistics

                                                                          -Standard error: how much random error is in the measurement
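
The link between sample size, standard error, and margin of error can be sketched numerically. The notes do not give a formula, so this example assumes the common one for a sample proportion, margin = z * sqrt(p(1-p)/n), with z = 1.96 for roughly 95% confidence and the worst case p = 0.5. `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from n people.

    p=0.5 is the worst case (largest error); z=1.96 is the assumed
    multiplier for about 95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / n)  # random error in the estimate
    return z * standard_error

for n in (100, 1000, 2000):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions, going from 100 to 1,000 respondents cuts the margin from about 10 points to about 3, while doubling again to 2,000 gains only about one more point.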

Sampling Bias:


                                                -Sample is selected systematically

                                                *Non-random

                                                *Constant error

                                                            -evil, wicked, wrong

                                                            -Caused by a non-random selection procedure

                                                                        -Selected based on ease or any other systematic difference

                                                                        -Self-selection, convenience samples

                                                            -Results in a sample that is not representative of the population

                                                            *Reduces external validity

                                                                        -Not able to draw conclusions about the population

Probability Samples (Representative):


                        -Simple Random Sampling

                        -Cluster/multistage 

                        -Stratified sampling

Simple Random Sampling:

                                    1. Identify all members of the population

                                    2. Every member has an equal likelihood of being selected

                                                -Sample will be representative of the population

                                                *You will be able to draw conclusions about the entire population

                                    *Very high external validity
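
The two steps of simple random sampling can be sketched as follows. This is an illustration only: the sampling frame of ID numbers is hypothetical, `simple_random_sample` is a made-up helper name, and the seed is there just to make the example repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return rng.sample(list(population), n)

# Step 1: identify all members (hypothetical frame of 10,000 ID numbers).
sampling_frame = range(1, 10001)

# Step 2: give every member the same chance of being selected.
sample = simple_random_sample(sampling_frame, 100, seed=1)
print(len(sample), len(set(sample)))
```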

Cluster/Multistage Sampling:

                                    -Advantage: Can accomplish representativeness with a smaller random sample

                                    -WILL be able to draw conclusions about the rest of the population

Cluster:

                                    -Randomly select subpopulations and include all the individuals within each in the study

                                    -Difficult to accomplish

                                                ex. Interview all voters in certain districts

Multistage:

                                    -Randomly select subpopulations and include a random sample of individuals in the subpopulation in the study
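
The difference between cluster and multistage sampling can be sketched in code. The districts and voters below are invented for illustration, and `cluster_sample` and `multistage_sample` are hypothetical helper names, not functions from any library.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly sample individuals within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

rng = random.Random(0)  # seeded only so the illustration is repeatable

# Hypothetical population: 6 districts with 10 voters each.
districts = {f"district_{d}": [f"voter_{d}_{i}" for i in range(10)]
             for d in range(6)}

everyone_in_two = cluster_sample(districts, 2, rng)            # 2 x 10 voters
three_from_each_of_two = multistage_sample(districts, 2, 3, rng)  # 2 x 3 voters
print(len(everyone_in_two), len(three_from_each_of_two))
```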

Stratified Sampling

                                    -Divide population into subpopulations

                                                ex. Demographic groups, education level

                                    -Randomly select proportional samples from each subpopulation

                                    -Advantage

                                                -Can accomplish representativeness with a smaller random sample

                                    -Able to draw conclusions about the rest of the population

                                    *Ratios remain constant between the whole population and the sample

                                    ex. Student Fee Increases

                                                -Attitudes regarding a three-year phase-in of a fee increase

                                                -Population: current University students

                                                -Enrollment percentages based on year → sample size of 50

                                                            -17 freshmen, 13 sophomores, 11 juniors, 9 seniors

                                                            -Maintain ratios
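
Proportional allocation in the fee-increase example can be sketched as below. The notes give only the resulting sample counts, so the enrollment figures here (1,700 / 1,300 / 1,100 / 900) are hypothetical numbers chosen to reproduce those counts; `proportional_allocation` and `stratified_sample` are made-up helper names.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Give each stratum a share of the sample equal to its population share.

    Rounding can make the quotas miss sample_size by one or two in general;
    it works out exactly for the numbers used here.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Randomly draw each stratum's quota from that stratum's member list."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], quotas[name]) for name in strata}

# Hypothetical enrollment by class year (assumed, not from the lecture).
enrollment = {"freshman": 1700, "sophomore": 1300, "junior": 1100, "senior": 900}
quotas = proportional_allocation(enrollment, 50)

# Hypothetical student rosters, one list per stratum.
strata = {year: [f"{year}_{i}" for i in range(size)]
          for year, size in enrollment.items()}
sample = stratified_sample(strata, 50, seed=3)
print(quotas)
```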

Practical Considerations:


                                    -Identifying all the members

                                    -Selection doesn’t ensure participation

                                                -Self-selection will introduce bias

                                    -Try to gather some information about the non-respondents so that changes can be made in future recruitments

                                                -Incentives vs. motivations

Non-Probability Sampling:


                                    -Quota

                                    -Purposive

                                    -Snowball

                                    -Haphazard or convenience

                        -Not representative but more realistic

                        **Low external validity

                                    -Cannot draw conclusions about the entire population

Quota:


                                    1. Researchers set proportions in order to compare groups that may differ greatly in frequency

                                    ex. CEO percentages between men and women

                                            -Set minimum number required to distinguish between groups

                                    2. Researchers set proportions to improve representativeness in a nonrandom sample

Purposive Sampling:


                                    -Select specific subsets to compare

                                                -Not attempting to make generalizations

                                    ex. What makes a successful PSC41 student?

                                                -Not trying to generalize results to all UCD students or all college-age students

                                    -Select a (non-representative) sample that will exaggerate the effects

                                                ex. Comparing the characteristics of the top 10% vs. the bottom 10%

Snowball Sample:


                                    -Interested in a hard-to-access population

                                                ex. People who hoard, people coping with Crohn’s disease

                                    *Ask each member you find to recommend additional members

Haphazard or Convenience Sample

                                    -Using what is convenient and available

                                    -Used in most research (e.g., at large universities)

                                                -Students must fulfill participation requirements

Improving Representativeness of Non-Probabilistic Samples

                                    -Use systematic selection procedures to minimize selection bias

                                                *Quota, purposive and snowball will be better than convenience

                                                *Go get your data, don’t let it come to you

                                    -Sample from a broad variety of times/places

                                                -Even if your sample is not random you can lessen that bias if time and place are varied

Clickers:


                        -Interviewing every third person who enters the main door: Convenience

                        -Recruiting undergraduates who are enrolled in lower-division psychology classes likely produces a Convenience sample of the human race

                                    -Captive subject pool


Module 5-1: Lecture 1-Correlational Research Tools (1/6/14)

Topics:

            Bivariate correlational research

                        -Predictor and outcome variables are measured

            Correlation Coefficient r

                        -Direction of relationship

                        -Strength of relationship

            Linking hypotheses to r values

                        -What does the hypothesis tell us about the expected r value?

                        -Determining significance

Bivariate Correlational Research

            -Describe a relationship between two measured variables

                        -Neither variable is manipulated

            -Variables can be continuous or categorical

                        -X axis usually predictor

                        -Y axis usually Outcome

            *Correlation is not causation (need to perform an experiment)

            ex. Stroop Test vs. Food Eaten

                        -Prefrontal cortex uses high levels of glucose

                        -How does blood glucose level relate to Stroop performance scores?

Continuous:


 

                        -Predictor: Calories consumed at lunch → x axis

                        -Outcome: Time to accomplish Stroop → y axis

                        *Scatterplot

                        -Results: Negative relationship (more calories consumed → less time needed to accomplish test)

Categorical:


                        -Predictor: Did you eat lunch today? → x axis

                        -Outcome: Time to accomplish Stroop

                        *Bar graph

                        -Results: Longer time in “No” column

Continuous Variables:


                        -If predictor and outcome variables are measured continuously

                                    -Can plot the relative position of each participant on these two continuous scales

                        *Can calculate a correlation coefficient

                                    -A number r that quantifies the nature and strength of that relationship

Categorical Variables:


                        -When one of the variables is categorical, it is better to display the data as a bar graph or histogram

                        *The values displayed in a bar graph are the mean or average of all individual data points

The Correlation Coefficient:


            -Positive linear association

                        -Increase in one variable=Increase in another

            -Negative linear association

                        -Increase in one variable= decrease in another

            -Curvilinear Association

                        -Increases in one variable relate to both increases and decreases in another

            -No association

Correlation Coefficient (r)

                        -Nature of the relationship

                                    -Positive, negative, no relationship

                        -Strength of relationship

                                    -Correlation number between -1 and +1

                                    -Determined by absolute value of r

                                    *Graphically: Higher absolute value of r = data points cluster more tightly around a line

                                    -Higher absolute value: Greater ability to predict one variable based on another

                                    0 = no correlation

                                    .10 (-.10) = Small or weak correlation

                                    .30 (-.30) = Medium or moderate correlation

                                    .50 (-.50) = Large or strong correlation
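
Computing r and reading off its direction and strength can be sketched as below. The calorie and Stroop-time values are invented for illustration, `pearson_r` and `describe_r` are hypothetical helper names, and the "negligible" label for values under .10 is an assumption added here, since the benchmarks above start at .10.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def describe_r(r):
    """Label direction by sign and strength by absolute value (.10/.30/.50)."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    if size >= 0.50:
        strength = "large"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "small"
    else:
        strength = "negligible"  # assumed label for values below .10
    return direction, strength

# Hypothetical data: calories at lunch (predictor) and Stroop time in seconds.
calories = [200, 400, 600, 800, 1000]
stroop_seconds = [62, 58, 55, 49, 47]

r = pearson_r(calories, stroop_seconds)
print(round(r, 2), describe_r(r))
```

With only five made-up participants the r value is unrealistically extreme; real data would scatter far more.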

Outliers:


Extreme Score

                                                -One or a few that stand away from the pack

                                                -Can have a strong effect on the r value

                                                -Matter most when the sample size is small

Linking Hypotheses to r values:


                        -What does the hypothesis tell us about the expected r value?

                        -Determining significance

Non-directional (two-tailed) Hypotheses:


                                    There is a relationship between X and Y

                                                -Null and alternative hypotheses:

                                                            Ho: r = 0

                                                            HA: r ≠ 0

Directional (one-tailed) Hypotheses:


                                    Positive relationship:

                                                -Higher levels of X are associated with higher levels of Y

                                                            Ho: r = 0

                                                            HA: r > 0

                                    Negative Relationship:

                                                -Higher levels of X are associated with lower levels of Y

                                                            Ho: r = 0

                                                            HA: r < 0

**Is it Significant?

                                    -Statistical significance reflects how likely a result at least this strong would be by chance alone if no real relationship existed

                                    -Decision rule for rejecting or failing to reject Ho

                                                *We are willing to accept a 5% chance of thinking we see a real relationship when none really exists → p < .05

                                    -Compare calculated r to table of critical values

                                                -Degrees of freedom based on number of participants (n-2)

                                                -If the absolute value of the observed r is larger than the critical value, conclude significance
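
The table-lookup decision rule can be sketched as below. The critical values are a small excerpt typed in from a standard two-tailed, alpha = .05 table for Pearson's r and should be checked against the table used in class. `is_significant` is a hypothetical helper name, and it only covers the degrees of freedom listed.

```python
# Two-tailed critical values of r at alpha = .05, keyed by df = n - 2.
# Assumed excerpt from a standard table; verify before relying on it.
CRITICAL_R_TWO_TAILED_05 = {
    3: 0.878,
    5: 0.754,
    8: 0.632,
    10: 0.576,
    20: 0.423,
    30: 0.349,
}

def is_significant(r, n, table=CRITICAL_R_TWO_TAILED_05):
    """Return True if |r| exceeds the critical value for df = n - 2."""
    df = n - 2
    if df not in table:
        raise KeyError(f"no critical value stored for df={df}")
    return abs(r) > table[df]

print(is_significant(0.60, 12))   # df = 10, critical value .576
print(is_significant(0.50, 12))   # same df, smaller r
print(is_significant(-0.46, 32))  # df = 30; sign is ignored for two-tailed
```

The same r of .50 that fails with 12 participants would pass with 32, which is why the degrees of freedom matter as much as the size of r.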

*Effect Size:

                                    -How much of the variance does this explain?

                                                -r^2 = coefficient of determination

                                                            -Proportion of variance shared by the two variables

                                    ex. Child has behavior problems: r = .46, r^2 = .21

                                                -21% of variation in spanking was predicted by the child’s behavior problems
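
The arithmetic in the example above is just r squared; a short sketch using the r value from the notes:

```python
r = 0.46                      # correlation reported in the example above
r_squared = r ** 2            # coefficient of determination
percent_shared = round(r_squared * 100)

print(f"r^2 = {r_squared:.2f}, about {percent_shared}% of variance shared")
```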

Interpreting a correlation

                                    -Correlation does not imply causation

                                    *If two variables are correlated we can conclude:

                                                -X is related to Y

                                                -Y is related to X

                                                -r^2 = proportion of variance in one variable that is predicted by the other

                                    -It is impossible to tell if X caused Y, Y caused X, or if they are both related to a third variable (spurious correlation)

Module 5-Lecture 2:


Interrogating Correlational Studies

            Topics:

                        Important validities to consider when assessing correlational studies:

                                    -Construct

                                    -Statistical

                                    -External

                                    -Internal

           

*Construct Validity: Measures what it’s supposed to measure

                        -Face Validity:

                                    -How well do the operational definitions measure each variable/construct?

                        -Procedural Validity:

                                    -Was the procedure carried out well?

                                                -What was their population and sampling method?  

                                                -Was the number of subjects/observations high or low?

                                                -Were all subjects approached/treated equally?

                        -Method Validity:

                                    -Is this an appropriate way of measuring the construct?

                                                -Is the study using self-report when observations or physiological measurements would be better?

                        *Could any of the measurements be biased? Are they reliable (test-retest, intra- and inter-observer)?

*Statistical Validity: Were the differences in groups analyzed well?

                        -Drawing valid conclusions based on statistical analysis

                                    -Are researchers using appropriate statistical tests?

                                    -What is the r value? What is the effect size? P value?

                                                *p < .05 → less than 5% chance of error if we reject the null

                                    -Are they drawing conclusions based on significant/non-significant results?

                                    -Are there any outliers?

                                    -Subgroups in population?

                                    -No relationship or curvilinear relationship?

           

*External and Internal Validity: Usually as one goes up, the other goes down

External: Results generalize to real-world setting

                                    -Do the results generalize to other people/species, times and places?

                                    -Depends on how the population was defined and how subjects were sampled

Internal: Changes in DV were caused by IV

                                    -Do the conclusions follow from the study?

                                    -Even if A and B correlate and A comes before B, we cannot conclude that A caused B

                                    **Can’t eliminate other factors (third variables) in bivariate correlational studies because variables are measured, not manipulated

                                    *If there is a plausible third variable: cannot infer causation

Module 6-1: Experimental Methods


-*Essential qualities of an experiment

            *Cause and effect criteria

                        -Co-Occurrence:

                                    -The presumed cause and the presumed effect have to either both occur or neither occur

                        -Time Sequence:

                                    -The presumed cause must precede the presumed effect

                        -Alternative causes must be ruled out

            *Definition of an experiment

                        -Start with equal groups

                        -Systematically manipulate the level of independent variable

                                    -Hold everything else constant

            -Construct, statistical, internal, and external validity

            -Random assignment
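Random assignment, as listed above, can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `randomly_assign`, the participant labels, and the `seed` parameter are all assumptions introduced here. Shuffling the full list and then dealing participants out in turn gives every participant the same chance of landing in each condition while keeping group sizes equal.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn."""
    rng = random.Random(seed)      # seed only makes the example repeatable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical pool of 20 participants and two conditions
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["control", "treatment"], seed=42)

for condition, members in groups.items():
    print(condition, members)
```

Note that this is assignment, not selection: it says nothing about how the 20 participants were sampled from the population.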


*True Experiments:

            -At least two groups compared

                        -There is a control and comparison condition

            -Groups should be equal in every way possible

1. Large Number of Participants:

                                    Random assignment (Not the same as random selection)

                                                -Every participant has an equal chance of being placed in each of the treatment groups

2. Small Number of Participants:

                                    **Matching procedure

                                                -Measure the potentially confounding difference

                                                -Pair similar participants together on that trait/variable

                                                -One member of each pair is randomly assigned to each group

                                                ex. Match on IQ so that each tested group has equal IQ levels

                        -Aim to eliminate individual differences

            -Independent variable is systematically manipulated

           -Dependent variable is measured and the group averages are compared statistically
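The matching procedure described above (measure the trait, pair similar participants, randomly split each pair) might look like the following sketch. The names, IQ scores, and the function `matched_pairs_assign` are hypothetical, and leaving out an unpaired participant is an assumption of this example rather than a rule from the notes.

```python
import random

def matched_pairs_assign(scores, seed=None):
    """Pair participants with adjacent scores, then split each pair at random.

    scores: dict mapping participant -> score on the matching variable.
    With an odd number of participants, the highest scorer is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)   # lowest to highest score
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]     # two most similar participants
        rng.shuffle(pair)                     # coin flip within the pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b

# Hypothetical IQ scores for six participants
iq = {"Ana": 98, "Ben": 101, "Cal": 115, "Dee": 117, "Eli": 130, "Fay": 128}
group_a, group_b = matched_pairs_assign(iq, seed=1)

mean_a = sum(iq[p] for p in group_a) / len(group_a)
mean_b = sum(iq[p] for p in group_b) / len(group_b)
print(group_a, mean_a)
print(group_b, mean_b)
```

Because each pair is split across the two groups, the group means on the matching variable stay close no matter how the coin flips fall, which is the goal when the sample is too small for random assignment alone to balance the groups.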


-Types of experiments

Between Subject Group Design (Independent Groups)


                        -Different groups receive the different levels of independent variable

                        -Compare difference between groups

                        -Types:

                                    -Pre-test/Post-test

                                    -Post-test only

                        -Must be used when:

                                    -Participating in one condition makes it impossible to participate in another: lasting effects of treatment, knowledge of research

                                    -The IV is a participant variable (a preexisting characteristic of participants)

                                                ex. Gender, mental health diagnosis

Within Subject Group Design (Repeated Measures)


                        -Each participant receives all levels of independent variable

                        -Each participant acts as their own control

                        -Types:

                                    -Concurrent Measurement

                                                ex. Taste Coke and Pepsi: choose favorite

                                    -Repeated measures

Advantages:


                                    -Minimize number of subjects

                                    -Minimizes the effect of individual differences

Disadvantages:


                                    -Order effects: Previous experience or condition could affect subsequent performance

                                                -Solution: Counterbalance
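Counterbalancing can be sketched as follows, assuming full counterbalancing (every possible order is used). The function names and participant labels are illustrative, and the Coke/Pepsi conditions are borrowed from the taste-test example above.

```python
from itertools import permutations

def counterbalanced_orders(conditions):
    """Every possible order of the conditions (full counterbalancing)."""
    return [list(order) for order in permutations(conditions)]

def assign_orders(participants, conditions):
    """Cycle through the orders so each one is used equally often."""
    orders = counterbalanced_orders(conditions)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

# Hypothetical participants; in practice the list would be shuffled first
schedule = assign_orders(["P1", "P2", "P3", "P4"], ["Coke", "Pepsi"])

for participant, order in schedule.items():
    print(participant, order)
```

With two conditions there are only two orders, so half the participants get each. The number of orders grows quickly (3 conditions give 6 orders, 4 give 24), which is one reason partial schemes such as a Latin square are often used instead.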


-Ruling out “everything else”

            *Extraneous Variables:

                        -Not related to treatment but could influence the outcome

                                    *Uncontrolled factors which are not of interest to the researcher

                        -Might just add unsystematic variability (noise)

                        *Might provide alternate explanation of results

                                    **Confounding Variable:

                                                -Systematically co-varies with independent variable

                                                            -Changes at the same time as independent variable

                                                            -Not just noise

                                                -Has an effect that cannot be separated from the effects of the independent variable

                                                            *Provides alternate explanation for the observed change in dependent variable

            *Experimenter Effects

                        -Demand characteristics

                        -Solutions:

                                    -Audio/video recording of instructions

                                    *Double-blind procedure: Neither the participant nor the experimenter knows which group the participant is in

            *Participant Effects:

                        -Aim to reduce demand characteristics

                        -Use deception so that participants are unaware of the research hypothesis

                        -Placebo control group

                                    -To control for participant’s expectations

                        -Conduct manipulation check

                                    -At the end, ask each participant what they thought the hypothesis was


**Threats to internal validity

            Maturation effect: Changes due to time

                        -Growing, fatigue, hunger

            Testing effect: Changes due to pre-test

                        -practice, expectations

            History effect: Changes due to events in the outside world

            Mortality effect: Differences due to some participants dropping out

            Selection effect: Differences due to unequal groups to begin with

Between Subjects Designs:


                        -Ensure that the groups are treated as similarly as possible

                                    -Maturation, testing, mortality

                        -Gather data rapidly/simultaneously

                                    -History effect

                        -Random assignment

                                    -Selection effect

Within Subject Designs:


                        -Counterbalance:

                                    -Maturation effect

                                    -Testing effect

                        -Gather data rapidly/simultaneously

                                    -History effect

                        -Random assignment to counterbalance order

                                    -Selection effect

                        -If they drop out, they drop out of both conditions

                                    -Mortality effect


-What if independent variable does not make a difference?

            Not enough between-group difference

                        -Ineffective manipulation

                        -Insensitive measures

                        -Ceiling or floor effects (DV or IV)

            Too much variability within group members

                        -Measurement error

                        -Individual differences

                        -Situation variability/noise
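The two problems above (too little difference between groups, too much variability within them) are two sides of one ratio. A rough sketch using Cohen's d, one common effect-size measure, shows this. The scores are invented, and the choice of Cohen's d with a pooled standard deviation is an assumption of this example, not something the notes specify.

```python
import math
from statistics import mean, variance

def cohens_d(group_1, group_2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_2) - mean(group_1)) / math.sqrt(pooled_var)

# Hypothetical scores: both comparisons have the same 2-point mean difference
control_tight = [10, 11, 12, 13, 14]
treated_tight = [12, 13, 14, 15, 16]
control_noisy = [4, 8, 12, 16, 20]
treated_noisy = [6, 10, 14, 18, 22]

d_tight = cohens_d(control_tight, treated_tight)
d_noisy = cohens_d(control_noisy, treated_noisy)

print(round(d_tight, 2), round(d_noisy, 2))
```

The between-group difference is identical in both comparisons, but the effect looks much smaller when individual scores are spread widely. Reducing measurement error and situational noise shrinks the denominator, while a stronger manipulation or a more sensitive measure increases the numerator.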


*True and quasi-experiments

Quasi-Experiment:

                        -Researchers cannot randomly assign the participants to the conditions

                                    -“looks like” an experiment but lacks equal groups

                                    -Participants are selected for the groups depending on the level of the predictor variable

                                    *There are often quasi-experimental results included in true experiments

True Experiments:


                        -Start with equal groups

                        -Vary one factor

                        -Measure and compare outcome variable

                        -Differences in DV must be caused by IV

                        -“Different levels of IV caused different performance on DV”

Quasi-Experiments:


                        -Divide groups based on one factor

                                    -Groups not equal

                        -Measure and compare outcome variable

                        -Cannot conclude causality

                        -“The groups performed differently on the outcome measure”

Module 6-2: Factorial Designs

Factorial Designs:


*Experiments that have more than one independent variable

            -Allow one to investigate the effects of more than one independent variable on outcomes

            -Behavior is multi-caused

                        -Considering multiple IVs makes our research more similar to real life

            -Sometimes effect is obscured when only looking at one variable

            -Sometimes effect is contingent upon the presence of both variables

            Examining for:

                        -Main effects of each independent variable

                        -Interactions between independent variables


Factor:

                        -Cause

                        -Independent variable

                        -Source of systematic (non-random) variance


*ex. Diagnosis (mild depression, severe depression) x Treatment group (Group 1, Group 2, Group 3) x Treatment duration (short, moderate, long)

            -2x3x3

            -3 main effects

            -3 two-way interactions + 1 three-way interaction
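The counts in this example can be checked by enumerating the design. The sketch below uses the factor names and levels from the example; the variable names are illustrative.

```python
from itertools import combinations, product

# Factors and levels taken from the example above
factors = {
    "diagnosis": ["mild depression", "severe depression"],
    "treatment group": ["Group 1", "Group 2", "Group 3"],
    "treatment duration": ["short", "moderate", "long"],
}

# Each cell is one combination of levels, one from every factor
cells = list(product(*factors.values()))

# One main effect per factor; one interaction per set of 2 or more factors
names = list(factors)
main_effects = [(name,) for name in names]
interactions = {
    size: list(combinations(names, size)) for size in range(2, len(names) + 1)
}

print("design:", " x ".join(str(len(levels)) for levels in factors.values()))
print("cells:", len(cells))
print("main effects:", len(main_effects))
for size, combos in interactions.items():
    print(f"{size}-way interactions:", combos)
```

A 2x3x3 design therefore has 18 cells. In a between-subjects version, each cell needs its own group of participants, which is the practical cost of adding factors.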