ANOVA, Chi-Square, and Non-Parametric Statistics: A Comprehensive Guide
Between-subjects: each participant is exposed to only one level of the IV (in a 2-way design, one combination of the levels of the two IVs; 3-way, etc.).
Within-subjects (repeated measures): all participants are exposed to all levels and combinations.
1-way ANOVA (1 IV and 1 outcome/DV): null = the group means are equal (same as the t-test); the omnibus/experimental alternative = at least one mean differs.
You can manipulate more than one independent variable; ANOVA is an extension of the t-test.
Pairwise t-tests cannot examine several independent variables and they inflate the Type I error rate (the familywise, FW, error rate).
Ass: RS (random sampling) Y, IoC (independence of cases) N, Nor (normality) Y, Homo (homogeneity of variance) Y
SSTotal : Total variability.
SSTreatment: Variability due to the experimental manipulation.
SSError: Variability due to individual differences in performance
F-ratio = MS_treat / MS_error --> if F-ratio > F_critical, reject the null
Eta2 = SStreat/SStotal (effect size) 0.01S,0.06M,0.14L
Omega2 = (SStreat - (k-1)MSerror) / (SStotal + MSerror) (effect size) 0.01 S, 0.06 M, 0.14 L
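As a minimal sketch of the formulas above, the F-ratio, eta-squared, and omega-squared can be computed with SciPy and NumPy; the three groups below are made-up illustrative scores, not data from these notes.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three treatment levels (illustrative data)
g1 = np.array([4.0, 5.0, 6.0, 5.5])
g2 = np.array([6.0, 7.0, 8.0, 7.5])
g3 = np.array([8.0, 9.0, 10.0, 9.5])

# Omnibus 1-way ANOVA: F = MS_treat / MS_error
f_stat, p_value = stats.f_oneway(g1, g2, g3)

# Sums of squares by hand, matching the SS definitions in the notes
all_scores = np.concatenate([g1, g2, g3])
grand_mean = all_scores.mean()
ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_treat = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2, g3))
ss_error = ss_total - ss_treat

# Effect sizes: eta2 = SS_treat / SS_total; omega2 uses MS_error
k, N = 3, len(all_scores)
ms_error = ss_error / (N - k)
eta_squared = ss_treat / ss_total
omega_squared = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)
```

Omega-squared is always a bit smaller than eta-squared because it corrects for the positive bias of eta-squared in small samples.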
Multiple Comparisons (after running your ANOVA)
Post hoc tests: Compare two treatments at a time (pairwise comparisons).
Control the familywise (FW) error rate.
Appropriate only when you really don't know what to expect (exploratory research); it is a lot like fishing for significance.
Z. Wang
POST-HOC TEST types:
Tukey HSD test: Results in a single number (HSD) that determines statistical significance. It is conservative.
Bonferroni's method: Divide alpha by the number of comparisons (c) and require that each test be significant at that level (.05/c). It is conservative.
Fisher's LSD procedure: Results in a single number (LSD) that determines statistical significance. It is liberal.
Dunnett's C: Can be applied in situations where the variances are unequal.
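The Bonferroni procedure above can be sketched with plain pairwise t-tests in SciPy; the three groups and their values are hypothetical, invented only to show the .05/c adjustment.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical outcome scores for three treatment groups
groups = {
    "placebo": np.array([2.1, 2.5, 1.9, 2.3]),
    "low":     np.array([3.0, 3.4, 2.8, 3.2]),
    "high":    np.array([4.1, 4.5, 3.9, 4.3]),
}

alpha = 0.05
c = len(list(combinations(groups, 2)))   # number of pairwise comparisons
adjusted_alpha = alpha / c               # Bonferroni: require p < .05/c per test

results = {}
for a, b in combinations(groups, 2):
    t, p = stats.ttest_ind(groups[a], groups[b])
    results[(a, b)] = (p, p < adjusted_alpha)   # (raw p, significant after correction?)
```

With three groups there are c = 3 comparisons, so each test must reach p < .0167 to keep the familywise error rate near .05.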
ORTHOGONAL CONTRASTS
The weights of each contrast sum to 0 (down each column).
Contrast 1:Placebo ≠ (Low, High) /// Contrast 2: Low≠High
Trend Analysis
If you have ordered groups, you often will want to know
whether there is a consistent trend across the ordered groups
(e.g., linear trend).
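A minimal sketch of a linear trend contrast for three ordered groups: the weights (-1, 0, 1) sum to zero, as required for an orthogonal contrast, and the group means below are invented for illustration.

```python
import numpy as np

# Ordered group means (e.g., low, medium, high dose) -- illustrative values
group_means = np.array([2.0, 3.1, 4.3])

# Linear trend weights: an orthogonal contrast, so they must sum to 0
linear_weights = np.array([-1, 0, 1])
assert linear_weights.sum() == 0

# Contrast value: a value far from 0 suggests a linear trend across the groups
L = (linear_weights * group_means).sum()
```

Here L is simply the high-group mean minus the low-group mean; in a full trend analysis this contrast would be tested against its standard error.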
2-/3-way ANOVA: 2 or more IVs (each with multiple levels), 1 DV
- levels of the IVs are fixed (not continuous)
- random assignment to combinations of IV levels
Example of a 2-Way ANOVA: (3×2)
Research Question: Does the type of teaching method and student background affect test scores?
Independent Variables:
Teaching Method (with levels such as Traditional, Online, Blended)
Student Background (with levels such as Urban, Rural)
Dependent Variable: Test scores (measured continuously)
Design: Each student is randomly assigned to one combination of Teaching Method and Student Background, and their test score is recorded.
Example of a 3-Way ANOVA: (3x3x2)
Research Question: How do exercise type, diet, and gender affect weight loss?
Independent Variables:
Exercise Type (with levels such as Cardio, Strength Training, No Exercise)
Diet (with levels such as Low Carb, High Protein, No Diet)
Gender (with levels such as Male, Female)
Dependent Variable: Weight loss (measured in pounds)
Design: Participants are randomly assigned to one of the combinations of Exercise Type, Diet, and Gender, and their weight loss is measured after a fixed period.
anova observed effects example: (2-way anova –> 2main effects)
Diet Type (Vegetarian, Non-Vegetarian) and Exercise Regimen (None, Light, Intensive) on weight loss (dv)
A main effect is the effect of an IV on the DV
averaged over the other variable or when the other variable
is ignored. (diet type on weight loss), can be misleading.
An interaction effect means the effect of one independent variable on the dependent variable differs
depending on the level of another independent variable. (diet type and exercise regimen)
A simple effect is the effect of one variable at a specific level of another independent variable (e.g., diet type at the "light exercise" level).
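The three kinds of effects above can be read off a table of cell means; the 2x3 diet-by-exercise means below are made-up numbers chosen so that an interaction is visible.

```python
import numpy as np

# Cell means for a 2x3 design: Diet (rows) x Exercise Regimen (cols)
#                     None  Light  Intensive   (illustrative values)
cell_means = np.array([
    [1.0,  3.0,  6.0],   # Vegetarian
    [1.0,  2.0,  3.0],   # Non-vegetarian
])

# Main effect of diet: average each row over the exercise levels (other IV ignored)
main_effect_diet = cell_means.mean(axis=1)

# Main effect of exercise: average each column over the diet levels
main_effect_exercise = cell_means.mean(axis=0)

# Simple effect of diet at "Light" exercise only: compare within that one column
simple_effect_light = cell_means[:, 1]

# Interaction: the diet difference changes across exercise levels (0, 1, 3 here)
diet_gap_by_exercise = cell_means[0] - cell_means[1]
```

Because the diet gap grows from 0 to 3 across exercise levels, the diet main effect alone would be misleading, which is exactly the point made in the notes.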
2-way anova between subjects summary table: (same calculations)
a = levels of IV 1, b = levels of IV 2, N = total sample size, k = number of levels of an IV
3-way ANOVA
3-way ANOVA means that there are three IVs and one DV,
and includes:
Three main effects.
Three 2-way interactions and one 3-way interaction.
Recommendations: It is difficult to interpret the results of a 3-way interaction.
Here is the order when checking the results:
1. The 3-way interaction: if it is significant, interpret it and you can ignore the lower-order effects; if not, go to the next step.
2. The 2-way interactions: if an interaction is significant, you can ignore the main effects involved in that interaction; if not, go to the next step.
3. The main effects: if a main effect is significant, examine it; if not, ignore it.
Repeated measures Anova summary table:
Order effects (trial effects)
The order in which the participant receives a treatment (first,
second, etc.) will affect how the participant behaves.
It is a big problem with within-subjects designs.
Order effects may be due to:
1 Practice effects
After doing the dependent-measure task several times, a
participant’s performance may improve. In a within-
subjects design, this improvement might be incorrectly
attributed to having received a treatment.
2 Fatigue effects
Decreased performance on the dependent measure due to
being tired or less enthusiastic as the experiment
continues. Fatigue effects could be considered negative
practice effects.
3 Carry-over effects
The effects of a treatment administered earlier in the
experiment persist so long that they are present even
while participants are receiving additional treatments.
Carryover effects create problems for within-subjects
designs because you may believe that the participant’s
behaviour is due to the treatment just administered when,
in reality, the behaviour is due to the lingering effects of a
treatment administered some time earlier
4 Sensitization
After getting several different treatments and performing
the dependent-variable task several times, participants in
a within-subjects design may become sensitive to what the
hypothesis is.
5 Sequence effects
If participants who receive one sequence of treatments
score differently than those participants who receive the
treatments in a different sequence, there is a sequence
effect.
ANOVA 2-/3-Way with Unequal Sample Sizes (between-subjects design)
The issue of which type of SS to use for unbalanced designs is still controversial; different texts and different authors offer different recommendations.
In general, using Type III Sum of Squares (unweighted
mean) is the best and most common approach to analyze
unbalanced designs.
- Unequal sample sizes: the SS components will no longer sum to SS total
total unweighted mean = (-0.5670 + 0.0102)/2 = -0.2784
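The unweighted-mean idea behind Type III SS can be sketched directly: average the group means themselves rather than pooling the raw scores. The two means below come from the example in the notes; the group sizes are invented to show the contrast with a weighted (pooled) mean.

```python
import numpy as np

# Group means from the notes; the unequal n's are hypothetical
means = np.array([-0.5670, 0.0102])
ns    = np.array([12, 30])

# Unweighted mean (Type III logic): each group counts equally
unweighted = means.mean()

# Weighted mean: larger groups pull the average toward their own mean
weighted = (means * ns).sum() / ns.sum()
```

With unequal n's the two answers differ, which is why the choice of SS type matters for unbalanced designs.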
Chi-square test (2 types below). Ass: RS (random sampling) Y, IofO (independence of observations) N, ExFr (expected frequencies) N
-Chi-square tests are utilized for analyzing categorical (nominal) data.
-The chi-square (χ²) distribution's shape is heavily influenced by the degrees of freedom (df).
-As df increases, the χ² distribution becomes more symmetric.
-Both the mean and variance of the χ² distribution are directly related to df; specifically, the mean is equal to df and the variance is twice the df.
Goodness-of-Fit Test:
-This test evaluates how well the observed frequencies match expected frequencies according to a specific hypothesis.
-It’s often used for one-dimensional data to assess if a single categorical variable follows a distribution pattern.
Contingency Table Test (also known as Test of Independence):
-This test determines if there are significant associations between two categorical variables.
It’s conducted on a two-way table, where each cell represents the frequency for the combination of categories.
-Both tests involve a comparison of observed frequencies (O) with expected frequencies (E).
-Expected frequencies are based on the null hypothesis, assuming no relationship or difference exists,
or they can be derived from theoretical distributions relevant to the research question.
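Both tests are available in SciPy; the die-roll counts and the 2x2 table below are invented illustrative frequencies.

```python
import numpy as np
from scipy import stats

# Goodness-of-fit: do 60 die rolls match a fair die? (illustrative counts)
# stats.chisquare defaults to equal expected frequencies under the null
observed = np.array([8, 9, 12, 11, 10, 10])
chi2_gof, p_gof = stats.chisquare(observed)

# Test of independence on a 2x2 contingency table (illustrative counts);
# expected frequencies are derived from the row and column totals under the null
table = np.array([[20, 30],
                  [25, 25]])
chi2_ind, p_ind, df, expected = stats.chi2_contingency(table)
```

For an R x C table the degrees of freedom are (R-1)(C-1), so the 2x2 example has df = 1.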
EFFECT SIZEs
d-family
Based on one or more measures of the differences between
groups or levels of the independent variable
r-family
Based on a correlation coefficient between the variables (e.g., between the independent and dependent variables)
Phi
It represents the correlation between two variables and
applies only to 2×2 tables
Cramér’s V
Cramér extended phi to larger tables by defining V = sqrt(χ² / (N(k − 1))), where N is the sample size and k is defined as the smaller of R and C. When k = 2 the two statistics, phi and V, are equivalent.
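Cramér's V follows directly from the chi-square statistic; the 2x2 table below is an invented example, chosen so that V has a clean value.

```python
import numpy as np
from scipy import stats

# Illustrative 2x2 table; correction=False skips Yates' continuity correction
table = np.array([[10, 20],
                  [20, 10]])
chi2, p, df, expected = stats.chi2_contingency(table, correction=False)

# V = sqrt(chi2 / (N * (k - 1))), with k the smaller of R and C
N = table.sum()
k = min(table.shape)
cramers_v = np.sqrt(chi2 / (N * (k - 1)))   # for a 2x2 table this equals |phi|
```

Here chi² = 100/15 on N = 60 observations, so V = 1/3, a moderate association.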
ANCOVA
-ANCOVA is used to test for differences between group means while controlling for the influence of extraneous
variables (ex: your dv is math performance BUT you also measured aptitude before and want to see if it affects as a covariate)
-It adjusts for the effects of confounding variables, providing a clearer picture of the relationship between
the independent and dependent variables.
-To control for the effect of the covariate, it is removed from the
achievement scores by using the regression method. Then the F test can
be performed on the adjusted achievement scores.
Advantages:
-Reduces error variance by accounting for variation in the dependent variable that’s due to extraneous
variables, not the main independent variable(s).
-Enhances experimental control and increases the validity of the results by adjusting
for confounders, allowing for a more accurate assessment of the primary relationships of interest
Conducting ANCOVA in SPSS:
1 covariate =1df (for it)
Check Assumptions:
-Begin by testing the assumption of homogeneity of regression slopes, also known as the
equal slopes assumption.
-This means ensuring that the relationship between the covariate and the dependent variable is
consistent across all levels of the independent variable(s).
Linearity:
Verify the linear relationship between the covariate (aptitude scores) and the dependent variable (achievement scores).
No Interaction:
Confirm there is no significant interaction between the covariate and the independent variable (teaching methods),
as ANCOVA is not appropriate in the presence of a significant covariate by treatment interaction
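The regression-adjustment idea can be sketched in a few lines. This is a simplified illustration, not a full ANCOVA: it uses a single overall slope rather than the pooled within-group slope, and all data (aptitude, achievement, group labels) are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two teaching methods, aptitude covariate, achievement DV
aptitude = np.array([50, 55, 60, 65, 52, 57, 62, 67], dtype=float)
achieve  = np.array([70, 74, 78, 82, 75, 79, 83, 87], dtype=float)
group    = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Regress achievement on the covariate, then remove the covariate's effect
slope, intercept = np.polyfit(aptitude, achieve, 1)
adjusted = achieve - slope * (aptitude - aptitude.mean())

# F test on the covariate-adjusted scores (the notes' "adjusted achievement")
f_stat, p_val = stats.f_oneway(adjusted[group == 0], adjusted[group == 1])
```

Removing the covariate's variance shrinks the error term, which is exactly the "reduces error variance" advantage listed above.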
Mixed design ANOVA
– for Mixed design you need to have at least one between-subjects IV and one within-subjects IV
-ex: having a pretest and a postest (within) for 2 groups (between)
Ass: Nor (normality) Y, Homo (homogeneity) Y, Inde (independence) N
Non parametric stats: (chi square test is one type)
The terms nonparametric and distribution-free, which have
slightly different meanings, are often used interchangeably.
When do we use nonparametric procedures?
When normality cannot be assumed.
When there is not sufficient sample size to assess the form of
the distribution
Rank stats:
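The common rank-based tests map onto familiar parametric ones; a small sketch with SciPy, using three invented samples.

```python
import numpy as np
from scipy import stats

# Illustrative samples for three groups
a = np.array([1.2, 2.3, 2.9, 3.1, 4.0])
b = np.array([2.8, 3.5, 4.1, 4.6, 5.0])
c = np.array([4.9, 5.2, 5.8, 6.1, 6.5])

# Mann-Whitney U: rank-based alternative to the independent-samples t test
u_stat, p_mw = stats.mannwhitneyu(a, b)

# Kruskal-Wallis H: rank-based alternative to the 1-way ANOVA
h_stat, p_kw = stats.kruskal(a, b, c)

# Wilcoxon signed-rank: rank-based alternative to the dependent-samples t test
w_stat, p_w = stats.wilcoxon(a, b)
```

Because these tests work on ranks rather than raw scores, they discard some information, which is the power cost listed under the disadvantages below.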
OTHER PROCEDURES
Resampling
Bootstrapping: A statistical method for estimating the
sampling distribution of an estimator by sampling with
replacement from the original sample, most often with the
purpose of deriving robust estimates of standard errors and
confidence intervals of a population parameter like a mean
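The bootstrap described above can be sketched with NumPy alone: resample with replacement, collect the statistic, and take percentiles. The sample values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
sample = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 5.5, 3.9, 4.4])

# Draw many resamples (with replacement, same size as the original sample)
# and record the mean of each one
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

# Percentile bootstrap 95% CI for the population mean
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The spread of `boot_means` is itself an estimate of the standard error of the mean, with no normality assumption required.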
ADVANTAGES AND DISADVANTAGES
Advantages
Get a quick answer with little calculation.
Appropriate when the sample sizes are small.
Disadvantages
No parameters to describe and it becomes more difficult to
make quantitative statements about the actual difference
between populations.
Throw away information.
Less statistically powerful than their parametric counterparts
USE OF NONPARAMETRIC PROCEDURES
Each nonparametric procedure has its peculiar sensitivities
and blind spots. It is always advisable to run different
nonparametric tests.
Should discrepancies in the results occur contingent upon which test is used, one should try to understand why some tests give different results.
Fundamental concepts and definitions:
Two branches of statistics:
Descriptive statistics
Summarize and describe a group of numbers.
Inferential statistics
Try to infer information about a population by using
information gathered by sampling.
Terms
Population: The complete set of data elements is termed the
population. Large or small; finite or infinite.
Sample: A sample is a portion of a population selected for further
analysis.
Parameter: A parameter is a characteristic of the whole population.
Statistic: A statistic is a characteristic of a sample, presumably
measurable.
Randomness
Assuming that the sample is truly random, we not only can
estimate certain characteristics of the population, but also
can have a very good idea of how accurate our estimates
are. To the extent that the sample is not random, our
estimates may or may not be meaningful, because the
sample may or may not accurately reflect the entire
population.
External validity: random selection
Internal validity: random assignment
STATISTICS & THREE COMMON RESEARCH METHODS
Experimental Method
Must satisfy requirements to be an experiment—high level of
control to isolate cause and effect; must manipulate levels of
an independent variable (the proposed cause); random
assignment renders groups equivalent ; comparison/control—
at least two groups observed (dependent variable—proposed
effect).
Quasi-Experimental Method
Including a non-manipulated IV (classification variable); lack
of random assignment to groups; and/or lack of control
(comparison) group (e.g., boys vs girls ).
Correlational Method
Variables measured as they naturally occur—lack of random
assignment and control to determine cause-effect
Nominal Scale
The nominal scale is the most basic level of measurement, where data can be categorized based on qualitative traits but not ordered. Categories on a nominal scale are mutually exclusive, and no order or ranking can be inferred from the labels.
Example: Categories like race, gender, or marital status are nominal. These categories simply denote different groups without any implied hierarchy or order.
Ordinal Scale
Data measured on an ordinal scale is categorized into groups that can be ranked; however, the intervals between the rankings are not necessarily equal. This means while you can say one rank is higher or lower, you can’t quantify the difference between the ranks.
Example: Educational levels (e.g., elementary school, high school, college, graduate school) are ordinal. They indicate a progression, but the difference in educational attainment between each level is not uniform.
Interval Scale
The interval scale is a numerical scale where the order of the numbers is meaningful, and the intervals between each value are equal. However, it does not have a true zero point, meaning the zero value does not imply the absence of the quantity being measured.
Example: Temperature in degrees Celsius or Fahrenheit is an interval scale because the difference between each degree is the same, but 0 degrees does not mean the absence of temperature.
Ratio Scale
The ratio scale possesses all the properties of an interval scale, and in addition, it has a true zero point. This true zero allows for a meaningful zero value (indicating none of the variable) and the calculation of ratios.
Example: Weight and height are measured on a ratio scale, as they have equal intervals, and a value of zero means there is no weight or height, respectively.
Boxplot
Describes the centre and dispersion of datasets and locates outliers.
Detects and illustrates location and variation changes between different groups of data.
Assumes that the data are unimodal; thus, never claim that a distribution is normal simply on the basis of a boxplot.
The histogram graphically shows the following
Centre of the data
Spread of the data (kurtosis; heavy or light tailed)
Skewness of the data (symmetry)
Presence of outliers
Presence of multiple modes in the data (modality, uni-, bi-, or
multi-modal)
Normality
Outliers: correct them, ignore them, remove them, transform them, replace them
Probability theory
Three approaches:
Classical (Analytic) Approach
The classical approach to probability is based on the assumption that the outcomes of an event are equally likely. This method calculates the probability of an event as the number of favorable outcomes divided by the total number of possible outcomes.
Example: Consider the rolling of two dice and summing the numbers on the top faces. If interested in calculating the probability of the sum being seven, you count how many combinations (e.g., (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)) out of all possible combinations of two dice result in seven, giving you a probability of 6/36 or 1/6.
Frequentist Approach
The frequentist approach defines the probability of an event as the limit of its relative frequency after a large number of trials. This approach does not require any assumptions about the likelihood of individual outcomes but instead relies on actual experimental or observed data.
Example: If you flip a coin many times and observe that it lands heads 500 times out of 1,000 flips, you might estimate the probability of getting heads as 0.5, based on these empirical results.
Subjective Approach
The subjective approach to probability is based on personal judgment or belief about the likelihood of an event occurring. This approach is often used when it is not possible to apply classical or frequentist probability due to a lack of symmetry in outcomes or insufficient empirical data.
Example: If someone says, “I think there is a 70% chance that tomorrow will be a good day,” they are expressing a subjective probability based on personal belief, not on a statistical or empirical model.
Mutually exclusive or disjoint
Two events that cannot happen at the same time: if one occurs, the other cannot (e.g., tossing a coin twice: Event A = two heads; Event B = two tails).
Probability rule—Addition Rule
If A and B are mutually exclusive or disjoint
events, then p(A or B) = p(A) + p(B). E.g., the
probability of drawing either a heart or a
spade from a deck of playing cards = 13/52 +
13/52 – 0/52 = 26/52.
If A and B are not mutually exclusive or
disjoint events, then p(A or B) = p (A) + p (B)
– p (A and B). E.g., the probability of
drawing a heart or a 3 from a deck of
playing cards = 13/52 + 4/52 – 1/52 = 16/52
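The card example can be verified by enumerating a standard deck; a small sketch using exact fractions.

```python
from fractions import Fraction

# Build a standard 52-card deck: 13 ranks (1 = ace ... 13 = king) x 4 suits
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]

p_heart = Fraction(sum(1 for r, s in deck if s == "hearts"), len(deck))
p_three = Fraction(sum(1 for r, s in deck if r == 3), len(deck))
p_heart_and_three = Fraction(sum(1 for r, s in deck if s == "hearts" and r == 3),
                             len(deck))

# Not mutually exclusive (the 3 of hearts is in both events), so:
# p(A or B) = p(A) + p(B) - p(A and B)
p_heart_or_three = p_heart + p_three - p_heart_and_three
```

The overlap term subtracts the 3 of hearts so it is not counted twice, giving 16/52 exactly as in the notes.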
INDEPENDENT
Independent
Events A and B are independent if knowing that A occurs
does not affect the probability that B occurs (e.g., tossing two
coins: Event A = the first coin is a head; Event B = the second
coin is a head).
Probability rule—AND/Multiplication Rule
If A and B are independent, then p(A and B) = p(A)*p(B).
Joint probability
In N independent trials, suppose NA, NB, NAB denote the
number of times events A, B and AB occur respectively.
According to the frequency interpretation of probability,
for large N. p(A) = NA/N, p(B) = NB/N, p(A and B) = NAB/N
Disjoint events cannot be independent! If A and B can not occur together
(disjoint), then knowing that A occurs does change probability that B occurs.
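The two-coin example can be checked by enumerating all four equally likely outcomes; this confirms the multiplication rule p(A and B) = p(A)p(B) for independent events.

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses: A = first coin is heads, B = second coin is heads
outcomes = list(product("HT", repeat=2))     # HH, HT, TH, TT
p = Fraction(1, len(outcomes))               # classical approach: each outcome 1/4

p_A  = sum(p for o in outcomes if o[0] == "H")
p_B  = sum(p for o in outcomes if o[1] == "H")
p_AB = sum(p for o in outcomes if o == ("H", "H"))

# Multiplication rule holds exactly, so A and B are independent
independent = (p_AB == p_A * p_B)
```

By contrast, for disjoint events p(A and B) = 0 while p(A)p(B) > 0, which is why disjoint events cannot be independent.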
Normal distribution is theoretical.
The theoretical distribution with data that are symmetrically distributed
around the mean, median, and mode.
Bell-shaped and unimodal
Mean = median = mode
Asymptotic tails
Area under the curve = 1
Defined by Two Parameters: The shape of the normal distribution is determined by two parameters—
mean (μ) and standard deviation (σ).
-The mean determines the center of the distribution, and the standard deviation
determines the spread or width of the distribution
-The theoretical distribution of the hypothetical set of
sample means obtained by drawing an infinite number of
samples from a specified population can be shown to be
approximately normal under a wide variety of conditions.
Such a distribution is called the sampling distribution of
the mean
Z score
Why use Z scores?
Can compare distributions with different means and
standard deviations.
Used to “STANDARDIZE” variables – allows variables to be
compared to one another even when measured on different
scales.
Bill has an IQ of 145, IQ in the population has a mean of 100 and a
standard deviation of 15
(145 - 100) / 15 = 3 (Bill's IQ is 3 SDs above the mean)
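The standardization formula is one line of code; the IQ numbers come from the example above.

```python
def z_score(x, mean, sd):
    """Standardize a raw score: how many SDs x lies above or below the mean."""
    return (x - mean) / sd

# Bill's IQ of 145, with population mean 100 and SD 15
bill_z = z_score(145, 100, 15)   # 3 SDs above the mean
```

Because z-scores put every variable on the same scale (mean 0, SD 1), scores from distributions with different means and SDs become directly comparable.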
modality: how any peaks (first 2) (can be assesed with qq plots)
skewness: positive or negative (3 and 4)
kurtosis: deviation form the normal distribution
(last 2 /platykurtic – scores more in the middle and leptokurtic-more peak and longer tails/)
Normality (first graph): if you divide kurtisis and skew in spss by the std error next to them, if score is > 0.05 then not normal
SAMPLING & HYPOTHESIS TESTING
CONTROVERSY
The standard error of mean is the standard deviation of the
distribution of sample means
Type 1 error: false positive (mistakenly reject H0)
Type 2 : false negative (fail to reject H0)
Here are factors that affect the power of a test:
the size of alpha
1 or 2 tailed test
the separation of μ0 and μ1 (i.e., the effect size)
the size of the population variance,
the sample size, n
Unified approach to null hypothesis testing
Point estimation
Use a sample statistic (e.g., a sample mean) to estimate a
population parameter (e.g., a population mean).
Advantage: It is an unbiased estimator, that is, the sample mean
will equal the population mean on average.
Disadvantage: Have no way of knowing for sure whether a
sample mean equals the population mean. For this reason,
researchers often report a point estimate with an interval
estimate.
Interval estimation
Confidence interval (CI)—Interval or range of possible values
within which an unknown population parameter is likely to
be contained.
Level of confidence—Probability or likelihood that an interval
estimate will contain an unknown population parameter (e.g.,
population mean) (e.g., 95% CI [2.35 – 27.65])
-If the value specified in H0 falls outside the confidence interval,
H0 should be rejected (otherwise it is retained)
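A 95% CI around a sample mean can be sketched with the t distribution (appropriate when the population SD is unknown); the sample values below are invented.

```python
import numpy as np
from scipy import stats

sample = np.array([12.0, 15.0, 14.0, 10.0, 13.0, 16.0, 11.0, 14.0])

mean = sample.mean()
sem = stats.sem(sample)                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)  # two-tailed 95% critical value

ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
```

If a hypothesized population mean (the value in H0) falls outside [ci_low, ci_high], H0 is rejected at the .05 level.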
T-tests
When σ (the population standard deviation) is known
One sample z test
When σ is not known
One sample t test (IofOb N, Nor Y, RanS Y)
Purpose: Compare a sample mean to a known population mean when population variance is unknown.
Theory: Assumes the sample is drawn from a normally distributed population.
Independent samples t test (IofOb N +Homo Y)
Purpose: Compare means between two independent groups.
Theory: Assumes both samples are from normally distributed populations.
Uses the pooled standard deviation to calculate the t-statistic when the variances are assumed equal.
Dependent samples t test (IofOb N, + Corr N)
Purpose: Compare means between two related groups (matched pairs or repeated measures).
Theory: Assumes the differences within pairs are normally distributed. Calculates the difference for each pair, then computes the t-statistic on those differences.
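All three t tests have direct SciPy counterparts; the samples below are made-up illustrative data.

```python
import numpy as np
from scipy import stats

# One-sample t test: compare a sample mean to a known population mean (100)
sample = np.array([102, 98, 110, 105, 95, 108], dtype=float)
t1, p1 = stats.ttest_1samp(sample, popmean=100)

# Independent-samples t test: two unrelated groups
g1 = np.array([5.1, 4.8, 5.5, 5.0])
g2 = np.array([6.2, 6.0, 6.5, 6.1])
t2, p2 = stats.ttest_ind(g1, g2)

# Dependent-samples (paired) t test: pre/post scores for the same participants
pre  = np.array([10.0, 12.0, 11.0, 13.0])
post = np.array([12.0, 14.0, 12.0, 15.0])
t3, p3 = stats.ttest_rel(pre, post)   # works on the pre - post differences
```

Since every post score exceeds its pre score, the paired differences (pre - post) are negative and so is t3.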
GLM
Ass: InofCases N, Lin N, Nor Y, RanS Y
FACTORS INFLUENCING THE PEARSON R: Linearity and Outliers, Restriction of range
Simple and logistic regressions:
Logistic Regression
-Purpose: Used to predict the probability of a categorical dependent variable based on one or more predictor variables.
This method is suitable when the outcome variable is binary or multinomial (e.g., success/failure, yes/no).
-Model: The relationship between predictors and the log odds of the outcomes is modeled using a logistic function, which is non-linear.
Key Metrics:
Odds Ratio (Exp(B)): Represents the change in odds for a unit increase in the predictor.
Wald Statistic: Tests the significance of each coefficient.
Nagelkerke R²: A pseudo-R² that approximates the proportion of variance explained by the model.
Application: Common in medical fields for binary outcomes, market research for choice modeling, and any field where outcomes are categorical.
Simple Regression
Purpose: Used to predict a continuous dependent variable from a single predictor variable.
Model: Assumes a linear relationship between the predictor and the outcome.
Key Metrics:
R-squared (r²): Represents the proportion of variance in the dependent variable that is predictable from the independent variable.
Regression Coefficient (B): Indicates the change in the dependent variable for a one-unit change in the independent variable.
Standardized Coefficients: Provide a measure of the strength of the relationship between the variables, expressed in standard deviation units.
Application: Widely used in economics, business, and social sciences to understand relationships between variables and for forecasting.
Key Differences
Outcome Variable: Logistic regression is used for categorical outcomes, whereas simple regression is used for continuous outcomes.
Function Shape: Logistic regression uses a logistic curve, non-linear, to model the probability of an event. In contrast, simple regression uses a straight line to model the relationship.
Interpretation of Coefficients: In logistic regression, coefficients represent the change in log odds per unit change in the predictor,
while in simple regression, coefficients represent the change in the outcome variable per unit change in the predictor.
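The simple-regression side of the comparison can be sketched with NumPy alone: fit the least-squares line and compute r² from the residuals. The x/y values are invented, nearly linear data.

```python
import numpy as np

# Illustrative continuous predictor and outcome
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit: np.polyfit returns (slope, intercept) for degree 1
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x            # predicted outcome: one-unit change in x adds b to y

# r-squared: proportion of variance in y predictable from x
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
```

The coefficient b is the "change in the outcome per unit change in the predictor" described above; in logistic regression the analogous coefficient would instead shift the log odds.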