Statistical Concepts: A Comprehensive Guide to Measures, Tests, and Relationships

Measures of Central Tendency and Variability

Nominal, Ordinal, Interval, and Ratio Scales

Nominal = qualitative categories, no numerical relationship between categories, Ordinal = ranking of categories, but we do not know how much greater each category is than the next, Interval = continuous (the magnitude of the difference between two values can be determined), but the placement of zero is arbitrary (e.g., Celsius), Ratio = continuous, and zero has a natural interpretation (e.g., height)

Sampling Distribution

A sampling distribution is a probability distribution of a statistic (e.g., the mean) that is obtained through repeated sampling of a population – It describes the range of possible outcomes for a statistic (e.g., the sampling distribution of means describes the variation in the values of the mean over a series of samples) – The distribution of means will follow a normal distribution – The larger the sample size, the more tightly clustered the distribution of means will be around the population mean
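This clustering effect can be simulated directly. A minimal Python sketch, assuming an invented IQ-like population (mean 100, SD 15) and illustrative sample sizes:

```python
import random
import statistics

random.seed(0)
# Hypothetical population: IQ-like scores (mean 100, SD 15)
population = [random.gauss(100, 15) for _ in range(10_000)]

def sd_of_sample_means(n, reps=2000):
    """Simulate the sampling distribution of the mean for samples of size n
    and return its spread (an estimate of the standard error)."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

spread_small = sd_of_sample_means(10)
spread_large = sd_of_sample_means(100)
# Larger samples -> sample means cluster more tightly around the population mean
```

With n = 100 the simulated spread should come out noticeably smaller than with n = 10, matching the claim above.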

Standard Error

The standard error (SE) of a statistic is the standard deviation (or an estimate of it) of its sampling distribution – Population or sample SD (σ)/√n – It indicates how far the sample mean is likely to be from the population mean – Estimates variability across samples of a population – The larger the sample size, the smaller the SE
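A minimal sketch of the SE formula in Python (the sample values are invented for illustration):

```python
import math
import statistics

sample = [4.1, 5.0, 6.2, 5.5, 4.8, 5.9, 5.1, 4.7]  # hypothetical data
n = len(sample)
sd = statistics.stdev(sample)   # sample SD (n - 1 denominator)
se = sd / math.sqrt(n)          # standard error of the mean: SD / sqrt(n)
```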

Standard Deviation

The standard deviation is the square root of the variance (the average squared distance from the mean value = ∑(X – x̄)²/n) and tells us the average deviation from the mean – Reflects variability within a sample
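The same formula worked through in Python (scores invented for illustration; note this is the population formula, dividing by n, whereas the sample formula divides by n − 1):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = statistics.mean(scores)                                  # 5.0
variance = sum((x - mean) ** 2 for x in scores) / len(scores)   # 4.0
sd = variance ** 0.5                                            # 2.0
```

This matches `statistics.pstdev(scores)`, the stdlib's population SD.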

Confidence Intervals

Confidence intervals use the standard error to give us a range for the mean (confidence limits) and the probability that the population mean is within that range – 95% CI = x̄ ± (zcritical × σ/√n), where zcritical = 1.96 for a two-tailed 95% interval
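A sketch of the 95% CI calculation (sample values invented for illustration):

```python
import math
import statistics

sample = [98, 102, 101, 97, 100, 103, 99, 100]  # hypothetical data
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
z_critical = 1.96  # two-tailed z for 95% confidence
lower, upper = mean - z_critical * se, mean + z_critical * se
```

For small samples it is more accurate to replace z with the t critical value for n − 1 df.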

Hypothesis Testing

Steps of Hypothesis Testing

State H0 → State H1 → Determine significance level (critical value of the test statistic or the associated p-value) → Calculate your test statistic (e.g., a t-test) and the p-value associated with it → Make a decision (e.g., if p < α, reject H0)
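The steps above can be sketched for an independent-samples t-test in pure Python (the group data and α are invented; 2.101 is the two-tailed t critical value for df = 18, α = .05):

```python
import math
import statistics

group_a = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.2, 5.4, 5.0, 5.1]  # hypothetical
group_b = [4.4, 4.6, 4.2, 4.8, 4.5, 4.3, 4.7, 4.4, 4.6, 4.5]  # hypothetical

# Steps 1-3: H0 says the means are equal; alpha = .05, two-tailed.
n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

# Step 4: pooled-variance independent-samples t statistic.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Step 5: compare to the critical value (from a t-table, df = n1 + n2 - 2 = 18).
t_critical = 2.101
reject_h0 = abs(t) > t_critical
```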

Type 1 Error

Type 1 error = false positive (rejecting H0 when it is actually true)

ANOVA

ANOVA (Fisher)

ANOVA (Fisher) = a mathematically equivalent extension of the t-test that gives you an F-score (which cannot be negative) – Assesses group differences (nominal IV) in means (continuous DV) – IV = 2+ categorical groups, DV = continuous variable – Used when you want to compare 2+ groups (k = # of groups) on a continuous variable, as opposed to conducting multiple t-tests, which only compare 2 groups at a time and increase Type I error

Familywise Error Rate

Familywise error rate = if the error rate is set at 5%, each additional t-test you run compounds the chance of at least one Type I error – Can be managed w/ adjustments to the error rate (e.g., post-hoc tests – may increase Type II error), by focusing on effect size and meta-analysis, or by interpreting your findings w/ a grain of salt
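The compounding can be made concrete: for m independent tests at α, the probability of at least one Type I error is 1 − (1 − α)^m, a small worked example:

```python
alpha = 0.05
# Probability of at least one false positive across m independent tests
fwer = {m: 1 - (1 - alpha) ** m for m in (1, 3, 10)}
# 1 test -> .05, 3 tests -> ~.14, 10 tests -> ~.40
```

Even 10 tests push the familywise rate to roughly 40%, which is why post-hoc corrections exist.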

Partitioning Variance

ANOVA compares 3 things: differences between means, differences in values within samples, differences in values across samples – While t-tests compare the variation between 2 samples to the variation within 2 samples, ANOVA compares the between and within variation (around the mean) for many samples – AKA partitioning variance

Grand Mean

Each observation is different from the grand mean (the combined overall average of the entire sample) due to the IV and/or random/unexplained error (e.g., individual differences)

Assumptions of ANOVA

Before conducting ANOVA, it is important to compare the variances across groups (the homogeneity of variance assumption), as ANOVA becomes unreliable if the group variances are too different

Sum of Squares

Sum of squares (a measure of variation): SStotal (total variability across the whole sample – numerator of variance) = SSbetween (how far the group mean is from the grand mean – measures differences between groups that can be explained using our IV) + SSwithin (how far each score is from the group mean – unexplained variation)

Mean Square

The sum of squares must be standardized so that values can be compared regardless of sample size and differences between groups reflect only variance – This creates the mean square, and involves dividing the SS by the df – The mean square is more useful than the SS b/c it is a pure measure of variation that is not influenced by sample size

F-Test

MSwithin = SSwithin/dfwithin (ntotal – k), MSbetween = SSbetween/dfbetween (k – 1) | Fobserved = MSbetween/MSwithin – Since F-values follow a known distribution, we can assess whether differences are significant once we know the F-value – The shape of the F-distribution depends on sample size and the # of groups
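The full partition from SS through MS to F can be worked by hand in Python (the three groups are invented for illustration):

```python
import statistics

groups = {  # hypothetical data: k = 3 groups, n = 5 each
    "control": [3, 4, 5, 4, 4],
    "drug_a":  [6, 7, 6, 8, 7],
    "drug_b":  [5, 6, 5, 6, 5],
}
all_scores = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_scores)
k, n_total = len(groups), len(all_scores)

# Partitioning variance: SS_total = SS_between + SS_within
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups.values() for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

ms_between = ss_between / (k - 1)        # df_between = k - 1
ms_within = ss_within / (n_total - k)    # df_within = n_total - k
f_observed = ms_between / ms_within
```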

Null Hypothesis

H0 = no differences in the DV across the IV (any differences are due to sampling error) – If H0 is true, then all samples were randomly drawn from the same population, the means of these samples should vary from the true population mean due to sampling variability, and their distribution will estimate the true sampling distribution of the mean (MSbetween and MSwithin = 2 different ways of estimating the variance of the sampling distribution under H0) – Under H0, F ≈ 1 b/c the numerator and the denominator estimate the same quantity (in a t-test, H0 = 0, and t can be negative b/c the numerator measures the difference between means)

Significant F-Value

When Fobserved > Fcritical, at least one group mean is significantly different from at least one other group mean

Key Limitations of ANOVA

The assumption of equal variances across groups within the population (ANOVA becomes increasingly unreliable when there are large differences in variance between groups), and the determination of what a significant F-value means (post-hoc tests are required to determine which group is different)

Homogeneity of Variance Test

You do not want the homogeneity of variance test to be significant (p-value > .05 is ideal): a significant result means the equal-variance assumption of ANOVA is violated – if violated, conduct the Brown-Forsythe test
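A sketch of the core idea behind the Brown-Forsythe test: it is essentially a one-way ANOVA run on absolute deviations from each group's median, so unequal spreads produce a large F (the two toy datasets below are invented for illustration):

```python
import statistics

def brown_forsythe_f(groups):
    """F statistic of a one-way ANOVA on |x - group median| -- the core of
    the Brown-Forsythe test of homogeneity of variance (no p-value here)."""
    devs = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    all_d = [d for g in devs for d in g]
    grand = statistics.mean(all_d)
    k, n = len(devs), len(all_d)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in devs)
    ss_within = sum((d - statistics.mean(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

similar = [[1, 2, 3, 2], [4, 5, 6, 5]]         # equal spreads
different = [[1, 2, 3, 2], [0, 10, 20, 10]]    # very unequal spreads
```

In practice one would get the F and its p-value from statistical software rather than by hand; the sketch only shows why unequal variances inflate the statistic.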

Post-Hoc Tests

ANOVA can only test the hypothesis that “at least one of the group means is different from the grand mean” and does not specify which group is different – To determine which group is different: Examine group means (e.g., state which groups are higher, lower, or similar), Run post-hoc tests to find out which groups have means that are significantly different from each other (e.g., Scheffé)

Effect Sizes

Effect sizes for pairwise comparisons

Cohen’s d and 95% CI

Effect size of the variable as a whole

Eta-squared (η2) – Tells you the proportion of variability in the DV that can be explained by the IV – Cohen’s conventions: .01 = small, .06 = moderate, .14 = large
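η² falls directly out of the sum-of-squares partition: η² = SSbetween/SStotal (the groups below are invented for illustration):

```python
import statistics

groups = [[2, 3, 4], [6, 7, 8], [4, 5, 6]]  # hypothetical data
all_scores = [x for g in groups for x in g]
grand_mean = statistics.mean(all_scores)
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
eta_squared = ss_between / ss_total  # 0.8 -> large by Cohen's conventions
```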

Types of ANOVA

One-way ANOVA = one IV, Factorial ANOVA = 2+ IVs and interactions (when the effect of one IV differs based on the levels of another IV), MANOVA = multivariate ANOVA (multiple DVs), ANCOVA = analysis of covariance (controlling for covariates [other variables])

Associations

Associations demonstrate whether a relationship exists between two variables, and the direction and strength of that relationship

Pearson’s r

Pearson’s r = the correlation coefficient (primary measure of association for interval-ratio variables) – Measures the amount of change in the DV produced by a unit change in the IV, where units are expressed in standard deviations – Varies between -1 and +1 (0 = no relationship) – Cohen’s interpretation of r: .10 and above = small, .30 and above = moderate, .50 and above = large – Lots of effect sizes are available, and which one you choose matters

Scatterplots

Scatterplots are graphs that display relationships between two continuous variables – The more the dots follow a clearly defined (often linear) pattern, the stronger the relationship (the line of best fit shows this pattern) – Positive relationship = when one variable goes up, the other variable also goes up (r = a positive value) – Negative relationship = when one variable goes up, the other variable goes down

Shared Variance

The overlap in variance between the variables tells us how much of the variance in one variable can be explained with the use of the other variable – The more overlap there is, the stronger the relationship

Coefficient of Determination

Coefficient of determination (r²) = the percentage by which errors are reduced when the info found in the IV is included in the prediction of the DV (AKA the proportion of variance explained) – Unexplained variation = the prediction error, which could be due to variables that were not included as predictors, measurement error, or random error – Total variation = explained variation + unexplained variation

Correlation ≠ Causation

The fact that an IV “explains variation” in the DV only means that we can explain a certain amount of variation in one variable by using another variable – Spurious relationships exist (when variables are associated but not causally related): they may cause Type I errors, so conduct planned analyses – If making a prediction w/ no knowledge, use the base rate

A More Precise Interpretation of r

r refers to the degree of correspondence between a person’s z-score on the IV (X) and their z-score on the DV (Y) – The relationship between the average person’s score in SD units on one variable and in SD units on a second variable (e.g., if r = 0.5, a person whose z-score increases from 0 to 1 on one variable will see a corresponding increase of 0.5 z-units on the second variable)

Correlation Matrix

A correlation matrix is used to show correlations between variables and all of the possible relationships between variables on a grid – Each cell represents the correlation between the variables in that row and column and shows the bivariate relationship between each pair of variables

Assumptions and Types of Correlations

Key assumption for correlations: the relationship between the variables is linear (values of X and Y increase or decrease monotonically) – Pearson’s r will not pick up non-linear correlations; however, analyses can incorporate non-linearity – Types of correlations: Phi correlation (two nominal variables), Point-biserial correlation (one dichotomous variable and one ordinal or continuous variable – avoid it if you don’t have an even 50/50 split in your dichotomous variable), Kendall’s tau or Spearman’s rho (two ordinal variables) – Factors to consider when picking a type of correlation: the conventions for your field/type of analysis, what’s commonly done for research in your area, and the statistics literature on how robust a given statistic is to violations of a normal distribution
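Pearson’s r and r² can be computed from the definitional formula (x and y are invented paired scores for illustration):

```python
import math
import statistics

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]  # hypothetical paired scores

mx, my = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
r_squared = r ** 2  # proportion of variance in y explained by x
```

Here r comes out around .83 – a large correlation by Cohen’s conventions.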