Statistical Analysis: ANOVA, Experiments, and Variable Types

Statistical Analysis Definitions and Assumptions

Observational Study: Data is collected without any attempt to manipulate or influence the outcome.

Planned Experiment: Some manipulation is attempted in order to see if the outcome is related to the controlled factor.

Independent Variable: The variable that is manipulated.

Dependent Variable: The variable that is measured rather than manipulated, and that may be affected by the independent variable.

Stratification: Units in the sampling frame are first divided into groups (strata), and a sample is then drawn from each group.

Simple Treatment: Only one source of variation (a single factor) provided by the treatments.

Complex Treatment: Two or more sources of variation provided by the treatments.

Balanced Design: Every treatment is assigned to an equal number of units.

Blocking: Units are grouped into blocks of ‘similar units’, and each block should have roughly equal numbers of units for each treatment. Reduces unexplained variability.
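A balanced, blocked assignment can be sketched as follows. This is only an illustration under stated assumptions: the block names, unit labels, treatment names, and the helper assign_within_blocks are all hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

def assign_within_blocks(blocks, treatments, seed=0):
    """Randomly assign treatments to units so each block is balanced.

    blocks: dict mapping a block name to a list of unit labels.
    treatments: list of treatment names.
    """
    rng = random.Random(seed)  # fixed seed: repeatable illustration only
    assignment = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} cannot be balanced")
        reps = len(units) // len(treatments)
        # Each treatment appears exactly `reps` times in this block
        labels = list(treatments) * reps
        rng.shuffle(labels)
        for unit, label in zip(units, labels):
            assignment[unit] = label
    return assignment

# Hypothetical units grouped into two blocks of similar units
blocks = {
    "north": ["n1", "n2", "n3", "n4"],
    "south": ["s1", "s2", "s3", "s4"],
}
assignment = assign_within_blocks(blocks, ["control", "fertilizer"])
for unit, label in assignment.items():
    print(unit, label)
```

Randomizing separately inside each block keeps the design balanced within blocks as well as overall.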

Covariates: Characteristics/Properties of the experimental units which are not included in the treatment.

Replication Number: Number of random independent copies/units of each treatment.


Fixed Effects Model (Model I ANOVA): Treatments included are the only ones of interest.

Random Effects Model (Model II ANOVA): We assume there is some treatment effect and ask how much of the variability in the response Y is due to the treatments as opposed to residual unexplained variability.


One-Way ANOVA

Assumptions:

  1. Only one factor in the treatments.
  2. Response Variable is continuous.
  3. Units are homogeneous.

Formula: Yij = μi + Eij, where the errors Eij come independently from a N(0, σ²) distribution.

Yij = Observed value of the jth unit in the ith group

μi = Population mean of the ith group

Eij = Deviation of the jth observation in the ith group from that group's mean μi

The population means μi and the errors Eij are unknown; the means are estimated from the data.
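The entries of a one-way ANOVA table follow directly from the model: each group mean estimates μi, and the residuals estimate the errors. Below is a minimal sketch in plain Python; the data and the function name one_way_anova are made up for illustration. It stops at the F statistic, because the P-value needs the F distribution from a statistics package.

```python
from statistics import mean

def one_way_anova(groups):
    """Return ANOVA table entries for a list of groups (lists of numbers)."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of treatment groups
    n_total = len(all_values)  # total number of units

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (residual) sum of squares: observations around group means
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

# Hypothetical data: three treatment groups with three units each
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name:12s} {value:8.3f}")
```

The F statistic is compared against an F distribution with (df_between, df_within) degrees of freedom.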

Presentation: Description, Model Equation, Assumptions, Hypothesis, ANOVA Table, Conclusions, Interpretation.


Random Effects Model: Yij = μ + Ai + Eij

Assumptions: Errors Eij come independently from a N(0, σ²) distribution.

Random effects Ai come independently from a N(0, σA²) distribution.

The random effects are independent of the errors.
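For a balanced design with n units per group, the two variance components can be estimated by the method of moments, using E[MS within] = σ² and E[MS between] = σ² + n·σA². The sketch below assumes a balanced design; the data and the function name variance_components are hypothetical, and setting a negative estimate to 0 is a common convention rather than the only option.

```python
from statistics import mean

def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random effects model."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    k = len(groups)
    grand_mean = mean([y for g in groups for y in g])

    ms_between = sum(n * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((y - mean(g)) ** 2 for g in groups for y in g) / (k * n - k)

    sigma2 = ms_within                                   # residual variance
    sigma_a2 = max((ms_between - ms_within) / n, 0.0)    # treatment variance
    share = sigma_a2 / (sigma_a2 + sigma2)               # share due to treatments
    return {"sigma2": sigma2, "sigma_a2": sigma_a2, "share": share}

# Hypothetical data: three randomly chosen treatments, three units each
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
components = variance_components(groups)
print(components)
```

The share value answers the random effects question directly: how much of the variability in the response is due to the treatments rather than to residual variability.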


Assumptions

Normality of Residuals (QQ-Plots): Less important for ANOVA than constant variance.

Points following the line = normal residuals; points not following the line = non-normal residuals.
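The coordinates behind a QQ-plot can be computed by hand, which shows what "following the line" means: the sorted residuals are paired with the quantiles a normal sample of the same size would be expected to have. This is a sketch with made-up data; the plotting positions (i - 0.5)/n are one common convention, and statistical software may use a slightly different one.

```python
from statistics import NormalDist, mean

def residuals_from_groups(groups):
    """Residual = observation minus the mean of its own group."""
    return [y - mean(g) for g in groups for y in g]

def qq_points(residuals):
    """Pair each sorted residual with a standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical data: three groups with three units each
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
points = qq_points(residuals_from_groups(groups))
for theoretical, observed in points:
    print(f"{theoretical:7.3f}  {observed:7.3f}")
```

Plotting observed against theoretical values gives the QQ-plot; roughly normal residuals fall close to a straight line.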


Equal Variance (Levene’s Test)

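Tests for homogeneity of variances. The null hypothesis is that all group variances are equal; a small P-value means that at least one variance differs from the others.

One-Way ANOVA is most sensitive to failure of this assumption when the data are unbalanced or the data set is small.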
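Levene's test is itself a one-way ANOVA, carried out on the absolute deviations of each observation from its group centre. The sketch below uses the group median as the centre (the variant often called Brown-Forsythe); the data and the function name levene_statistic are hypothetical, and the P-value step is left to a statistics package.

```python
from statistics import mean, median

def levene_statistic(groups):
    """Levene's W using absolute deviations from each group median."""
    # z[i][j] = |observation - median of its group|
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])

    # Ordinary one-way ANOVA F statistic, computed on the deviations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data: the middle group is much more spread out
groups = [[4.0, 5.0, 6.0], [2.0, 7.0, 12.0], [9.0, 10.0, 11.0]]
w = levene_statistic(groups)
print(f"W = {w:.4f}")

# Groups with identical spread give W = 0
w_equal = levene_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f"W = {w_equal:.4f}")
```

W is compared against an F distribution with (k - 1, N - k) degrees of freedom; a large W is evidence against equal variances.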


Constant Variance (Scatter-Residual Plot)

No Pattern = Constant Variance, Pattern = Non-constant Variance.


Independence

Often impossible to check from the data alone; we have to look at how the data were collected and recorded.


Normality of Dependent Variable (Box Plot)

Check for overlapping, spread, min and max. Sites with higher median DBH also have more spread, suggesting that a log transformation may be needed.


Pair-wise Comparison of Means (Tukey Test)

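Looks at the adjusted P-value and the confidence interval of the difference in means for each pairing. If the interval includes 0 (it extends both above and below 0), there is no evidence of a difference for that pair; if it excludes 0, the two means differ.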


Non-Parametric Test (Kruskal-Wallis)

Tests whether the medians differ among the groups, that is, whether at least one group differs from the others.
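The Kruskal-Wallis statistic replaces the observations with their ranks in the pooled data and then compares the rank sums of the groups. Here is a minimal sketch with the usual tie correction; the data and the function names are made up for illustration, and the P-value step is left to a statistics package.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    values = [y for g in groups for y in g]
    n_total = len(values)
    ranks = average_ranks(values)

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: equals 1 when there are no ties
    ties = Counter(values)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction

# Hypothetical data: three groups, no ties
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.3f}")
```

H is compared approximately against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.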


If the assumptions are not met:

  • Work with transformed data, e.g., a log transformation.
  • Switch to a non-parametric test, e.g., Kruskal-Wallis (RStudio).
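The effect of a log transformation on unequal spread can be seen in a small example. The data below are invented so that the spread grows in proportion to the group level, which is the situation where a log transformation tends to help.

```python
from math import log
from statistics import stdev

# Hypothetical data: spread grows in proportion to the group level
groups = [
    [1.0, 2.0, 4.0],
    [10.0, 20.0, 40.0],
    [100.0, 200.0, 400.0],
]

# Standard deviation of each group on the original scale
raw_sds = [stdev(g) for g in groups]
# Standard deviation of each group after taking natural logs
log_sds = [stdev([log(y) for y in g]) for g in groups]

print("raw SDs:", [round(s, 3) for s in raw_sds])
print("log SDs:", [round(s, 3) for s in log_sds])
```

On the original scale the standard deviations differ by a factor of 100, while on the log scale they are essentially identical, so the constant variance assumption becomes reasonable after the transformation. The ANOVA is then run on the transformed values.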