Key Statistical Concepts and Data Visualization

Posted on Jan 6, 2025 in Statistics

Key Statistical Definitions

Independence: The random choice of each individual in the sample is not influenced by which other individuals are chosen.
Sample of Convenience: Samples chosen because they are easily available.
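Haphazard Sampling: Samples chosen without a formal random procedure, in the hope that the choice is random; such samples are usually not truly random.
Volunteer Bias: Bias that arises when individuals who volunteer for a study differ systematically from those who do not.
Accuracy: How close the average of the estimates from many repeated studies is to the true population parameter.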
Precision: How spread out repeated estimates are from their average.
Ordinal Categorical Data: Non-numerical categories that have an order.
Nominal Categorical Data: Categories that do not have an order.
Continuous Numerical Data: Can take any real value within a range (e.g., body mass).
Discrete Numerical Data: Can take only specific, indivisible values (e.g., counts such as the number of eggs in a nest).
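Accuracy and precision are easiest to see by simulating many repeated studies. The sketch below is an illustration under stated assumptions: each hypothetical study's estimate is modeled as the true parameter plus a fixed systematic bias plus normally distributed noise, and the function names `simulate_estimates` and `accuracy_and_precision` are made up for this example.

```python
import random
import statistics

def simulate_estimates(true_mean, bias, noise_sd, n_studies=2000, seed=1):
    """Return one estimate per hypothetical study.

    Assumption for illustration: estimate = true parameter + fixed bias + Gaussian noise.
    """
    rng = random.Random(seed)
    return [true_mean + bias + rng.gauss(0, noise_sd) for _ in range(n_studies)]

def accuracy_and_precision(estimates, true_mean):
    # Accuracy: distance between the average estimate and the true parameter (the bias).
    bias = statistics.fmean(estimates) - true_mean
    # Precision: spread of the repeated estimates around their own average.
    spread = statistics.stdev(estimates)
    return bias, spread

TRUE_MEAN = 50.0
accurate_imprecise = simulate_estimates(TRUE_MEAN, bias=0.0, noise_sd=5.0)
biased_precise = simulate_estimates(TRUE_MEAN, bias=3.0, noise_sd=0.5)

for label, est in [("accurate, imprecise", accurate_imprecise),
                   ("biased, precise", biased_precise)]:
    bias, spread = accuracy_and_precision(est, TRUE_MEAN)
    print(f"{label}: bias={bias:.2f}, spread={spread:.2f}")
```

Neither property implies the other: the second scenario is precise (its estimates cluster tightly) but consistently off target.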

Distributions

Frequency Distribution: Describes the number of times each value of a variable occurs in a sample. If we hypothetically repeated the study, we would get a different frequency distribution each time.
Probability Distribution: Gives the probabilities (relative frequencies) of all values, or ranges of values, in the population; it does not change from sample to sample. For a random trial, it is the list of the probabilities of all mutually exclusive outcomes.
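A quick way to see the difference: in the sketch below, which assumes a fair six-sided die as the hypothetical population, the probability distribution is fixed at 1/6 per face, while the frequency distribution changes from one sample of rolls to the next. The helper name `sample_frequency_distribution` is hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical population: a fair six-sided die. Its probability distribution
# is fixed: each of the six mutually exclusive outcomes has probability 1/6.
OUTCOMES = range(1, 7)
probability_distribution = {face: Fraction(1, 6) for face in OUTCOMES}

def sample_frequency_distribution(n, rng):
    """Roll the die n times and count how often each face occurs."""
    counts = Counter(rng.choice(OUTCOMES) for _ in range(n))
    # List every outcome, including faces that never came up.
    return {face: counts.get(face, 0) for face in OUTCOMES}

rng = random.Random(2025)
first = sample_frequency_distribution(60, rng)
second = sample_frequency_distribution(60, rng)
print("sample 1 frequencies:", first)
print("sample 2 frequencies:", second)
print("probabilities sum to", sum(probability_distribution.values()))
```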

Study Types

Observational Studies: The researcher does not assign treatments to individuals, so associations found in these studies cannot, on their own, establish causation.
Experimental Studies: The researcher randomly assigns treatments to individuals, which allows statistical relationships to be attributed to the experimentally manipulated variables as causes.
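Random assignment of treatments can be sketched in a few lines. This assumes a simple completely randomized design with group sizes kept as equal as possible; the subject IDs and the function `assign_treatments` are hypothetical.

```python
import random

def assign_treatments(individuals, treatments, seed=None):
    """Randomly assign each individual to one treatment group.

    Shuffles the individuals, then deals them round-robin into the groups,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, person in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(person)
    return groups

subjects = [f"subject_{k}" for k in range(1, 11)]  # hypothetical IDs
groups = assign_treatments(subjects, ["treatment", "control"], seed=42)
for name, members in groups.items():
    print(name, members)
```

Because chance alone decides who gets which treatment, other characteristics of the individuals should, on average, be balanced between the groups.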

Variables

Explanatory Variable: A variable that predicts or affects the other variable in a study (independent variable).
Response Variable: The variable of focus in a study or experiment (dependent variable).

Effective Data Visualization

Characteristics of a Good Graph

Shows the data clearly.
Makes patterns in the data easy to see.
Represents magnitudes honestly.
Draws graphical elements clearly.

Graph Types

Relative Frequency Distribution: Describes the fraction of occurrences of each value of a variable (showing data for one variable).
Histogram: Uses the area of rectangular bars to display the (relative) frequency distribution of a numerical variable; it can present discrete or continuous data. The mode is the highest peak, and a bimodal distribution has two distinct peaks.
Mode: The interval corresponding to the highest peak in the frequency distribution.
Skew: Asymmetry in the shape of a frequency distribution for a numerical variable.
- Positive (right) Skew: Long tail to the right; the mode lies to the left.
- Negative (left) Skew: Long tail to the left; the mode lies to the right.
Contingency Table: Gives the frequency (count) of occurrences of all combinations of two (or more) categorical variables; dividing each count by the total gives relative frequencies.
Grouped Bar Graph: Uses the height of rectangular bars to display the (relative) frequency distributions of two or more categorical variables.
Mosaic Plot: Uses the area of rectangles to display the relative frequency of occurrence of all combinations of two categorical variables.
Scatter Plot: A graphical display of two numerical variables in which each observation is represented as a point on a graph with two axes.
Strip Chart: A graphical display of a numerical variable and a categorical variable in which each observation is represented as a dot. Shows all data points.
Box Plot: Uses lines and a rectangular box to display the median, quartiles, range, and extreme measurements of the data.
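Line Graph: Displays a sequence of measurements, typically ordered in time, as points connected by straight line segments to show change between consecutive values.
Violin Plot: A hybrid of a box plot and a kernel density plot, showing where values are concentrated (peaks) in the data.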
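The graphs above are drawn from a few underlying tables and summaries. The sketch below computes some of them using only the Python standard library (it does no plotting): a relative frequency distribution, histogram bin counts with the modal bin, a moment-based skewness coefficient whose sign matches the direction of the long tail, and a contingency table of counts. All data and function names are hypothetical, and the half-open bins [a, b) are an assumed convention.

```python
import statistics
from collections import Counter

def relative_frequencies(values):
    """Fraction of occurrences of each value (relative frequency distribution)."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}

def histogram_counts(data, bin_width, start):
    """Count observations in half-open bins [start + k*w, start + (k+1)*w)."""
    counts = Counter(int((x - start) // bin_width) for x in data)
    return {(start + k * bin_width, start + (k + 1) * bin_width): c
            for k, c in sorted(counts.items())}

def sample_skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.

    g1 > 0: long tail to the right (positive skew);
    g1 < 0: long tail to the left (negative skew).
    """
    mean = statistics.fmean(data)
    m2 = statistics.fmean([(x - mean) ** 2 for x in data])
    m3 = statistics.fmean([(x - mean) ** 3 for x in data])
    return m3 / m2 ** 1.5

def contingency_table(pairs):
    """Count each combination of two categorical variables."""
    return Counter(pairs)

# Hypothetical data for illustration only.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]
pairs = [("drug", "improved"), ("drug", "improved"), ("drug", "no change"),
         ("placebo", "improved"), ("placebo", "no change"), ("placebo", "no change")]

rel = relative_frequencies(eye_colors)
hist = histogram_counts(right_skewed, bin_width=2, start=0)
modal_bin = max(hist, key=hist.get)  # bin with the highest count
table = contingency_table(pairs)

print("relative frequencies:", rel)
print("histogram:", hist, "modal bin:", modal_bin)
print("skewness:", round(sample_skewness(right_skewed), 2))
print("contingency table:", dict(table))
```

In the made-up `right_skewed` data, the modal bin sits at the low end while a few large values stretch the tail to the right, so the skewness coefficient comes out positive.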

Statistical Measures

Standard Deviation (SD): Measures the spread of a distribution around the mean. It is large if most observations are far from the mean and small if most lie close to it. It is the square root of the variance.
Coefficient of Variation (CV): Expresses the standard deviation as a percentage of the mean: CV = (Standard Deviation / Mean) * 100%. A lower CV means individuals are more similar relative to the mean; a higher CV means more relative variability. Because it has no units, it can be used to compare the variability of traits measured in different units.
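SD and CV can be computed with Python's `statistics` module. Note that `statistics.stdev` uses the sample formula (dividing by n − 1); the measurements below are made-up values in two different units, and `coefficient_of_variation` is a hypothetical helper name.

```python
import math
import statistics

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100%.

    Only meaningful for data measured on a ratio scale with a positive mean.
    """
    mean = statistics.fmean(data)
    if mean <= 0:
        raise ValueError("CV assumes a positive mean")
    return statistics.stdev(data) / mean * 100

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
masses_kg = [55, 62, 70, 81, 95]

for name, data in [("height (cm)", heights_cm), ("mass (kg)", masses_kg)]:
    sd = statistics.stdev(data)
    # SD is the square root of the (sample) variance.
    assert math.isclose(sd, math.sqrt(statistics.variance(data)))
    print(f"{name}: SD = {sd:.2f}, CV = {coefficient_of_variation(data):.1f}%")
```

With these values the SDs are not directly comparable (centimeters vs. kilograms), but the CVs are: roughly 5% for height versus roughly 22% for mass, so mass is the more variable trait relative to its mean.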
