Statistics Cheat Sheet: Key Concepts and Formulas

Posted on May 7, 2024 in Mathematics

Topic 2: Data Analysis and Descriptive Statistics

Types of Variables

Variables can be categorized as either categorical (non-numerical values, e.g., hair color) or numerical. Numerical variables can be further classified as discrete (countable values, typically integers, e.g., goals scored) or continuous (any value within an interval, e.g., height or weight).

Data Classification

Qualitative Data:
- Nominal: Categories with no inherent order (e.g., hair color).
- Ordinal: Ordered categories (e.g., education level).
Quantitative Data:
- Interval: Numerical data where the zero point is arbitrary (e.g., temperature in Celsius).
- Ratio: Numerical data with a meaningful zero point (e.g., weight).

Frequency Distributions and Graphs

Frequency Distribution Tables: Summarize the frequency of each value or category of a variable, including absolute frequency ($n_i$), relative frequency ($f_i$), cumulative absolute frequency ($N_i$), and cumulative relative frequency ($F_i$).
Graphs for Categorical Variables:
- Bar Chart: Represents categories with bars.
- Pie Chart: Displays categories as slices of a circle.
Graphs for Continuous Variables:
- Histogram: Shows the distribution of continuous data using bars.
- Line Chart: Plots data points connected by lines over time or another continuous variable.
Graphs for Two or More Variables:
- Scatter Plot: Displays the relationship between two numerical variables.
- Contingency Table: Analyzes the association between categorical variables.
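
As an illustration of the four frequency columns described above, here is a minimal Python sketch. The function name `frequency_table` and the `goals` data are hypothetical, chosen only for this example.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, n_i, f_i, N_i, F_i), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        n_i = counts[value]          # absolute frequency
        cumulative += n_i            # cumulative absolute frequency
        rows.append((value, n_i, n_i / total, cumulative, cumulative / total))
    return rows

# Hypothetical data: goals scored in 10 matches
goals = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]
for value, n_i, f_i, N_i, F_i in frequency_table(goals):
    print(f"{value}: n_i={n_i} f_i={f_i:.2f} N_i={N_i} F_i={F_i:.2f}")
```

The last row always has $N_i$ equal to the number of observations and $F_i$ equal to 1, which is a quick sanity check on any frequency table.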

Measures of Central Tendency

Mean ($\bar{X}$): The average of all values in a dataset.
Median (Me): The middle value when data is ordered from least to greatest.
Mode (Mo): The value that occurs most frequently.
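
The three measures above can be computed with Python's standard `statistics` module. This is a small sketch; the `heights_cm` sample is made up for illustration.

```python
import statistics

# Hypothetical sample of heights in centimeters
heights_cm = [160, 165, 165, 170, 180]

mean = statistics.mean(heights_cm)      # sum of values / number of values
median = statistics.median(heights_cm)  # middle value of the sorted data
mode = statistics.mode(heights_cm)      # most frequent value

print(mean, median, mode)
```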

Measures of Dispersion

Range (Rg): The difference between the highest and lowest values.
Variance ($S_x^2$): Measures the spread of data around the mean.
Standard Deviation ($S_x$): The square root of the variance.
Coefficient of Variation: A relative measure of dispersion, expressed as a percentage of the mean.
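
A short Python sketch of these dispersion measures follows. It assumes the variance divides by $n$ (`statistics.pvariance`); the cheat sheet does not say whether $n$ or $n - 1$ is intended, and `statistics.variance` gives the $n - 1$ version. The `data` values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)        # highest minus lowest value
variance = statistics.pvariance(data)     # assumption: divide by n
std_dev = statistics.pstdev(data)         # square root of the variance
cv_percent = std_dev / statistics.mean(data) * 100  # relative to the mean

print(data_range, variance, std_dev, cv_percent)
```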

Measures of Association

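Covariance ($S_{xy}$): Measures the direction of the linear relationship between two variables (positive when they tend to move together, negative when they move in opposite directions); its magnitude depends on the units of measurement.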
Correlation Coefficient (r): A standardized measure of the linear relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation).
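
The relationship between the two measures, $r = S_{xy} / (S_x S_y)$, can be sketched in Python as follows. The helper names and the `hours`/`scores` data are hypothetical, and the sketch assumes the divide-by-$n$ convention.

```python
import math

def covariance(xs, ys):
    """Covariance of paired data, dividing by n (assumed convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    s_x = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

hours = [1, 2, 3, 4, 5]      # hypothetical study hours
scores = [2, 4, 6, 8, 10]    # hypothetical scores, exactly 2 * hours

print(covariance(hours, scores))
print(correlation(hours, scores))
```

Because `scores` is an exact increasing linear function of `hours`, the correlation comes out at (approximately, up to floating-point rounding) 1.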

Topic 3: Probability

Basic Concepts

Sample Space (E): The set of all possible outcomes of a random experiment.
Events: Subsets of the sample space.
- Incompatible Events: Events that cannot occur simultaneously.
- Exhaustive Events: Events that cover all possible outcomes.
- Basic/Complementary Events: Events that are both incompatible and exhaustive.

Approaches to Measuring Probability

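Classical Approach (Laplace): $P(s)$ = number of favorable outcomes / number of possible outcomes, assuming all outcomes are equally likely.
Frequentist Approach: $P(s) = \lim_{n \to \infty} n_s / n$, the limit of the relative frequency of the event $s$ as the number of trials $n$ increases, where $n_s$ is the number of trials in which $s$ occurs.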
Subjective Approach: Based on personal belief or judgment.
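
To see how the classical and frequentist approaches relate, the following Python sketch simulates die rolls and compares the relative frequency of rolling a six with the classical value 1/6. The function names and the fixed seed are assumptions made for reproducibility of the illustration.

```python
import random

def roll_die(rng):
    """One trial of the random experiment: a fair six-sided die."""
    return rng.randint(1, 6)

def is_six(outcome):
    """The event of interest."""
    return outcome == 6

def relative_frequency(event, trial, n_trials, seed=0):
    """Fraction of n_trials in which the event occurred (n_s / n)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

classical = 1 / 6  # one favorable outcome out of six equally likely ones
for n in (100, 10_000, 100_000):
    estimate = relative_frequency(is_six, roll_die, n)
    print(f"n={n}: relative frequency={estimate:.4f}, classical={classical:.4f}")
```

As the number of trials grows, the relative frequency tends to settle near the classical value, which is the idea behind the frequentist definition.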

Probability Rules and Formulas

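Lowercase letters denote events and a bar denotes the complement (e.g., $\bar{a}$ is "not $a$"); $\emptyset$ is the impossible event and $E$ is the sample space.

$0 \le P(s) \le 1$
$P(\emptyset) = 0$
$P(E) = 1$
$P(\bar{a}) = 1 - P(a)$, equivalently $P(a) + P(\bar{a}) = 1$
$P(a \cup b) = P(a) + P(b) - P(a \cap b)$
$P(a) = P(a \cap b) + P(a \cap \bar{b})$
$P(a \mid b) = P(a \cap b) / P(b)$
$P(\bar{a} \cup \bar{b}) = 1 - P(a \cap b)$
$P(\bar{a} \cap \bar{b}) = 1 - P(a \cup b)$
$P(a \mid \bar{b}) = P(a \cap \bar{b}) / P(\bar{b})$
Bayes' theorem (for a hypothesis $H$ and evidence $E$): $P(H \mid E) = P(E \mid H) \, P(H) / P(E)$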
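
As a worked illustration of Bayes' theorem, the Python sketch below computes a posterior probability. The denominator is expanded with the total probability rule, which splits an event into its parts inside and outside another event. The function name and the numbers (a 1% prior, a 95% true-positive rate, a 5% false-positive rate) are hypothetical.

```python
def bayes_posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) via Bayes' theorem.

    P(E) is expanded as P(E and H) + P(E and not H).
    """
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Hypothetical diagnostic-test numbers
posterior = bayes_posterior(prior_h=0.01, p_e_given_h=0.95, p_e_given_not_h=0.05)
print(f"P(H | E) = {posterior:.4f}")
```

With these made-up numbers the posterior is about 0.16: even after a positive result, the hypothesis remains unlikely because the prior is so small.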

Topic 4: Random Variables and Probability Distributions

Random Variables

Discrete Random Variable: Takes on a finite or countable number of values.
Continuous Random Variable: Can take on any value within a given range.

Probability Function

P(X=x) gives the probability that a discrete random variable X takes on the value x.

Properties:

$0 \le P(X = x) \le 1$
$\sum_x P(X = x) = 1$

Cumulative Probability Function

$F(x_0) = P(X \le x_0)$ gives the probability that a random variable $X$ is less than or equal to a certain value $x_0$.

Properties:

$0 \le F(x_0) \le 1$
If $B > A$, then $F(B) \ge F(A)$
$P(A < X \le B) = F(B) - F(A)$

Expected Value

$E(X) = \mu_X = \sum_x x \, P(X = x)$ represents the average value of a random variable.

Properties:

If $X = k$, then $E(k) = k$ (constant)
$E(a + bX) = a + b \, E(X)$, with $a$ and $b$ constants
For two random variables $X$ and $Y$: $E(X + Y) = E(X) + E(Y)$
For two independent variables $X$ and $Y$: $E(XY) = E(X) \, E(Y)$
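
The expected value formula and the linearity property can be checked numerically. In this sketch a probability function is represented as a Python dictionary mapping each value to its probability; that representation and the name `expected_value` are choices made for the example.

```python
def expected_value(pmf):
    """E(X) = sum over x of x * P(X = x), for a dict {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6
die = {x: 1 / 6 for x in range(1, 7)}

# Probability function of Y = a + b*X with a = 2, b = 3
a, b = 2, 3
transformed = {a + b * x: p for x, p in die.items()}

print(expected_value(die))          # E(X)
print(expected_value(transformed))  # E(a + bX)
print(a + b * expected_value(die))  # a + b * E(X), same value up to rounding
```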

Variance and Standard Deviation

$V(X) = \sigma_X^2 = \sum_x (x - \mu_X)^2 \, P(X = x) = E[(X - \mu_X)^2]$ measures the spread of a random variable around its mean.

Properties:

If $X = k$, then $V(k) = 0$ (constant)
$V(a + bX) = b^2 \, V(X)$, with $a$ and $b$ constants
For two independent variables $X$ and $Y$: $V(X \pm Y) = V(X) + V(Y)$
For two random variables $X$ and $Y$: $V(X \pm Y) = V(X) + V(Y) \pm 2 \, \mathrm{cov}(X, Y)$
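
A similar numerical check works for the variance and its scaling property. As before, the dictionary representation of the probability function and the helper names are assumptions made for this sketch.

```python
def expected_value(pmf):
    """E(X) for a dict {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = sum over x of (x - mu)^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die

# Y = a + b*X: the shift a does not change the spread, the scale b multiplies it by b^2
a, b = 2, 3
transformed = {a + b * x: p for x, p in die.items()}

print(variance(die))           # V(X) = 35/12
print(variance(transformed))   # V(a + bX)
print(b ** 2 * variance(die))  # b^2 * V(X), same value up to rounding
```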

Discrete Random Variable Models

Binomial ($X \sim \mathrm{Bin}(n, p)$): Models the number of successes in $n$ independent trials, each with probability $p$ of success.
- $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
- $E(X) = \mu_X = np$
- $V(X) = \sigma_X^2 = np(1 - p)$
Poisson ($\lambda = np$): Models the number of events occurring in a fixed interval of time or space, given an average rate of occurrence $\lambda$.
- $P(X = x) = e^{-\lambda} \lambda^x / x!$
- $E(X) = \mu_X = \lambda$
- $V(X) = \sigma_X^2 = \lambda$
Bernoulli Trial ($X \sim \mathrm{Bin}(1, p)$): A special case of the binomial distribution with only one trial.
- $P(X = x) = p^x (1 - p)^{1 - x}$
- $E(X) = p$
- $V(X) = p(1 - p)$

*When n is very large and p is very small, the Poisson distribution can be used to approximate the binomial distribution.
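
The binomial and Poisson probability functions, and the approximation mentioned in the note above, can be sketched in Python with the standard `math` module. The function names and the values of `n` and `p` are hypothetical, picked so that `n` is large and `p` is small.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Hypothetical case: large n, small p, so lambda = n * p
n, p = 1000, 0.002
lam = n * p
for x in range(5):
    print(f"x={x}: binomial={binomial_pmf(x, n, p):.5f}, poisson={poisson_pmf(x, lam):.5f}")
```

For these values the two columns agree closely, which is what the approximation relies on.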

Topic 5: Statistical Inference

Classic Statistics

Descriptive Statistics: Summarizes and describes data using graphical and numerical methods.
Inferential Statistics: Uses sample data to make inferences about a larger population, with a degree of uncertainty or error.

Estimation and Hypothesis Testing

Estimation: Involves estimating population parameters (e.g., mean weight of Spanish women) using sample data.
Hypothesis Testing: Tests claims about population parameters (e.g., testing whether the population mean weight is 60 kg).

*Inference is the process of drawing conclusions about a population based on sample results.

Population vs. Sample

Population (N): The entire set of individuals or items of interest.
Population Variables (E,n,o): Random features of interest in the population.
Parameter: A numerical characteristic of a population (e.g., population mean ($\mu$), population variance ($\sigma^2$)).
Sample: A subset of the population used to collect data.
Statistic: A numerical characteristic of a sample (e.g., sample mean ($\bar{X}$), sample variance ($S_x^2$)).

Properties of Simple Random Sampling

The mean of the distribution of the sample mean is the population mean: $\mu_{\bar{X}} = \mu$
The standard deviation of the distribution of the sample mean decreases as the sample size $n$ increases: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Central Limit Theorem

If the population is normally distributed or the sample size is large enough ($n \to \infty$), the sampling distribution of the sample mean ($\bar{X}$) will be approximately normal, with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.
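
A small simulation can make the theorem concrete. The Python sketch below draws repeated samples from a Uniform(0, 1) population, which is not normal, and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The sample size, number of samples, seed, and function names are arbitrary choices for this illustration.

```python
import math
import random
import statistics

def draw_uniform(rng):
    """One observation from a Uniform(0, 1) population (mu = 0.5, sigma^2 = 1/12)."""
    return rng.random()

def sample_means(draw, n, n_samples, seed=0):
    """Means of n_samples independent samples, each of size n."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(n_samples)]

n = 30
means = sample_means(draw_uniform, n=n, n_samples=2000)

theoretical_se = math.sqrt(1 / 12) / math.sqrt(n)  # sigma / sqrt(n)
print(f"mean of sample means: {statistics.fmean(means):.4f} (mu = 0.5)")
print(f"std dev of sample means: {statistics.pstdev(means):.4f} (sigma/sqrt(n) = {theoretical_se:.4f})")
```

Plotting a histogram of `means` would show the bell shape; the printed values only check the center and spread.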

Sampling Distribution of $\hat{p}$

For dichotomous data (0, 1), the sampling distribution of the sample proportion ($\hat{p}$) will be approximately normal, with mean $p$ and standard deviation $\sqrt{p(1 - p) / n}$, under certain conditions (in practice, a sufficiently large sample size $n$).

Topic 6: Estimation

Estimators and Estimates

Estimator ($\hat{\theta}$): A random variable that depends on sample information and provides an approximation to an unknown population parameter.
Estimate ($\hat{\theta}_0$): A specific value of the estimator obtained from an observed sample.

Confidence Interval

A range of values that is likely to contain the true population parameter with a certain level of confidence ($\gamma$%).

Confidence interval for the mean at confidence level $\gamma$%: $\mu \in (\bar{X} - z_{\alpha/2} \, \sigma / \sqrt{n},\ \bar{X} + z_{\alpha/2} \, \sigma / \sqrt{n})$

Confidence Factor ($z_{\alpha/2}$)

The value from the standard normal distribution that leaves a probability of $\alpha/2$ to its right, where the confidence level is $\gamma = 1 - \alpha$.

Margin of Error (ME)

$ME = z_{\alpha/2} \, \sigma / \sqrt{n}$

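Common confidence levels and the corresponding approximate Z-score intervals:

(-1, 1) → approximately 68%
(-2, 2) → approximately 95% (the exact 95% interval uses 1.96)
(-3, 3) → approximately 99.7%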
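
The confidence factor, margin of error, and interval can be computed with `statistics.NormalDist` from the Python standard library. This is a minimal sketch that assumes the population standard deviation is known; the function name and the sample values (mean 60, standard deviation 5, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean +/- z_(alpha/2) * sigma / sqrt(n), sigma assumed known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # leaves alpha/2 to its right
    margin = z * sigma / math.sqrt(n)        # margin of error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=60, sigma=5, n=100, confidence=0.95)
print(f"95% interval: ({low:.3f}, {high:.3f})")

# Probability that a standard normal value falls within k standard deviations
for k in (1, 2, 3):
    coverage = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(f"(-{k}, {k}): {coverage:.4f}")
```

The loop shows where the rounded figures of 68%, 95%, and 99.7% come from.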
