Statistics Fundamentals: Sampling, Probability, and Hypothesis Testing

Posted on Jun 28, 2024 in Mathematics

Populations, Samples, and Parameters

A population encompasses all items (data) relevant to a study, while a sample is a subset drawn from this population. A parameter describes a population characteristic, whereas a statistic describes a sample characteristic.

Conducting a census (surveying the entire population) can be expensive and time-consuming. Therefore, sampling methods are employed.

Types of Sampling Plans

Simple Random Sampling: Every possible sample of the same size has an equal chance of selection.
Stratified Random Sampling: The population is divided into mutually exclusive groups (strata) based on characteristics like age or gender, followed by simple random sampling within each stratum.
Cluster Sampling: The population is divided into groups (clusters), a random sample of clusters is selected, and the elements within the selected clusters are surveyed.
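
The three plans can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade implementation: the population of 100 numbered people, the age_group rule, the cluster names, and all function names are invented for the example, and the cluster version assumes one-stage sampling (every element of a chosen cluster is kept).

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n items so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Split the population into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {key: rng.sample(members, n_per_stratum) for key, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and keep every element of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 100 people labelled 0..99 with an invented age group.
people = list(range(100))

def age_group(person):
    return "under_50" if person < 50 else "50_plus"

srs = simple_random_sample(people, 10)
strat = stratified_sample(people, age_group, 5)
clus = cluster_sample({"north": [1, 2], "south": [3, 4], "east": [5, 6]}, 2)
```

Fixing the seed makes the draws reproducible, which is convenient for study notes but would not be done in a real survey.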

Sampling and Non-Sampling Errors

Sampling errors arise from differences between the sample and the population due to the specific observations selected. Increasing the sample size can reduce sampling errors.
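
The effect of sample size on sampling error can be checked with a small simulation. The sketch below assumes a made-up population of 10,000 normally distributed values and measures the average gap between the sample mean and the population mean; the function name and all numbers are illustrative only.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, repeats=200, seed=0):
    """Average distance between the sample mean and the population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# Hypothetical population (invented for illustration).
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

err_small = mean_abs_sampling_error(population, n=10)
err_large = mean_abs_sampling_error(population, n=1000)
print(err_small, err_large)  # the second value should be much smaller
```

Note that a larger sample does nothing for non-sampling errors: a biased selection procedure stays biased at any sample size.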

Non-sampling errors stem from data acquisition mistakes, non-response bias, and selection bias.

Data Types and Series

Data Types

Numerical: Real numbers allowing calculations (e.g., height).
Nominal: Categories assigned numerical codes without inherent order (e.g., single = 1, married = 2).
Ordinal: Categories with a meaningful order or ranking (e.g., school grades).

Data Series Types

Cross-sectional: Variables measured at a single point in time across different subjects (e.g., 2016 Australian Census).
Time series: Variables measured at regular intervals over time (e.g., stock market prices).
Longitudinal: Variables measured on the same subjects over multiple time points (e.g., the Household, Income and Labour Dynamics in Australia (HILDA) Survey).

Observational vs. Experimental Studies

An observational study observes and measures variables without manipulating any factors. In contrast, an experimental study manipulates factors to investigate their effects on the variables of interest.

Histograms and Skewness

A histogram is skewed if it has a longer “tail” on one side. A positively skewed histogram has a tail extending to the right, and typically mean > median > mode. A negatively skewed histogram has a tail extending to the left, and typically mean < median < mode. These orderings are a rule of thumb for unimodal data and can fail for some distributions.
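
A quick numerical check of the rule of thumb, using a small invented data set with a long right tail. The skewness function below is the population moment coefficient, which is one common definition among several; the helper names are assumptions for the example.

```python
import statistics

def summarize(data):
    """Return (mean, median, mode) of the data."""
    return statistics.fmean(data), statistics.median(data), statistics.mode(data)

def skewness(data):
    """Moment coefficient of skewness: positive means a longer right tail."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)

# Invented data with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail

print(summarize(right_skewed))  # (4.5, 3.0, 2): mean > median > mode
print(skewness(right_skewed) > 0, skewness(left_skewed) < 0)
```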

Random Experiments and Probability

In a random experiment, the outcome is uncertain (e.g., flipping a coin). The sample space lists all possible outcomes. Simple events are mutually exclusive, and the list of simple events must be exhaustive.

Probability Concepts

Intersection (Joint Probability): The probability of both events A and B occurring, denoted as P(A∩B).
Mutually Exclusive Events: Events with no common outcomes.
Union: The probability of event A or B or both occurring, denoted as P(A∪B).
Complement: The probability of an event not occurring, denoted as P(A’).
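Difference: The probability of event A occurring but not event B, denoted as P(A−B) or P(A\B). This should not be confused with the conditional probability P(A|B).
Independent Events: Events for which the occurrence of one does not change the probability of the other, so that P(A∩B) = P(A) × P(B).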
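
These definitions can be tried out on a single roll of a fair die, treating events as sets of outcomes. The events A (even number), B (greater than 3), and C (1 or 2) are invented for illustration, as is the helper prob.

```python
from fractions import Fraction

sample_space = frozenset(range(1, 7))  # outcomes of one fair die roll

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = frozenset({2, 4, 6})  # even number
B = frozenset({4, 5, 6})  # greater than 3
C = frozenset({1, 2})     # one or two

p_intersection = prob(A & B)          # P(A∩B) = 1/3
p_union = prob(A | B)                 # P(A∪B) = 2/3
p_complement = prob(sample_space - A) # P(A') = 1/2
p_difference = prob(A - B)            # P(A\B) = 1/6

# Addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
assert p_union == prob(A) + prob(B) - p_intersection

# Independence check: does P(X∩Y) equal P(X) * P(Y)?
a_b_independent = prob(A & B) == prob(A) * prob(B)  # False: 1/3 vs 1/4
a_c_independent = prob(A & C) == prob(A) * prob(C)  # True: 1/6 vs 1/6
```

Using Fraction keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.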

Probability Distribution Table

A probability distribution table summarizes the probabilities of different outcomes in a random experiment.

Although the standard normal table only provides values of the form P(Z < z), other probabilities can be derived from it using the complement rule. For example, P(Z > 1.80) = 1 – P(Z < 1.80).
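
The complement rule for the standard normal distribution can be checked numerically. The sketch below uses NormalDist from Python's standard library in place of a printed table; the variable names and the second interval are chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below = Z.cdf(1.80)        # P(Z < 1.80), the value a table would list
p_above = 1 - p_below        # P(Z > 1.80) by the complement rule
p_between = Z.cdf(1.80) - Z.cdf(-1.80)  # P(-1.80 < Z < 1.80)

print(round(p_below, 4), round(p_above, 4))  # 0.9641 0.0359
```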

Bernoulli Trial

A Bernoulli trial is a random experiment with exactly two outcomes. A sequence of repeated Bernoulli trials has these characteristics:

Two possible outcomes: success or failure.
Constant probability of success on each trial.
Independent trials.
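
A minimal sketch of repeated Bernoulli trials, together with the binomial probability of seeing k successes in n such trials. The binomial formula is standard but is not stated in the notes above, and the function names and parameter values are assumptions for the example.

```python
import math
import random

def bernoulli_trials(n, p, seed=0):
    """Simulate n independent trials, each a success (1) with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

outcomes = bernoulli_trials(10_000, p=0.3)
success_rate = sum(outcomes) / len(outcomes)  # should be close to 0.3

print(success_rate)
print(binomial_pmf(1, 2, 0.5))  # one head in two fair coin flips: 0.5
```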

Point and Interval Estimation

A point estimator uses a single value computed from sample data to estimate an unknown population parameter (e.g., the sample mean as an estimate of the population mean). A good estimator is unbiased (its expected value equals the parameter), consistent (it converges to the true parameter as the sample size increases), and efficient (it has the smallest variance among unbiased estimators).

An interval estimator estimates an unknown parameter using an interval, such as a confidence interval. For instance, finding Z1 and Z2 to define a 95% confidence interval means that, over repeated sampling, about 95% of the intervals constructed this way would contain the true population parameter. The parameter itself is fixed; it is the interval that varies from sample to sample.
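
One way to see what the 95% refers to is to build many intervals from repeated samples and count how often they contain the true mean. The sketch below assumes a normal population with a known standard deviation (a z-interval); the values mu=100, sigma=15, n=30 and the function names are invented for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the population mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    centre = fmean(sample)
    margin = z * sigma / math.sqrt(len(sample))
    return centre - margin, centre + margin

def coverage(mu, sigma, n, repeats=2000, seed=0):
    """Fraction of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, sigma)
        hits += low <= mu <= high
    return hits / repeats

print(z_confidence_interval([98, 102, 100, 100], sigma=2))
print(coverage(mu=100, sigma=15, n=30))  # should be close to 0.95
```

When sigma is unknown and estimated from the sample, a t-interval is the usual replacement; that case is not covered here.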

Hypothesis Testing

Hypothesis testing involves formulating two hypotheses:

Null Hypothesis (H0): States that the parameter equals a specific value.
Alternative Hypothesis (HA): States that the parameter differs from the value in the null hypothesis, expressed as an inequality (not equal to, greater than, or less than that value).
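
As an illustration of how the two hypotheses are used, here is a sketch of a one-sample z-test for a population mean. It assumes the population standard deviation is known and the sample mean is approximately normal; the test statistic, the p-value, and the example numbers are standard textbook material but are not taken from the notes above, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z statistic, p-value) for H0: mu = mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":   # HA: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":   # HA: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":      # HA: mu < mu0
        p = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# Invented example: H0: mu = 100, sample of 100 with mean 103, sigma = 15.
z_stat, p_value = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(z_stat, round(p_value, 4))  # 2.0 0.0455
```

With a 5% significance level, a p-value of about 0.0455 would lead to rejecting H0 in favour of the two-sided alternative.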
