Statistics Fundamentals: Sampling, Probability, and Hypothesis Testing
Populations, Samples, and Parameters
A population encompasses all items (data) relevant to a study, while a sample is a subset drawn from this population. A parameter describes a population characteristic, whereas a statistic describes a sample characteristic.
Conducting a census (surveying the entire population) can be expensive and time-consuming. Therefore, sampling methods are employed.
Types of Sampling Plans
- Simple Random Sampling: Every possible sample of the same size has an equal chance of selection.
- Stratified Random Sampling: The population is divided into mutually exclusive groups (strata) based on characteristics like age or gender, followed by simple random sampling within each stratum.
- Cluster Sampling: The population is divided into groups (clusters), a simple random sample of clusters is selected, and the elements within the selected clusters are surveyed.
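The three plans can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the population, the `age_group` and `suburb` fields, and the function names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n items so that every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then draw a simple random sample in each."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every element of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

rng = random.Random(0)

# Hypothetical population: 100 people tagged with an age group and a suburb.
population = [
    {"id": i, "age_group": "under_40" if i % 2 else "40_plus", "suburb": f"S{i % 5}"}
    for i in range(100)
]

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, lambda p: p["age_group"], 5, rng)

# Group people by suburb so that each suburb acts as one cluster.
clusters = {}
for person in population:
    clusters.setdefault(person["suburb"], []).append(person)
clust = cluster_sample(clusters, 2, rng)
```

Note the difference in what is randomized: stratified sampling draws from every group, while cluster sampling draws only some groups and takes them whole.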
Sampling and Non-Sampling Errors
Sampling errors arise from differences between the sample and the population due to the specific observations selected. Increasing the sample size can reduce sampling errors.
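The effect of sample size on sampling error can be checked with a small simulation. The sketch below assumes a made-up population of 10,000 normally distributed values and measures how far the sample mean lands from the population mean on average; the numbers and names are illustrative only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 values (units are arbitrary).
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def average_sampling_error(n, repeats=200):
    """Mean absolute gap between the sample mean and the population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

error_small = average_sampling_error(10)
error_large = average_sampling_error(1000)
print(f"n=10: {error_small:.3f}   n=1000: {error_large:.3f}")
```

The average gap for samples of 1,000 should come out well below the gap for samples of 10.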
Non-sampling errors stem from data acquisition mistakes, non-response bias, and selection bias. Unlike sampling errors, they are not reduced by taking a larger sample.
Data Types and Series
Data Types
- Numerical: Real numbers allowing calculations (e.g., height).
- Nominal: Categories assigned numerical codes without inherent order (e.g., single = 1, married = 2).
- Ordinal: Categories with a meaningful order or ranking (e.g., school grades).
Data Series Types
- Cross-sectional: Variables measured at a single point in time across different subjects (e.g., 2016 Australian Census).
- Time series: Variables measured at regular intervals over time (e.g., stock market prices).
- Longitudinal: Variables measured on the same subjects over multiple time points (e.g., the Household, Income and Labour Dynamics in Australia (HILDA) Survey).
Observational vs. Experimental Studies
An observational study observes and measures variables without manipulating any factors. In contrast, an experimental study manipulates factors to investigate their effects on the variables of interest.
Histograms and Skewness
A histogram is skewed if it has a longer “tail” on one side. A positively skewed histogram has a tail extending to the right (typically mean > median > mode). A negatively skewed histogram has a tail extending to the left (typically mean < median < mode).
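The ordering of mean, median, and mode can be seen on a small made-up data set. The values below are invented for illustration; the left-skewed set is simply the mirror image of the right-skewed one.

```python
import statistics

# Invented data with a long right tail (a few large values).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the data above, giving a long left tail.
left_skewed = [15 - x for x in right_skewed]

def summary(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

right_mean, right_median, right_mode = summary(right_skewed)
left_mean, left_median, left_mode = summary(left_skewed)

print(right_mean, right_median, right_mode)  # 4.5 3.0 2
print(left_mean, left_median, left_mode)     # 10.5 12.0 13
```

The extreme values in the tail pull the mean furthest, the median less, and leave the mode where the data are most concentrated.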
Random Experiments and Probability
In a random experiment, the outcome is uncertain (e.g., flipping a coin). The sample space lists all possible outcomes. Simple events are mutually exclusive, and the list of simple events must be exhaustive.
Probability Concepts
- Intersection (Joint Probability): The probability of both events A and B occurring, denoted as P(A∩B).
- Mutually Exclusive Events: Events with no common outcomes.
- Union: The probability of event A or B or both occurring, denoted as P(A∪B).
- Complement: The probability of an event not occurring, denoted as P(A’).
- Difference: The probability of event A occurring but not event B, denoted as P(A-B) or P(A\B).
- Independent Events: Events for which the occurrence of one does not change the probability of the other, so that P(A∩B) = P(A)P(B).
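These concepts map directly onto set operations. The sketch below assumes one roll of a fair six-sided die, with events chosen purely for illustration, and uses exact fractions so the results are easy to verify by hand.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1, 2}      # roll is 1 or 2
D = {1, 3}      # roll is 1 or 3

p_intersection = prob(A & B)            # P(A∩B) = 1/3
p_union = prob(A | B)                   # P(A∪B) = 2/3
p_complement = prob(sample_space - A)   # P(A') = 1/2
p_difference = prob(A - B)              # P(A\B) = 1/6

# A and D share no outcomes, so they are mutually exclusive.
mutually_exclusive = prob(A & D) == 0

# A and C are independent: P(A∩C) equals P(A) * P(C).
a_c_independent = prob(A & C) == prob(A) * prob(C)
# A and B are not independent: P(A∩B) differs from P(A) * P(B).
a_b_independent = prob(A & B) == prob(A) * prob(B)
```

The union result also agrees with the addition rule, P(A∪B) = P(A) + P(B) - P(A∩B) = 1/2 + 1/2 - 1/3.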
Probability Distribution Table
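A probability distribution table summarizes the probabilities of different outcomes in a random experiment. For a standard normal variable Z, the table typically lists cumulative probabilities of the form P(Z < z).
Although the table provides only these cumulative values, an upper-tail probability follows from the complement rule: P(Z > 1.80) = 1 - P(Z < 1.80).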
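The complement rule for an upper-tail probability at z = 1.80 can be checked numerically instead of with a printed table. This sketch assumes Z is standard normal and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

lower_tail = standard_normal.cdf(1.80)  # P(Z < 1.80), the value a table lists
upper_tail = 1 - lower_tail             # P(Z > 1.80) by the complement rule

print(f"P(Z < 1.80) = {lower_tail:.4f}")  # 0.9641
print(f"P(Z > 1.80) = {upper_tail:.4f}")  # 0.0359
```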
Bernoulli Trial
A Bernoulli Trial is a random experiment with these characteristics:
- Two possible outcomes: success or failure.
- Constant probability of success on each trial.
- Independent trials.
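A sequence of trials with these characteristics can be simulated as follows. The success probability of 0.3 and the function name are arbitrary choices for the sketch; each call draws a fresh random number, so the trials are independent and share the same probability of success.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(123)
p_success = 0.3  # assumed value, constant across all trials

outcomes = [bernoulli_trial(p_success, rng) for _ in range(10_000)]
proportion = sum(outcomes) / len(outcomes)
print(f"Observed proportion of successes: {proportion:.3f}")
```

Over many trials the observed proportion of successes should settle close to the assumed probability.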
Point and Interval Estimation
A point estimator uses a single value computed from sample data to estimate an unknown population parameter (e.g., the sample mean as an estimator of the population mean). A good estimator is unbiased (its expected value equals the parameter), consistent (converges to the true parameter as sample size increases), and efficient (has the smallest variance among unbiased estimators).
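Unbiasedness can be illustrated by comparing two estimators of a population variance: one dividing by n and one dividing by n - 1. The simulation below is a sketch under assumed settings (a normal population with variance 4 and samples of size 5); it averages each estimator over many samples to approximate its expected value.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VARIANCE = 4.0  # the assumed population is Normal(0, sd=2)

def average_estimate(estimator, n=5, repeats=5000):
    """Average an estimator over many independent samples of size n."""
    estimates = []
    for _ in range(repeats):
        sample = [rng.gauss(0, 2) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates)

# statistics.pvariance divides by n; statistics.variance divides by n - 1.
mean_divide_by_n = average_estimate(statistics.pvariance)
mean_divide_by_n_minus_1 = average_estimate(statistics.variance)

print(f"divisor n:     {mean_divide_by_n:.2f}")
print(f"divisor n - 1: {mean_divide_by_n_minus_1:.2f}")
```

The n - 1 version should average close to 4, while the n version tends to come out too low (near 3.2 for samples of size 5), which is what bias means in practice.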
An interval estimator estimates an unknown parameter using an interval, such as a confidence interval. For instance, finding Z1 and Z2 to define a 95% confidence interval means that, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population parameter. Any single computed interval either contains the parameter or does not.
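One way to read the 95% figure is as a long-run coverage rate, which a simulation can approximate. The sketch below assumes a normal population with a known standard deviation, so the interval is the sample mean plus or minus a critical value times sigma / sqrt(n); the population values and the names `z1`, `z2`, and `confidence_interval` are illustrative.

```python
import random
import statistics
from statistics import NormalDist

# Critical values that leave 2.5% in each tail of the standard normal.
z2 = NormalDist().inv_cdf(0.975)  # about 1.96
z1 = -z2

def confidence_interval(sample, sigma):
    """95% interval for the mean, assuming the population sigma is known."""
    mean = statistics.fmean(sample)
    half_width = z2 * sigma / len(sample) ** 0.5
    return mean - half_width, mean + half_width

rng = random.Random(7)
TRUE_MEAN, SIGMA, N, REPEATS = 100.0, 15.0, 30, 2000  # assumed settings

covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    low, high = confidence_interval(sample, SIGMA)
    covered += low <= TRUE_MEAN <= high

coverage = covered / REPEATS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share of intervals that capture the true mean should land near 0.95, even though each individual interval either captures it or misses it.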
Hypothesis Testing
Hypothesis testing involves formulating two hypotheses:
- Null Hypothesis (H0): States that the parameter equals a specific value.
- Alternative Hypothesis (HA): States that the parameter differs from the value in the null hypothesis, either in any direction (≠, a two-tailed test) or in one direction (> or <, a one-tailed test).
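As a worked illustration, the sketch below tests H0: mean = 50 against the two-sided alternative HA: mean ≠ 50 using a z-test. The test statistic, the p-value, the 5% significance level, and all the numbers are assumptions added for the example; the approach also assumes the population standard deviation is known.

```python
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: mean = mu0 vs HA: mean != mu0."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a result at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical data: 25 observations with a sample mean of 52.
z, p_value, reject_h0 = two_sided_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the standard error is 5 / 5 = 1, so z = 2.0 and the p-value is about 0.0455, which is below 0.05 and leads to rejecting H0 at the 5% level.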