Statistical Data Analysis: Variables, Distributions, and Measures
Statistics
Statistics: The collection, organization, summary, interpretation, and communication of numerical information.
Types of Statistics
Descriptive Statistics: Methods that quantitatively summarize and describe a set of people, places, or things.
Inferential Statistics: Methods that draw conclusions about a large group (the population) from a small part of it (a sample).
Applications
Statistics is applied in fields such as health, business, and industry.
Key Concepts
Entity
A person, place, or thing, or a group of them, on which a study focuses its attention.
Variable
A characteristic of the entities that is of interest in scientific research and that can differ from one entity to another.
Types of Variables
Random Variable
A variable whose numerical values arise by chance and cannot be predicted in advance. Random variables are represented by capital letters such as X, Y, and Z.
Continuous Variable
A variable that can take any value within a continuous range, with no gaps between possible values (e.g., a measured length).
Discrete Variable
A variable whose values are separated by a certain amount, with gaps or breaks between possible values.
Quantitative Variable
A variable with numerical values resulting from measurements (e.g., height, weight, temperature).
Qualitative Variable
A variable whose values consist of categories.
Data Analysis
Frequency Distribution
A table of the categories (classes) of a variable together with the number of entities that fall in each category.
Class Intervals: Non-overlapping, contiguous categories identified by upper and lower class limits.
Types of Frequency Distributions
Cumulative Frequency Distribution
Shows, for each class interval, the total count of observations in that interval and all preceding intervals.
Relative Frequency Distribution
Shows the proportion or percentage of all observations that falls in each class interval.
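The three kinds of frequency distribution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_table`, the half-open intervals [lower, upper), and the `ages` data are all made up for the example and do not come from the notes.

```python
from itertools import accumulate

def frequency_table(values, lower, width, n_classes):
    """Count values in n_classes contiguous, non-overlapping intervals.

    Assumption: each interval is half-open, [lo, lo + width), so a value
    sitting exactly on a boundary belongs to the upper interval.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:
            counts[idx] += 1
    total = sum(counts)
    # Cumulative frequency: running total up to and including each class.
    cumulative = list(accumulate(counts))
    # Relative frequency: share of all counted observations in each class.
    relative = [c / total for c in counts] if total else [0.0] * n_classes
    intervals = [(lower + i * width, lower + (i + 1) * width)
                 for i in range(n_classes)]
    return intervals, counts, cumulative, relative

ages = [21, 25, 34, 38, 39, 42, 47, 51, 55, 58]  # made-up data
intervals, counts, cumulative, relative = frequency_table(
    ages, lower=20, width=10, n_classes=4)

for (lo, hi), c, cum, rel in zip(intervals, counts, cumulative, relative):
    print(f"[{lo}, {hi}): frequency={c}, cumulative={cum}, relative={rel:.2f}")
```

The last cumulative value always equals the number of counted observations, and the relative frequencies add up to 1.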
Histogram
A graphical representation of a frequency distribution or relative frequency distribution.
Features:
- Possible values on the horizontal axis, frequency on the vertical axis.
- Each class interval is represented by a bar.
- Bars have the same width as the corresponding class intervals.
- The height of a bar corresponds to the frequency of values in that interval.
Frequency Polygon
A graphical representation of a frequency distribution, connecting the midpoints of the tops of histogram bars.
Features:
- Endpoints are connected to the horizontal axis at the midpoints of adjacent imaginary class intervals.
- When the class intervals have equal width, the total area under the polygon equals the total area of the histogram bars.
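The two features above can be checked numerically. The sketch below assumes class intervals of equal width and uses hypothetical helper names (`polygon_vertices`, `trapezoid_area`) and made-up counts. It builds the polygon's vertices, including the two zero-height points at the midpoints of the imaginary adjacent intervals, and compares the areas.

```python
def polygon_vertices(lower, width, counts):
    """Return the (x, y) vertices of a frequency polygon.

    Assumption: all class intervals share the same width. The first and
    last vertices sit on the horizontal axis at the midpoints of the
    imaginary intervals just outside the data.
    """
    mids = [lower + (i + 0.5) * width for i in range(len(counts))]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))

def trapezoid_area(vertices):
    """Area between a piecewise-linear curve and the horizontal axis."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

counts = [2, 3, 2, 3]   # made-up frequencies for four classes
width = 10
verts = polygon_vertices(lower=20, width=width, counts=counts)

hist_area = width * sum(counts)   # each bar has area width * frequency
poly_area = trapezoid_area(verts)
print(verts)
print(hist_area, poly_area)
```

With unequal class widths the two areas generally differ, which is why the equal-width assumption is stated explicitly.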
Shapes of Frequency Polygons
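- Leptokurtic: More peaked than the normal curve, with results concentrated in the center of the distribution.
- Mesokurtic: The peakedness of the ideal normal curve.
- Platykurtic: Flatter than the normal curve; the opposite of leptokurtic.
- Symmetric: The two halves match when the curve is folded at its center; the normal curve is one example.
- Rectangular: Every interval has about the same frequency (uniform distribution).
- Bimodal: Two distinct peaks, often because the frequencies of two different populations appear in a single graph.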
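One common way to put a number on the leptokurtic / mesokurtic / platykurtic distinction is excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3, which is 0 for a normal curve. The notes do not name this measure, so treat the sketch below as an assumption. The cutoff `tol` and the sample data are arbitrary choices made for illustration.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve).

    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

def describe_peakedness(values, tol=0.5):
    """Label a data set; tol is an arbitrary illustrative threshold."""
    k = excess_kurtosis(values)
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]     # most values at the center

print(describe_peakedness(flat))
print(describe_peakedness(peaked))
```

Small samples give unstable kurtosis estimates, so a label like this is a rough guide rather than a test.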
Sampling
Simple Random Sampling
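A method of selecting a sample from a population so that every possible sample of a given size has the same chance of being chosen. Data obtained this way are used in statistical inference.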
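A minimal sketch of drawing a simple random sample without replacement, using Python's standard `random` module. The population of 100 numbered entities, the helper name `simple_random_sample`, and the seed are hypothetical and chosen only so the example can be rerun.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members of population without replacement.

    random.Random.sample gives every subset of size k the same chance
    of being selected. Passing a seed only makes the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, k)

patient_ids = list(range(1, 101))   # hypothetical population of 100 entities
sample = simple_random_sample(patient_ids, k=10, seed=42)
print(sample)
```

Measurements taken on the sampled entities can then be used to estimate characteristics of the whole population.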
Measures of Central Tendency
A single number indicating the center of a series of numbers.
- Arithmetic Mean: The sum of all values divided by the number of values.
- Median: The middle value when the data are ordered.
- Mode: The most frequently occurring value.
Measures of Dispersion
- Range: The difference between the maximum and minimum values.
- Variance: The average of the squared differences from the mean (the sample variance divides by n - 1 instead of n).
- Standard Deviation: The square root of the variance.
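The measures listed above can be computed with Python's standard `statistics` module. The data set is made up for illustration. `pvariance` and `pstdev` divide by n, matching the "average of the squared differences" definition, while `variance` divides by n - 1.

```python
import statistics

data = [2, 3, 3, 5, 7]   # made-up observations

# Measures of central tendency
mean = statistics.mean(data)       # sum of values / number of values
median = statistics.median(data)   # middle value of the ordered data
mode = statistics.mode(data)       # most frequently occurring value

# Measures of dispersion
data_range = max(data) - min(data)        # maximum minus minimum
pop_var = statistics.pvariance(data)      # mean squared deviation (divide by n)
pop_sd = statistics.pstdev(data)          # square root of pop_var
sample_var = statistics.variance(data)    # divide by n - 1 instead of n

print(mean, median, mode)
print(data_range, pop_var, pop_sd, sample_var)
```

For this data the squared deviations from the mean sum to 16, so dividing by n = 5 and by n - 1 = 4 gives noticeably different variances. The gap shrinks as the number of observations grows.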