Statistical Data Analysis: Variables, Distributions, and Measures

Statistics

Statistics: The collection, organization, summary, interpretation, and communication of numerical information.

Types of Statistics

Descriptive Statistics: Summarize and describe, in quantitative terms, the observed data on a group of people, places, or things.

Inferential Statistics: Draw conclusions about a large group (the population) from a small part of it (a sample).

Applications

Health, business, and industry.

Key Concepts

Entity

A person, place, or thing on which observations are made. A study focuses attention on a group of such entities.

Variable

A characteristic of the entities that is of interest in a scientific study and that can take different values from one entity to another.

Types of Variables

Random Variable

A variable whose numerical values arise by chance and cannot be predicted exactly in advance. Random variables are conventionally represented by capital letters such as X, Y, and Z.

Continuous Variable

A variable that can take any value within a continuous range, with no gaps between possible values (e.g., height or weight).

Discrete Variable

A variable whose values are separated by a certain amount, with gaps or breaks between possible values (e.g., the number of children in a family).

Quantitative Variable

Variables with numerical values resulting from measurements (e.g., height, weight, temperature).

Qualitative Variable

A variable whose values consist of categories rather than numbers (e.g., blood type or marital status).

Data Analysis

Frequency Distribution

A representation of the numerical categories of a variable along with the number of entities in each category.

Class Intervals: Non-overlapping, contiguous categories identified by upper and lower class limits.

Types of Frequency Distributions

Cumulative Frequency Distribution

Data presented on a cumulative basis, showing the total count up to each category.

Relative Frequency Distribution

Proportion or percentage of values in each class interval, showing the relative frequency of observations.
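
The three kinds of frequency distribution can be tabulated together. The following is a minimal Python sketch, not a standard routine: the function name frequency_table, the ages data, and the convention that each class interval includes its lower limit and excludes its upper limit are all assumptions made for illustration.

```python
def frequency_table(values, lower, width, n_classes):
    """Tabulate values into contiguous, non-overlapping class intervals.

    Assumed convention: each interval is [lower limit, upper limit), so a
    value that sits exactly on a boundary is counted in the higher class.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:
            counts[idx] += 1

    total = sum(counts)
    rows = []
    running = 0
    for i, f in enumerate(counts):
        running += f  # cumulative frequency up to and including this class
        lo = lower + i * width
        rows.append({
            "interval": (lo, lo + width),
            "frequency": f,
            "cumulative": running,
            "relative": f / total if total else 0.0,
        })
    return rows


# Made-up ages, used only to exercise the function.
ages = [21, 25, 34, 38, 41, 42, 47, 53, 55, 58, 62, 67]
table = frequency_table(ages, lower=20, width=10, n_classes=5)

for row in table:
    lo, hi = row["interval"]
    print(f"{lo}-{hi}: f={row['frequency']}, "
          f"cumulative={row['cumulative']}, relative={row['relative']:.3f}")
```

The last cumulative frequency equals the total number of observations, and the relative frequencies add up to 1.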

Histogram

A graphical representation of a frequency distribution or relative frequency distribution.

Features:

  • Possible values on the horizontal axis, frequency on the vertical axis.
  • Each class interval is represented by a bar.
  • Bars have the same width as the corresponding class intervals.
  • The height of a bar corresponds to the frequency of values in that interval.
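
As a rough illustration of the features above, the sketch below draws a text histogram from class frequencies. It is turned on its side, so the class intervals run down the page and the length of each bar plays the role of its height. The function name text_histogram and the counts are hypothetical.

```python
def text_histogram(lower, width, counts, symbol="#"):
    """Return one text line per class interval; bar length = frequency."""
    lines = []
    for i, f in enumerate(counts):
        lo = lower + i * width
        lines.append(f"{lo:>3}-{lo + width:<3} | {symbol * f} ({f})")
    return lines


# Made-up frequencies for five equal-width classes starting at 20.
counts = [2, 2, 3, 3, 2]
lines = text_histogram(lower=20, width=10, counts=counts)
print("\n".join(lines))
```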

Frequency Polygon

A graphical representation of a frequency distribution, connecting the midpoints of the tops of histogram bars.

Features:

  • Endpoints are connected to the horizontal axis at the midpoints of adjacent imaginary class intervals.
  • The total area under the curve equals the total area under the histogram.
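
The two features above can be checked numerically. This is a minimal sketch assuming equal-width class intervals; the function names and the frequencies are made up for illustration. It builds the polygon vertices, including the two endpoints on the horizontal axis, and compares the area under the polygon with the area of the histogram bars.

```python
def polygon_vertices(lower, width, counts):
    """Return the (x, y) vertices of a frequency polygon.

    Assumes equal-width classes; class i has midpoint lower + (i + 0.5) * width.
    """
    points = [(lower + (i + 0.5) * width, f) for i, f in enumerate(counts)]
    # Close the polygon on the horizontal axis at the midpoints of the
    # imaginary empty classes just before and just after the real ones.
    first = (lower - 0.5 * width, 0)
    last = (lower + (len(counts) + 0.5) * width, 0)
    return [first] + points + [last]


def trapezoid_area(points):
    """Area between a piecewise-linear curve and the horizontal axis."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


counts = [2, 2, 3, 3, 2]   # made-up frequencies
lower, width = 20, 10      # first lower class limit and class width

vertices = polygon_vertices(lower, width, counts)
histogram_area = sum(f * width for f in counts)
polygon_area = trapezoid_area(vertices)

print(vertices)
print(histogram_area, polygon_area)
```

The areas agree because each triangle the polygon cuts off a bar is matched by an equal triangle it adds next to it, which holds only when all classes have the same width.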

Shapes of Frequency Polygons

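  • Leptokurtic: More sharply peaked than the normal curve, with values concentrated around the center of the distribution.
  • Mesokurtic: Moderately peaked, like the ideal normal curve.
  • Platykurtic: The opposite of leptokurtic; flatter than the normal curve, with values spread more evenly.
  • Symmetric: The two halves match when the curve is folded at its center; the normal curve is one example.
  • Rectangular: Uniform distribution, with roughly equal frequencies in every class.
  • Bimodal: Two distinct peaks, which often arise when the frequencies of two different populations are combined in a single graph.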
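
Peakedness and symmetry can also be put into numbers. The sketch below uses the moment-based excess kurtosis (positive for leptokurtic, near zero for mesokurtic, negative for platykurtic) and the moment-based skewness (zero for a symmetric distribution). The function names, the data sets, and the tolerance of 0.5 used to decide what counts as "near zero" are assumptions for illustration, not standard cut-offs.

```python
def central_moment(values, k):
    """k-th central moment: the average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def skewness(values):
    """Third central moment over variance ** 1.5; zero when symmetric."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def describe_peakedness(values, tolerance=0.5):
    """Label a data set; the tolerance is an arbitrary illustrative choice."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


flat = list(range(1, 11))       # evenly spread, rectangular-looking data
peaked = [5] * 8 + [0, 10]      # most values piled up at the center

print(describe_peakedness(flat), round(excess_kurtosis(flat), 3))
print(describe_peakedness(peaked), round(excess_kurtosis(peaked), 3))
print(skewness(flat))
```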

Sampling

Simple Random Sampling

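A method of selecting a sample from a population in which every possible sample of the same size has an equal chance of being chosen. The data obtained from the sample are then used in statistical inference.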
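
A simple random sample without replacement can be drawn with Python's standard random module. This is a minimal sketch: the wrapper name simple_random_sample, the population of ID numbers, and the seed are assumptions for illustration. The seed is fixed only so that the draw can be repeated.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct entities; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a private generator, so the seed stays local
    return rng.sample(population, n)


# Hypothetical population: entities identified by the numbers 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```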

Measures of Central Tendency

A single number indicating the center of a series of numbers.

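  • Arithmetic Mean: The sum of all values divided by the number of values.
  • Median: The middle value when the data are ordered (the average of the two middle values when the count is even).
  • Mode: The most frequently occurring value.

Measures of Dispersion

  • Range: The difference between the maximum and minimum values.
  • Variance: The average of the squared differences from the mean (for a sample, the sum of squared differences is usually divided by n - 1 instead of n).
  • Standard Deviation: The square root of the variance, expressed in the same units as the data.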
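
The measures listed above can be computed with Python's standard statistics module. This is a minimal sketch on made-up data. pvariance and pstdev divide by n, which matches the "average of the squared differences" definition; variance divides by n - 1 and is the usual choice when the data are a sample.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # average of the two middle values here
mode = statistics.mode(data)      # most frequently occurring value
value_range = max(data) - min(data)

pop_variance = statistics.pvariance(data)    # divides by n
pop_std = statistics.pstdev(data)            # square root of pop_variance
sample_variance = statistics.variance(data)  # divides by n - 1

print(mean, median, mode, value_range)
print(pop_variance, pop_std, sample_variance)
```

For this data set the mean is 5, the median 4.5, the mode 4, and the range 7. The squared differences from the mean add up to 32, so the population variance is 32 / 8 = 4 and the standard deviation is 2.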