Statistics and Statistical Inference Key Concepts

Statistics and Statistical Inference

Basic Concepts

Population: The complete set of individuals, objects, or measurements of interest in a statistical study.

Sample: A subset of the population. The process of selecting a sample is called sampling.

Random Sample: A sample in which every element of the population has an equal chance of being selected.

Non-Random Sample: A sample whose elements are selected based on specific criteria defined by the researcher.

Table of Random Numbers: A table of digits produced by a random process (often by computer), used to select the elements of a random sample.
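As an illustration of random sampling, the following is a minimal sketch that draws a simple random sample with Python's standard random module, which plays the role of a table of random numbers. The population of ID numbers, the sample size, and the seed are assumptions made for the example.

```python
import random

# Hypothetical population: ID numbers 1 to 100 (illustrative only).
population = list(range(1, 101))

# A seeded generator makes the draw reproducible.
rng = random.Random(42)

# Simple random sample of 10 elements, drawn without replacement,
# so every element has the same chance of being selected.
sample = rng.sample(population, k=10)
```

Because the draw is made without replacement, no element can appear twice in the sample.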

Branches of Statistics

  • Descriptive Statistics: Presents, represents, and summarizes data, including calculating descriptive measures.
  • Inferential Statistics: Analyzes sample data and uses probability to draw conclusions, make predictions, or support decisions about the population from which the sample was taken.

Variables

Variable: A characteristic of a population or sample that can take on different values from one element to another.

Quantitative Variable: Takes numeric values.

Qualitative Variable: Takes non-numeric values (qualities).

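Ordered Variable: A variable whose values have a natural order (for example: low, medium, high).

Discrete Variable: A quantitative variable that takes only separate, countable values, typically integers (for example, the number of children in a family).

Continuous Variable: A quantitative variable that can take any value within an interval, including non-integer values (for example, height or weight).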

Data Analysis

Rounding: Approximating a number to the nearest specified unit (for example, the nearest whole number, tenth, or hundredth).
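A small sketch of rounding to a specified unit, assuming the common "round half up" convention (the text does not say how ties are handled). It uses the decimal module because Python's built-in round() works on binary floating-point values and rounds exact ties to the nearest even digit. The helper name round_half_up is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, unit: str) -> Decimal:
    """Round a decimal string to a power-of-ten unit such as '0.01'.

    Assumption: ties are rounded up (round half up).
    """
    return Decimal(value).quantize(Decimal(unit), rounding=ROUND_HALF_UP)

rounded = round_half_up("2.4449", "0.01")  # equals 2.44
tie = round_half_up("2.445", "0.01")       # equals 2.45 (tie rounded up)
```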

Real Limits of a Range: The true lower and upper bounds of a range. Calculated by subtracting half the smallest unit of measure from the stated lower limit and adding it to the stated upper limit. Example: With a unit of 0.01, the interval 2.42-2.45 has real limits from 2.415 to 2.455.
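The calculation of real limits can be sketched as follows, reproducing the 2.42-2.45 example. The function name real_limits is hypothetical, and the unit of measure (0.01) is assumed from the number of decimal places in the stated limits.

```python
from decimal import Decimal

def real_limits(lower: str, upper: str, unit: str):
    """Return (real lower limit, real upper limit) for stated limits."""
    half_unit = Decimal(unit) / 2
    return Decimal(lower) - half_unit, Decimal(upper) + half_unit

# Stated interval 2.42-2.45, measured to the nearest 0.01.
real_lower, real_upper = real_limits("2.42", "2.45", "0.01")
# real_lower equals 2.415 and real_upper equals 2.455
```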

Midpoint of a Range: The middle value of an interval, calculated by adding the endpoints and dividing by two. Represents all values within the range.

Data Processing: Sorting, tabulating, and coding data (manually or electronically) obtained through various methods, such as interviews.

Range: The difference between the highest and lowest values in a dataset.

Frequency Distribution: A table summarizing data, often organized into intervals, with their corresponding frequencies.

Absolute Frequency (fo or n, written f in the formulas below): The number of times a value occurs, or the number of data points that fall within an interval.

Relative Frequency (fr): The ratio of the absolute frequency to the total number of data points (N): fr = f/N

Percent Frequency (fp): The relative frequency multiplied by 100: fp = (f/N) * 100%

Cumulative Frequency (fa): The sum of the frequencies up to and including a given data point or interval.
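The four kinds of frequency can be computed together. The following is a minimal sketch using a small, made-up dataset; the variable names are illustrative, with N standing for the total number of data points.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical dataset (illustrative only).
data = [2, 3, 3, 5, 5, 5, 7, 7, 8, 8]
N = len(data)

counts = Counter(data)
values = sorted(counts)                      # [2, 3, 5, 7, 8]

absolute = [counts[v] for v in values]       # [1, 2, 3, 2, 2]
relative = [f / N for f in absolute]         # each f divided by N
percent = [100 * f / N for f in absolute]    # relative frequency * 100
cumulative = list(accumulate(absolute))      # [1, 3, 6, 8, 10]
```

As a check, the relative frequencies add up to 1 and the last cumulative frequency equals N.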

Width/Length of an Interval: The difference between the real limits of a range.

Midpoint/Class Mark: The average of the real limits of a range. For interval [a, b], the midpoint is (a + b)/2.
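As a worked example, the width and class mark of the interval with real limits 2.415 and 2.455 can be computed as shown below. This is only a sketch; the interval is the one used in the real-limits example, and the variable names are illustrative.

```python
from decimal import Decimal

# Real limits of the stated interval 2.42-2.45.
lower_real = Decimal("2.415")
upper_real = Decimal("2.455")

width = upper_real - lower_real              # equals 0.04
class_mark = (lower_real + upper_real) / 2   # equals 2.435

# The midpoint of the stated limits gives the same class mark.
stated_midpoint = (Decimal("2.42") + Decimal("2.45")) / 2
```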

Graphs and Charts

Rectangular Coordinate System: Two perpendicular axes (X and Y) used to locate points in a plane.

Coordinates of a Point: An ordered pair (x, y) representing a point’s location, where x is the abscissa (horizontal position) and y is the ordinate (vertical position).

Symbol Chart/Pictogram: Uses pictures to represent data.

Pie Chart: Represents a frequency distribution using a circle divided into sectors, with each central angle proportional to the percent frequency of its class (angle = relative frequency * 360 degrees).
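The central angle of each sector is the class's share of the total frequency applied to the full 360 degrees. A brief sketch, with made-up class labels and frequencies:

```python
# Hypothetical class frequencies (illustrative only).
frequencies = {"A": 5, "B": 10, "C": 25}
total = sum(frequencies.values())  # 40

# Central angle in degrees = (f / total) * 360.
angles = {label: 360 * f / total for label, f in frequencies.items()}
# A: 45.0, B: 90.0, C: 225.0
```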

Bar Graph: Uses bars of varying heights to represent the frequencies of different classes.

Histogram: Represents class intervals with rectangles. The base of each rectangle is the interval width, and the height is the frequency.

Frequency Polygon: A line graph connecting points with coordinates (class midpoint, frequency).
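The points of a frequency polygon, which are also the heights of the histogram's rectangles, can be derived from grouped data. The sketch below assumes three classes with stated integer limits and a small made-up dataset.

```python
# Hypothetical classes (stated lower and upper limits) and data.
classes = [(1, 3), (4, 6), (7, 9)]
data = [1, 2, 2, 3, 4, 4, 4, 6, 7, 9]

# Frequency of each class: how many data points fall inside it.
frequencies = [sum(low <= x <= high for x in data) for low, high in classes]

# Class midpoints: (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in classes]

# Points (class midpoint, frequency) joined by the frequency polygon.
polygon_points = list(zip(midpoints, frequencies))
# [(2.0, 4), (5.0, 4), (8.0, 2)]
```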

Describing a Graph: Analyzing a graph’s shape and characteristics, such as normality, asymmetry, or multimodality.

Normal Curve: A bell-shaped, symmetrical curve.

Asymmetric Curve: A skewed (non-symmetrical) bell curve. Positive skew has a tail to the right; negative skew has a tail to the left.

Multimodal Curve: A curve with multiple peaks.

Ogive: A line graph showing cumulative frequencies. Can be increasing (cumulative frequency plotted against upper class limits) or decreasing (cumulative frequency plotted against lower class limits).
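The points of both kinds of ogive can be sketched from a grouped frequency distribution. The classes and frequencies below are made up for illustration.

```python
from itertools import accumulate

# Hypothetical classes (stated lower and upper limits) and frequencies.
classes = [(1, 3), (4, 6), (7, 9)]
frequencies = [4, 4, 2]

# Increasing ogive: running total paired with upper class limits.
less_than = list(accumulate(frequencies))                  # [4, 8, 10]
increasing = [(high, c) for (_, high), c in zip(classes, less_than)]
# [(3, 4), (6, 8), (9, 10)]

# Decreasing ogive: remaining total paired with lower class limits.
more_than = list(accumulate(reversed(frequencies)))[::-1]  # [10, 6, 2]
decreasing = [(low, c) for (low, _), c in zip(classes, more_than)]
# [(1, 10), (4, 6), (7, 2)]
```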

Measures of Central Tendency

Measures of Central Tendency: Single values that represent the center of a dataset. Common measures include the mean, median, and mode.

Mean: The sum of all values divided by the total number of data points.

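Mode: The value with the highest frequency. A dataset can have more than one mode.

Median: The middle value when the data are arranged in ascending order. With an even number of data points, the median is the average of the two middle values.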
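The three measures can be illustrated with Python's standard statistics module. The dataset is made up for the example; it has an even number of values so that the median is the average of the two middle ones.

```python
import statistics

# Hypothetical dataset, already in ascending order.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(data)  # (3 + 5) / 2 = 4.0
mode = statistics.mode(data)      # 3 occurs most often
```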

Quartiles: Values that divide the ordered data into four equal parts (25% each). Q1 is the 25th percentile, Q2 the 50th percentile (the median), and Q3 the 75th percentile.

Deciles: Values that divide the ordered data into ten equal parts (10% each). D1 is the 10th percentile, D2 the 20th percentile, and so on.

Percentiles: Values that divide the ordered data into 100 equal parts (1% each). P1 is the 1st percentile, P2 the 2nd percentile, and so on.
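Quartiles, deciles, and percentiles can all be obtained from the same routine by changing the number of parts. The sketch below uses statistics.quantiles (Python 3.8 or later) with method="inclusive"; this is only one of several interpolation conventions, so other textbooks or tools may give slightly different cut points for the same data. The dataset is made up.

```python
import statistics

# Hypothetical dataset (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points Q1, Q2, Q3.
quartiles = statistics.quantiles(data, n=4, method="inclusive")
# [3.0, 5.0, 7.0]

# n=10 returns D1..D9; n=100 returns P1..P99.
deciles = statistics.quantiles(data, n=10, method="inclusive")
percentiles = statistics.quantiles(data, n=100, method="inclusive")
```

Under this convention Q2 equals the median, D5, and P50.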

Measures of Dispersion

Measures of Dispersion: Indicate the spread or variability of data. Common measures include range, variance, and standard deviation.

Range/Amplitude: The difference between the highest and lowest values.

Variance: The average of the squared differences between each data point and the mean.

Standard Deviation: The square root of the variance.

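Coefficient of Variation: The ratio of the standard deviation to the mean, often expressed as a percentage. It allows the variability of datasets with different units or scales to be compared.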
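The measures of dispersion can be computed together, as in the sketch below. It treats the made-up dataset as a whole population, so the variance is the plain average of the squared deviations, matching the definition above. For a sample, statistics.variance and statistics.stdev divide by n - 1 instead.

```python
import statistics

# Hypothetical dataset, treated as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)     # 9 - 2 = 7
mean = statistics.mean(data)           # 40 / 8 = 5
variance = statistics.pvariance(data)  # 32 / 8 = 4
std_dev = statistics.pstdev(data)      # square root of 4 = 2.0
coeff_variation = std_dev / mean       # 2.0 / 5 = 0.4, i.e. 40%
```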