Introduction to Statistical Concepts and Methods
Statistics
Statistics is commonly regarded as a collection of numerical facts expressed in summary form and compiled from other numerical data.
Kendall and Buckland define a statistic as a summary value calculated from a sample of observations, often used to estimate a population parameter.
Gini (1953) states: “Statistics is a specialized technique suitable for the quantitative study of mass or collective phenomena, requiring a mass of observations of simpler individual phenomena.”
Spiegel (1991) defines statistics as: “The scientific study of methods for collecting, organizing, summarizing, and analyzing data, and drawing valid conclusions and making reasonable decisions based on this analysis.”
Yule and Kendall (1954) describe statistics as: “The science that deals with the collection, classification, and presentation of numerically assessable facts, based on the explanation, description, and comparison of phenomena.”
Regardless of the viewpoint, the scientific importance of statistics is undeniable due to its wide range of applications.
Population
In statistics, a population is a finite or infinite set of individuals or objects sharing common characteristics.
Levin & Rubin (1996) define a population as: “A set of all elements under consideration, about which we try to draw conclusions.”
Chains (1974) defines it as: “A set of elements that have one common feature.”
Example: Members of the College of Engineers of Cojedes State.
Population size, determined by the number of elements, is crucial in statistical research. A large population can be considered infinite (e.g., all positive numbers). A finite population has a limited number of elements (e.g., students at Simón Rodríguez National Experimental University, San Carlos Campus).
Sample
Spiegel (1991) describes a sample as: “A part of the study population used to represent it.”
Levin & Rubin (1996) define a sample as: “A collection of some, but not all, elements of the population.”
Chains (1974) notes: “A sample should be defined based on the population, and conclusions drawn from it only refer to the reference population.”
Example: Studying 50 members of the College of Engineering of Cojedes State.
Sampling is simpler, cheaper, and faster than studying an entire population. It can also improve quality control by allowing for the rejection of defective items.
A representative sample contains the population’s salient features in the same proportions.
Statisticians use sample data to make inferences about the represented population. Thus, sample and population are relative concepts: a population is the whole, and a sample is a fraction or segment.
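The relationship between a finite population and a sample drawn from it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 400 member IDs is hypothetical (the text does not give the real membership count), the helper name draw_simple_random_sample is invented for this example, and simple random sampling is just one of several possible sampling schemes.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct elements chosen uniformly at random."""
    if sample_size > len(population):
        raise ValueError("a sample cannot be larger than its population")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical finite population: 400 member IDs (assumed size, not from the text).
members = [f"member-{i:03d}" for i in range(1, 401)]

# A sample of 50 members, matching the size used in the example above.
sample = draw_simple_random_sample(members, 50, seed=42)

print(len(members), "elements in the population")
print(len(sample), "elements in the sample")
```

Because every member has the same chance of selection, conclusions drawn from the 50 sampled members refer only to the 400-member population they were drawn from.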
Variables and Attributes
Variables, also called quantitative traits, are expressible numerically and measurable (e.g., height, weight, income, age).
Spiegel (1992) defines a variable as: “A symbol, such as X, Y, H, x, that can take any value from a particular set, called the variable’s domain. If the variable can only take one value, it’s called a constant.”
While all elements of a population share the same types of characteristics, the intensity of each characteristic differs from one element to another. These differing intensities are the “values of the variable,” and taken together they constitute the variable.
Data Collection Methods
Statistics employs various methods to obtain information. Here are some key methods, with their advantages and limitations:
Personal Interview
Data is often collected by sending an interviewer directly to the subject. The interviewer asks pre-written questions from a questionnaire or form and records the responses. Because the interaction is direct and questions can be clarified on the spot, this method tends to yield more accurate and complete data.
It also allows interviewers to adapt the language to the respondent’s intellectual level.
However, interviewer bias or lack of training can alter responses.
Importance of Statistics
Statistics is essential to almost every human activity. Major life decisions rely on statistical applications. Examples include:
- Population censuses
- Food basket studies
- Inflation determination
- Wage increases
- Accident frequency analysis
- Disease frequency analysis
- Life insurance payments
- Transportation fares
- Hotel and taxi rates
- Infant mortality causes
- Candidate preferences in advertising
- School needs assessments
- Product sales analysis
Statistics in Decision Making
Effective decision-making is crucial for company leadership and higher-paying positions. Good decision-makers can act rationally even with incomplete information.
Statistics helps fill information gaps, enabling rational and sound decisions. It provides deeper, rationally acceptable knowledge of process behavior, allowing us to “see beyond our eyes.”
Ungrouped Data
Ungrouped data is a collection of observations recorded in no particular order and not arranged into classes, each relating directly to the problem under study. It is organized by tallying how often each value occurs, which yields a frequency table.
Treatment for Ungrouped Data
When a sample from a population or process has fewer than 20 elements, the data is analyzed without forming classes. This is called ungrouped data processing.
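The tally-to-frequency-table step can be sketched as follows. The 12 observations are hypothetical values chosen for illustration, and the function name frequency_table is an assumption of this example, not a standard routine.

```python
from collections import Counter


def frequency_table(observations):
    """Return rows of (value, absolute frequency, relative frequency), sorted by value."""
    tally = Counter(observations)  # counts how many times each value occurs
    n = len(observations)
    return [(value, count, count / n) for value, count in sorted(tally.items())]


# Hypothetical sample of 12 observations (fewer than 20, so no classes are formed).
children_per_household = [2, 0, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

table = frequency_table(children_per_household)

print(f"{'value':>5} {'freq':>5} {'rel.freq':>9}")
for value, count, relative in table:
    print(f"{value:>5} {count:>5} {relative:>9.3f}")
```

Each distinct value appears once in the table, the absolute frequencies add up to the sample size, and the relative frequencies add up to 1.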
Measures of Central Tendency
Measures of central tendency (arithmetic mean, median, geometric mean, mode, etc.) tend to be located in the center of a data distribution.
Grouped Data
Dispersion Measures
Dispersion measures describe the spread or extent of a data distribution, indicating how observations are distributed relative to a central value.
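As a rough illustration, three common dispersion measures (range, variance, and standard deviation) can be computed with Python's standard statistics module. The text does not name specific measures, so this choice is an assumption, as are the data values and the helper name dispersion_summary. The variance here is the sample variance, which divides by n - 1.

```python
import statistics


def dispersion_summary(data):
    """Return the range, sample variance, and sample standard deviation of data."""
    return {
        "range": max(data) - min(data),          # distance between the extremes
        "variance": statistics.variance(data),   # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(data),       # square root of the sample variance
    }


# Hypothetical observations whose arithmetic mean is 6.
observations = [4, 8, 6, 5, 3, 7, 9, 6]

summary = dispersion_summary(observations)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The variance and standard deviation measure spread around the arithmetic mean, while the range depends only on the two extreme observations.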
Measures of Central Tendency
These measures describe the typical characteristics of datasets. Several types of averages are used to represent the central tendency, where data tends to accumulate around intermediate values.
Commonly used measures include:
- Arithmetic mean
- Median
- Mode
- Geometric mean
- Harmonic mean
- Quantiles
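The measures listed above are all available in Python's standard statistics module (version 3.8 or later for geometric_mean and quantiles). The sketch below applies them to a small hypothetical data set. The values are assumptions chosen for illustration, and the quartiles use the module's default interpolation method, which is only one of several conventions.

```python
import statistics

# Hypothetical positive observations (geometric and harmonic means need positive values).
data = [2, 4, 4, 8, 16]

measures = {
    "arithmetic mean": statistics.mean(data),           # sum divided by count
    "median": statistics.median(data),                  # middle value of the sorted data
    "mode": statistics.mode(data),                      # most frequent value
    "geometric mean": statistics.geometric_mean(data),  # n-th root of the product
    "harmonic mean": statistics.harmonic_mean(data),    # reciprocal of the mean reciprocal
}

# Quartiles: the three cut points that split the data into four equal-probability groups.
quartiles = statistics.quantiles(data, n=4)

for name, value in measures.items():
    print(f"{name}: {value}")
print("quartiles:", quartiles)
```

For this data the arithmetic mean is pulled upward by the largest value, while the median and mode stay at 4, which shows why the different averages can disagree on skewed data.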