Key Concepts in Statistics: Data, Methods & Limitations

Limitations of Statistics

  • Statistics does not deal with individuals:
    • Statistics deals with an aggregate of facts and does not give specific recognition to individual items in a series.
    • It deals with groups of individuals and indicates the characteristics of the whole group.
  • Statistics does not deal directly with qualitative characteristics:
    • Statistical methods can only be applied to numerically expressed data.
    • A qualitative phenomenon must be converted into quantitative information before statistical techniques can be used.
  • Statistical laws are not exact:
    • Statistical laws and rules do not hold true in every single case.
    • However, they hold on average, that is, in the majority of cases.
    • Generally, statistical laws are probabilistic in nature.
  • Statistics can be misused:
    • Handling statistical data properly requires training in statistical methods.
    • Statistics are likely to be misused by people without such training when they handle data and interpret results.
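
The point above about qualitative characteristics can be illustrated with a minimal sketch. It assumes a hypothetical survey with four rating labels and an assumed coding scheme (CODES); neither the labels nor the numeric codes come from the original notes.

```python
from statistics import mean

# Assumed coding scheme for a hypothetical survey (illustrative only).
CODES = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}

def encode(responses):
    """Convert qualitative labels into their numeric codes."""
    return [CODES[label] for label in responses]

# Hypothetical qualitative responses.
responses = ["good", "poor", "excellent", "good", "fair"]

# Once coded, ordinary statistical measures such as the mean can be applied.
scores = encode(responses)
average_score = mean(scores)
```

Whether an average of such codes is meaningful depends on the coding scheme chosen, which is itself a judgment made by the investigator.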

Scope of Statistics

  • Statistics, Computers, and Information Technology
  • Statistics and Accounting
  • Statistics and Economics
  • Statistics and Business
  • Statistics and Planning
  • Statistics and Mathematics
  • Statistics and Medical Science
  • Statistics and Psychology/Education

Data in Statistics

Primary Data

Data originally collected by an investigator or researcher for the first time for the purpose of a statistical inquiry is called primary data.

  • It is collected by governments, individuals, institutions, and research bodies.
  • It requires significant funding, time, and manpower.
  • It is generally more reliable and suitable for the specific purpose.

Methods for Collecting Primary Data

Different methods used for collecting primary data include:

  1. Direct personal interview
  2. Indirect oral interview
  3. Mailed questionnaire
  4. Information through correspondents
  5. Schedule sent through enumerator

Secondary Data

Data that has already been collected for a particular purpose and is then used for another purpose is called secondary data.

  • It is not new or original data.
  • Such data are generally available through newspapers, magazines, bulletins, reports, journals, websites, radio broadcasts, etc.

Sources of Secondary Data

In order to collect secondary data, the following sources may be used:

  • Published sources:
    • Reports and publications of ministries and departments of the government.
    • Reports and publications of reputed international organizations such as UNDP, ADB, UNESCO, WHO, World Bank, etc.
    • Reports and publications of reliable NGOs, journals, periodicals, etc.
  • (Note: Unpublished sources may also exist but were not listed in the original text.)

Random Variable

A variable that takes on different numerical values based on the outcomes of a random experiment is called a random variable (r.v.).

  • Random variables are typically denoted by capital letters (e.g., X, Y, Z).
  • The specific values taken by random variables are denoted by corresponding small letters (e.g., x, y, z).
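
As an illustration of the definition above, here is a minimal sketch that assumes a random experiment of tossing a fair coin twice and defines X as the number of heads. The experiment and the fairness of the coin are assumptions chosen for the example, not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of the assumed experiment: two tosses of a fair coin.
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

def X(outcome):
    """Random variable X: the number of heads in an outcome."""
    return outcome.count("H")

# Each outcome is equally likely, so P(X = x) is the share of outcomes mapped to x.
counts = Counter(X(o) for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}
```

Here X is the random variable, while the keys of distribution (0, 1, 2) are the specific values x that it can take.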

Differences Between Correlation and Regression

  1. Correlation: Measures the strength and direction of the linear relationship between variables as a single number.
    Regression: Describes the average relationship between variables, allowing prediction.
  2. Correlation: The correlation coefficient (r) is symmetric, meaning rXY = rYX.
    Regression: Regression coefficients (b) are not symmetric, meaning bXY ≠ bYX (generally).
  3. Correlation: The correlation coefficient always lies between -1 and +1, inclusive (-1 ≤ r ≤ +1).
    Regression: No such [-1, +1] limit exists for an individual regression coefficient. However, since r² = bXY * bYX and r² ≤ 1, if |bXY| > 1, then |bYX| < 1.
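  4. Correlation: The correlation coefficient is independent of changes in origin and scale (its sign is preserved as long as the scale factors applied to the two variables have the same sign).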
    Regression: Regression coefficients are independent of changes in origin but not independent of changes in scale.
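
The properties listed above can be checked numerically. The sketch below uses a small hypothetical data set and assumes the common convention that bYX is the regression coefficient of Y on X, that is, cov(X, Y) / var(X). The helper names are illustrative.

```python
from math import isclose, sqrt

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Population covariance of two equally long lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    """Correlation coefficient r."""
    return cov(x, y) / sqrt(cov(x, x) * cov(y, y))

def reg_coeff(dep, indep):
    """Regression coefficient of dep on indep (assumed convention)."""
    return cov(dep, indep) / cov(indep, indep)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = corr(x, y)
b_yx = reg_coeff(y, x)  # regression of Y on X
b_xy = reg_coeff(x, y)  # regression of X on Y

# Change of origin and scale: U = (X - 3) / 2, V = (Y - 4) / 10.
u = [(a - 3) / 2 for a in x]
v = [(b - 4) / 10 for b in y]
r_uv = corr(u, v)
b_vu = reg_coeff(v, u)

symmetric = isclose(r, corr(y, x))               # rXY = rYX
product_rule = isclose(r * r, b_xy * b_yx)       # r² = bXY * bYX
r_invariant = isclose(r_uv, r)                   # r unaffected by origin and scale
b_scaled = isclose(b_vu, b_yx * 2 / 10)          # b changes with scale only
```

For this data bYX = 0.6 and bXY = 1.0, so the two regression coefficients differ while r is the same in both directions.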

Variables in Regression

Dependent Variable

The variable whose value is to be estimated or predicted is called the dependent variable.

  • It is also known as the regressed, explained, outcome, or response variable.

Independent Variable

The variable used for prediction, or the variable whose value is given, is called the independent variable.

  • It is also known as the regressor, predictor, or explanatory variable.
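
To make the two roles concrete, the following minimal sketch fits a least-squares line and predicts the dependent variable from a given value of the independent variable. The data (hours studied and exam score) and the function name fit_line are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
score = [52, 54, 59, 61, 64]

intercept, slope = fit_line(hours, score)

# The independent variable's value is given; the dependent one is predicted.
predicted_score = intercept + slope * 6
```

Swapping the roles of the two lists would fit a different line, which is why the choice of dependent variable matters in regression.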