Understanding Probability, Distributions, and Statistical Analysis

Understanding Probability

The probability of an event is a numerical measure of how likely that event is to occur. It is a number between ‘0’ and ‘1’, inclusive: ‘0’ denotes an event that cannot occur, and ‘1’ denotes an event that is certain to occur. For example, when we toss a coin, we can enumerate all the possible outcomes (head and tail), but we cannot say in advance which one will happen; for a fair coin, each outcome has probability 1/2.

Permutations

A permutation is an arrangement of objects in a definite order. The number of arrangements (permutations) depends on the total number of objects, n, and the number of objects taken at a time, r: nPr = n! / (n - r)!
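The formula can be checked with a short Python sketch. The function name permutations_count and the three-letter example are illustrative assumptions, not part of the original text.

```python
import math

def permutations_count(n: int, r: int) -> int:
    """Number of ordered arrangements of r objects taken from n distinct objects."""
    if r < 0 or r > n:
        raise ValueError("require 0 <= r <= n")
    # nPr = n! / (n - r)!  (integer division is exact here)
    return math.factorial(n) // math.factorial(n - r)

# Arranging 2 letters out of A, B, C gives AB, BA, AC, CA, BC, CB.
print(permutations_count(3, 2))  # 6
```

When r equals n, the formula reduces to n!, the number of ways to arrange all the objects.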

Probability Theorems

There are two important theorems of probability:

  1. Addition Theorem
  2. Multiplication Theorem

Addition Theorem

If two events, ‘A’ and ‘B’, are mutually exclusive (they cannot both occur at the same time), the probability of the occurrence of either ‘A’ or ‘B’ is the sum of their individual probabilities.

P(A or B) = P(A) + P(B)

i.e., P(A∪B) = P(A) + P(B)
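As a small illustration, the addition theorem can be verified by counting outcomes. The fair six-sided die and the two events chosen here are assumptions made for the example.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event: set) -> Fraction:
    """Probability of an event as favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1}  # rolling a 1
B = {6}  # rolling a 6

# A and B share no outcome, so they are mutually exclusive.
assert not (A & B)

p_union = probability(A | B)             # P(A or B), counted directly
p_sum = probability(A) + probability(B)  # P(A) + P(B)
print(p_union, p_sum)  # 1/3 1/3
```

If the two events overlapped, the simple sum would count the shared outcomes twice, which is why the theorem in this form requires mutual exclusivity.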

Multiplication Theorem (Independent Events)

If two events are independent, then the probability that both occur is the product of their individual probabilities.

P(A and B) = P(A) × P(B)

i.e., P(A∩B) = P(A) × P(B)
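The multiplication theorem can be checked in the same way by enumerating a joint sample space. The coin-and-die experiment below is an assumed example, chosen because the two outcomes do not influence each other.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing a fair coin and rolling a fair die.
outcomes = list(product("HT", range(1, 7)))

def prob(predicate) -> Fraction:
    """Probability of the outcomes for which predicate(outcome) is true."""
    favourable = sum(1 for outcome in outcomes if predicate(outcome))
    return Fraction(favourable, len(outcomes))

p_a = prob(lambda o: o[0] == "H")                   # P(A): coin shows head
p_b = prob(lambda o: o[1] == 6)                     # P(B): die shows 6
p_both = prob(lambda o: o[0] == "H" and o[1] == 6)  # P(A and B)

print(p_a, p_b, p_both)  # 1/2 1/6 1/12
```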

Bayes’ Theorem

Bayes’ Theorem is based on the proposition that probabilities should be revised on the basis of all the available information. Revising probabilities in this way helps to reduce the risk involved in decision-making. The probabilities before revision are called prior (a priori) probabilities, and the probabilities after revision are called posterior probabilities.
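A minimal sketch of this revision step is given below, using the standard form posterior = prior × likelihood / total probability of the evidence. The two-machine scenario, the numbers, and the function name posterior are hypothetical and serve only to illustrate the calculation.

```python
from fractions import Fraction

def posterior(priors: dict, likelihoods: dict) -> dict:
    """Revise prior probabilities given the likelihood of the observed evidence."""
    # Joint probability of each hypothesis together with the evidence.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Total probability of the evidence across all hypotheses.
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Hypothetical data: machine A makes 60% of items with 2% defective,
# machine B makes 40% of items with 5% defective.
priors = {"A": Fraction(3, 5), "B": Fraction(2, 5)}
likelihoods = {"A": Fraction(1, 50), "B": Fraction(1, 20)}

# An item is found defective: which machine probably produced it?
revised = posterior(priors, likelihoods)
print(revised["A"], revised["B"])  # 3/8 5/8
```

Before the defect is observed, machine A is the more likely source (3/5); after it, machine B becomes the more likely source (5/8).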

Bernoulli and Binomial Distributions

The Bernoulli distribution describes a single trial with only two mutually exclusive outcomes, success or failure. The Binomial distribution is the probability distribution expressing the probability of one set of dichotomous alternatives, i.e., success or failure, over a fixed number of such independent trials. In other words, it is used to determine the probability of a given number of successes in experiments in which each trial has only two mutually exclusive outcomes. Binomial distribution is a discrete probability distribution.
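The binomial probability of exactly k successes in n independent trials, each with success probability p, is C(n, k) × p^k × (1 - p)^(n - k). The sketch below evaluates that standard formula; the function name binomial_pmf and the coin example are assumptions for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    # math.comb(n, k) counts the ways to choose which k trials succeed.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# Setting n = 1 gives the Bernoulli case: a single success/failure trial.
print(binomial_pmf(1, 1, 0.5))  # 0.5
```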

Poisson Distribution

Poisson Distribution is a limiting form of Binomial Distribution, obtained when the number of trials is very large and the probability of success in each trial is very small. In Binomial Distribution, the total number of trials is known in advance. But in certain real-life situations, it may be impossible to count the total number of times a particular event occurs or does not occur. In such cases, Poisson Distribution is more suitable.

Poisson Distribution is a discrete probability distribution. It was introduced by Simeon Denis Poisson.
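The limiting relationship can be seen numerically. The sketch below uses the standard Poisson formula P(k) = e^(-m) × m^k / k!, where m is the mean number of occurrences, and compares it with a binomial probability for a large number of trials. The function names, the mean of 2, and the choice of 10,000 trials are assumptions made for the example.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k occurrences when the average count is mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

mean = 2.0
n = 10_000    # many trials ...
p = mean / n  # ... each with a very small chance of success

for k in range(5):
    # The two columns agree closely when n is large and p is small.
    print(k, round(poisson_pmf(k, mean), 4), round(binomial_pmf(k, n, p), 4))
```

Note that the Poisson calculation needs only the mean, not the number of trials, which is what makes it usable when the trials cannot be counted.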

Analysis of Variance (ANOVA)

Analysis of variance is a technique that analyses the variance of two or more comparable series (or samples) to determine whether the differences in their arithmetic means are significant, and hence whether the samples under study are drawn from the same population. It does so using the statistical test called the F-test.

Characteristics of Analysis of Variance:

  1. It performs a statistical analysis of the variance of two or more samples.
  2. It tests whether the difference in the means of different samples is due to chance or due to any significant cause.
  3. It uses the statistical test called the F-Ratio.

Types of Variance Analysis:

There are two types of variance analysis:

  1. One-way Analysis of Variance
  2. Two-way Analysis of Variance
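For the one-way case, the F-ratio can be sketched as the variance between sample means divided by the variance within samples. The code below follows that standard calculation; the function name one_way_f_ratio and the three small samples are hypothetical. Judging significance would additionally require comparing the result with a critical value from an F-table, which is not shown here.

```python
def one_way_f_ratio(groups):
    """Return the F-ratio for a one-way analysis of variance."""
    k = len(groups)                        # number of samples
    n_total = sum(len(g) for g in groups)  # total number of observations
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between samples: how far each sample mean lies from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation within samples: how far each value lies from its own sample mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)       # mean square between samples
    ms_within = ss_within / (n_total - k)   # mean square within samples
    return ms_between / ms_within

# Hypothetical samples, for illustration only.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_f_ratio(samples)
```

For these samples the between-sample mean square is 13 and the within-sample mean square is 1, so the F-ratio is about 13. A large ratio suggests that the sample means differ by more than chance alone would explain.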

Measures of Central Tendency

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.
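The three measures can be computed with Python's standard statistics module. The small dataset below is made up for illustration.

```python
import statistics

# Made-up dataset for illustration.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of values divided by their count
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Here the mean is 5, the median is 4 (the average of the two middle values, 3 and 5), and the mode is 3, which shows that the three methods can place the centre at different points.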

Correlation

Correlation is usually defined as a measure of the linear relationship between two quantitative variables (e.g., height and weight). Often a slightly looser definition is used, whereby correlation simply means that there is some type of relationship between two variables. This post will define positive and negative correlation, provide some examples of correlation, explain how to measure correlation, and discuss some pitfalls regarding correlation.
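One common way to measure linear correlation is Pearson's correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it from its standard definition; the function name pearson_r and the sample values are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of x and y, and the variation of each on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (positive correlation)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (negative correlation)
```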

Regression

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the nature and strength of the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables).
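As a minimal sketch, the simplest case is a straight line Y = a + bX fitted by ordinary least squares with a single independent variable. The choice of that method, the function name fit_line, and the data points are assumptions for illustration; the text above does not prescribe a particular fitting technique.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change in Y per unit change in X
    intercept = mean_y - slope * mean_x  # value of Y when X is 0
    return intercept, slope

# Made-up points lying exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```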