Understanding Probability, Distributions, and Statistical Analysis
Understanding Probability
The probability of a given event is the numerical value assigned to the likelihood of that event occurring. It is a number between 0 and 1: 0 denotes an event that cannot occur, and 1 denotes an event that is certain to occur. For example, when we toss a coin, we can enumerate all the possible outcomes (head and tail), but we cannot say which one will happen.
Permutations
A permutation is an arrangement of objects in a definite order. The number of permutations depends on the total number of objects, n, and the number of objects taken at a time, r: nPr = n!/(n−r)!
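As a quick check of this formula, the short Python sketch below computes nPr both directly from factorials and with the standard library's math.perm helper (the function name n_p_r and the 5-take-3 example are chosen here only for illustration):

```python
from math import factorial, perm

def n_p_r(n, r):
    """Number of permutations of r objects taken from n: n! / (n - r)!."""
    return factorial(n) // factorial(n - r)

# Example: arranging 3 books chosen from a set of 5 -> 5P3 = 60.
print(n_p_r(5, 3))  # 60
print(perm(5, 3))   # 60, the same count via the standard library helper
```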
Probability Theorems
There are two important theorems of probability:
- Addition Theorem
- Multiplication Theorem
Addition Theorem
If two events, ‘A’ and ‘B’, are mutually exclusive, the probability of the occurrence of either ‘A’ or ‘B’ is the sum of their individual probabilities.
P(A or B) = P(A) + P(B)
i.e., P(A∪B) = P(A) + P(B)
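For example, the faces of a fair die are mutually exclusive outcomes, so the theorem applies directly (the die example is assumed here only for illustration):

```python
# Rolling a fair die: the events "roll a 1" and "roll a 2" are mutually exclusive.
p_one = 1 / 6
p_two = 1 / 6

# Addition theorem: P(1 or 2) = P(1) + P(2)
print(p_one + p_two)  # 0.333..., i.e. 2/6
```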
Multiplication Theorem (Independent Events)
If two events are independent, the probability that both occur is the product of their individual probabilities.
P(A and B) = P(A) · P(B)
i.e., P(A∩B) = P(A) · P(B)
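Likewise, two tosses of a fair coin are independent, so the probability of two heads is the product of the individual probabilities (again, the coin example is only illustrative):

```python
# Two independent tosses of a fair coin.
p_head = 1 / 2

# Multiplication theorem: P(head and head) = P(head) * P(head)
print(p_head * p_head)  # 0.25
```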
Bayes’ Theorem
Bayes’ Theorem is based on the proposition that probabilities should be revised on the basis of all the available information. The revision of probabilities based on available information will help to reduce the risk involved in decision-making. The probabilities before revision are called a priori probabilities, and the probabilities after revision are called posterior probabilities.
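Formally, the revision uses Bayes’ formula, P(A|B) = P(B|A)·P(A) / P(B). The sketch below revises a prior probability into a posterior one; the screening-test figures are entirely hypothetical and chosen only to illustrate the calculation:

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Posterior P(A|B) from the prior P(A) and the two conditional likelihoods."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)  # total probability of B
    return p_b_given_a * prior_a / p_b

# Hypothetical figures: 1% a priori probability, 95% true-positive rate, 5% false-positive rate.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 3))  # ~0.161: the a priori 0.01 is revised upward to about 0.16
```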
Bernoulli and Binomial Distributions
A Bernoulli distribution describes a single trial with exactly two possible outcomes: success, with probability p, and failure, with probability 1 − p. The binomial distribution extends this to a fixed number of independent trials; it is the probability distribution expressing the probability of one set of dichotomous alternatives, i.e., success or failure. In other words, it is used to determine the probability of success in experiments in which there are only two mutually exclusive outcomes. The binomial distribution is a discrete probability distribution.
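As a sketch, the probability of exactly k successes in n independent trials, each succeeding with probability p, is C(n, k) · p^k · (1 − p)^(n − k); the coin-toss numbers below are chosen only for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```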
Poisson Distribution
The Poisson Distribution is a limiting form of the Binomial Distribution. In the Binomial Distribution, the total number of trials is known in advance. But in certain real-life situations, it may be impossible to count the total number of times a particular event occurs or does not occur. In such cases, the Poisson Distribution is more suitable.
The Poisson Distribution is a discrete probability distribution. It was introduced by Siméon Denis Poisson.
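Its probability mass function is P(X = k) = e^(−λ) λ^k / k!, where λ is the mean number of occurrences. A minimal sketch, assuming a purely illustrative rate of 2 events per minute:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean rate lam: e^(-lam) * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

# Assumed rate: on average 2 calls arrive per minute.
# Probability of exactly 3 calls in a given minute:
print(round(poisson_pmf(3, 2), 4))  # ~0.1804
```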
Analysis of Variance (ANOVA)
Analysis of variance may be defined as a technique that analyses the variance of two or more comparable series (or samples) to determine whether the differences in their arithmetic means are significant and whether the samples under study are drawn from the same population, using the statistical technique called the F-test.
Characteristics of Analysis of Variance:
- It carries out a statistical analysis of the variance of two or more samples.
- It tests whether the difference in the means of different samples is due to chance or due to any significant cause.
- It uses the statistical test called the F-Ratio.
Types of Variance Analysis:
There are two types of variance analysis:
- One-way Analysis of Variance
- Two-way Analysis of Variance
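A minimal one-way analysis of variance sketch using scipy.stats.f_oneway; the three sample groups are invented for illustration, and SciPy is assumed to be available:

```python
from scipy.stats import f_oneway

# Hypothetical samples, e.g. crop yields under three different treatments.
group_a = [20, 22, 19, 24, 25]
group_b = [28, 30, 27, 26, 29]
group_c = [18, 20, 22, 19, 21]

# One-way ANOVA: F-ratio and p-value for the null hypothesis that all means are equal.
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)  # a small p-value suggests the mean differences are not due to chance
```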
Measures of Central Tendency
A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.
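Python's standard statistics module computes all three directly; the small dataset below is made up purely for illustration:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # arithmetic mean: 5
print(statistics.median(data))  # middle value: 4.0 (average of 3 and 5)
print(statistics.mode(data))    # most frequent value: 3
```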
Correlation
Correlation is usually defined as a measure of the linear relationship between two quantitative variables (e.g., height and weight). Often a slightly looser definition is used, whereby correlation simply means that there is some type of relationship between two variables. A correlation is positive when the two variables tend to increase together and negative when one tends to increase as the other decreases; its strength is most commonly measured with the correlation coefficient.
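A short sketch of the Pearson correlation coefficient, which runs from −1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive); the height/weight pairs are invented, and NumPy is assumed to be available:

```python
import numpy as np

# Hypothetical paired measurements: height (cm) and weight (kg).
height = [150, 160, 165, 170, 180]
weight = [50, 58, 63, 66, 75]

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(height, weight)[0, 1]
print(round(r, 3))  # ~0.999: a strong positive linear relationship
```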
Regression
Regression is a statistical method used in finance, investing, and other disciplines to estimate the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
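A minimal ordinary least squares sketch with a single independent variable, using numpy.polyfit; the X/Y observations are invented for illustration:

```python
import numpy as np

# Hypothetical observations: x is the independent variable, y the dependent one.
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)       # roughly 2 and 0, since y grows by about 2 per unit of x
print(slope * 6 + intercept)  # predicted y for a new x value of 6
```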