Key Statistical Distributions and Estimator Properties
Key Properties of a Good Estimator
A good estimator should satisfy the following characteristics:
- An estimator should be unbiased.
- An estimator should be consistent.
- An estimator should be efficient.
- An estimator should be sufficient.
1. Unbiasedness
A statistic ‘t’ is said to be an unbiased estimator of the population parameter ‘θ’ if E(t) = θ. In other words, if the mean of the sampling distribution of ‘t’ equals the parameter ‘θ’, then t is an unbiased estimator of θ.
We can see that the sample mean is the unbiased estimator of the population mean, i.e., E(x̅) = μ where x̅ is the sample mean and μ is the population mean.
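A minimal simulation sketch of this property, with an assumed population mean and sample size: averaging the sample means from many repeated samples recovers μ, illustrating E(x̅) = μ.

```python
import numpy as np

# Unbiasedness sketch (assumed values): the average of many sample
# means is close to the population mean mu, illustrating E(x̅) = μ.
rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 30          # assumed population and sample size
samples = rng.normal(mu, sigma, size=(10_000, n))
sample_means = samples.mean(axis=1)  # one x̅ per sample
print(round(sample_means.mean(), 1))
```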
2. Consistency
A statistic ‘t’ based on a sample of size ‘n’ is said to be a consistent estimator of the parameter ‘θ’ if it converges in probability to θ.
i.e., if P(|t − θ| < ε) → 1 as n → ∞ for every ε > 0, then ‘t’ is a consistent estimator of θ. In other words, when n is large, the probability that t is close to θ is near to 1.
A sufficient condition: if E(tn) → θ and V(tn) → 0 as n → ∞, then tn is a consistent estimator of the parameter θ.
Example: The sample mean is a consistent estimator of the population mean since for large values of n, the sample mean tends to the population mean.
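A rough simulation sketch of this example, with assumed values: across repeated samples, the spread of the sample mean around μ shrinks as the sample size n grows, which is the behaviour consistency describes.

```python
import numpy as np

# Consistency sketch (assumed values): the standard deviation of the
# sample mean across 2,000 repeated samples shrinks as n increases.
rng = np.random.default_rng(1)
mu, sigma = 10.0, 3.0
spread = {n: rng.normal(mu, sigma, size=(2_000, n)).mean(axis=1).std()
          for n in (10, 100, 1_000)}
print(spread)  # the spread falls as n increases
```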
3. Efficiency
If t1 and t2 are two consistent estimators of parameter θ and if the variance of t1 is less than the variance of t2 for all n, then t1 is said to be more efficient than t2 (V(t1) < V(t2)).
That is, an estimator with lesser variability is said to be more efficient and consequently more reliable than the other. Example: The sample mean is more efficient than the sample median as an estimator of the population mean.
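A simulation sketch of this example, assuming normally distributed data: both the sample mean and the sample median estimate μ, but the mean varies less from sample to sample, so it is the more efficient estimator.

```python
import numpy as np

# Efficiency sketch (assumed normal data): compare the sampling
# variance of the mean and the median as estimators of mu = 0.
rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, size=(5_000, 50))
var_mean = data.mean(axis=1).var()
var_median = np.median(data, axis=1).var()
print(var_mean < var_median)  # True: the mean is more efficient
```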
4. Sufficiency
A Statistic ‘t’ is said to be a sufficient estimator of parameter θ if it contains all the information in the sample regarding the parameter. In other words, a sufficient statistic utilizes all the information that a given sample can furnish about the parameter.
A sufficient estimator is most efficient when an efficient estimator exists. It is always a consistent estimator. It may or may not be unbiased. A necessary and sufficient condition for an estimator of a parameter θ to be sufficient is given by the Neyman factorization criterion.
Understanding Sampling Distributions
A sample statistic is a random variable. As every random variable has a probability distribution, a sample statistic also has a probability distribution.
The probability distribution of a sample statistic is called the sampling distribution of that statistic. For example: The sample mean is a statistic, and the distribution of the sample mean is a sampling distribution. The sampling distribution plays a very important role in the study of statistical inference.
Standard Error
The standard deviation of a sampling distribution of a statistic is called the standard error of that statistic. For example, the sample mean (x̅) has a sampling distribution. The S.D of that distribution is called the standard error of x̅.
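A short simulation sketch, with assumed values: the standard error of x̅ is the S.D of its sampling distribution, and for the mean it equals σ/√n.

```python
import numpy as np

# Standard error sketch (assumed values): the S.D of the simulated
# sampling distribution of x̅ matches the theoretical sigma/sqrt(n).
rng = np.random.default_rng(3)
sigma, n = 4.0, 25
means = rng.normal(0.0, sigma, size=(20_000, n)).mean(axis=1)
print(round(means.std(), 2))         # simulated S.E of x̅
print(round(sigma / np.sqrt(n), 2))  # theoretical σ/√n = 0.8
```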
Uses of Standard Error
Standard error plays a very important role in the large sample theory and forms the basis of the testing of the hypothesis.
- It is used for testing a given hypothesis.
- S.E gives an idea about the reliability of a sample. The reciprocal of S.E is a measure of the reliability of the sample.
- S.E can be used to determine the confidence limits of population measures like mean, proportion, and standard deviation.
Commonly Used Sampling Distributions
- Normal distribution
- Chi-square (χ²) distribution
- t-distribution
- F-distribution
Properties of the Bernoulli Distribution
- p is the parameter of the distribution.
- The mean of the distribution is p.
- The variance is pq, where q = 1 − p.
- The variance is less than the mean.
- If X1, X2, …, Xn are n independently and identically distributed Bernoulli variates with parameter p, then (X1 + X2 + … + Xn) follows the binomial distribution with n and p as parameters.
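A simulation sketch of the last property, with assumed n and p: summing n independent Bernoulli(p) indicators gives a binomial count whose simulated mean and variance match np and npq.

```python
import numpy as np

# Bernoulli-to-binomial sketch (assumed n, p): the sum of n iid
# Bernoulli(p) variates behaves as a binomial(n, p) count.
rng = np.random.default_rng(4)
n, p = 20, 0.3
trials = rng.random(size=(50_000, n)) < p  # Bernoulli(p) indicators
counts = trials.sum(axis=1)                # binomial(n, p) variates
print(counts.mean())  # close to np  = 6.0
print(counts.var())   # close to npq = 4.2
```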
Situations Where the Binomial Distribution Can Be Applied
- The random experiment has two outcomes, which can be called ‘success’ and ‘failure’.
- The probability of success in a single trial remains constant from trial to trial of the experiment.
- The experiment is repeated a finite number of times.
- Trials are independent.
Properties of the Binomial Distribution
- The binomial distribution is a discrete probability distribution.
- The shape and location of the binomial distribution change as ‘p’ changes for a given ‘n’.
- The binomial distribution has one or two modal values.
- The mean of the binomial distribution increases as ‘n’ increases, with ‘p’ remaining constant.
- If ‘n’ is large and if neither ‘p’ nor ‘q’ is too close to zero, the binomial distribution may be approximated to the normal distribution.
- The binomial distribution has mean = np and S.D = √(npq).
- If two independent random variables follow the binomial distribution, their sum also follows the binomial distribution.
Fitting a Binomial Distribution
- Determine the values of p, q, and n and substitute them in the function nCx p^x q^(n−x) to get the probability function of the binomial distribution.
- Put x = 0, 1, 2, …, n in the function nCx p^x q^(n−x) to get n + 1 terms.
- Multiply each such term by N (total frequency), to obtain the expected frequency.
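The fitting steps above can be sketched in code, with assumed values of n, p, and the total frequency N: each term N · nCx · p^x · q^(n−x) is an expected frequency.

```python
import math

# Fitting a binomial distribution (assumed n, p, N): expected
# frequency for each x is N * nCx * p^x * q^(n-x).
n, p, N = 4, 0.5, 160
q = 1 - p
expected = [N * math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
print(expected)  # [10.0, 40.0, 60.0, 40.0, 10.0]
```

The n + 1 expected frequencies sum back to the total frequency N, which is a quick sanity check on the fit.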
Uses of the Poisson Distribution
Practical situations where the Poisson distribution can be used:
- To count the number of telephone calls arriving at a telephone switchboard in unit time (say, per minute).
- To count the number of customers arriving at the supermarket (say per hour).
- To count the number of defects per unit of a manufactured product (in Statistical Quality Control).
- To count the number of radioactive disintegrations of a radioactive element per unit of time (in Physics).
- To count the number of bacteria per unit.
- To count the number of defective materials say, pins, blades, etc., (in a packing of manufactured goods by a concern).
Characteristics of the Poisson Distribution
- The Poisson distribution is a discrete probability distribution.
- If ‘x’ follows a Poisson distribution, then ‘x’ takes values 0, 1, 2,… to infinity.
- It has a single parameter, m. Once ‘m’ is known, all the terms of the distribution can be found.
- The mean and variance of the Poisson distribution are equal to m.
- The Poisson distribution is a positively skewed distribution.
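A simulation sketch of the mean-equals-variance property, with an assumed value of m:

```python
import numpy as np

# Poisson sketch (assumed m): the simulated mean and variance of a
# Poisson(m) sample are both close to m.
rng = np.random.default_rng(5)
m = 3.0
x = rng.poisson(lam=m, size=100_000)
print(round(x.mean(), 1), round(x.var(), 1))  # both close to 3.0
```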
Properties of the Normal Distribution
- The normal curve is a continuous curve.
- The normal curve is bell-shaped.
- The normal curve is symmetric about the mean.
- Mean, median, and mode are equal for a normal distribution.
- The height of the normal curve is at its maximum at the mean.
- There is only one maximum point, which occurs at the mean.
- The ordinate at the mean divides the whole area into two equal parts (i.e., 0.5 on either side).
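The last property can be checked numerically: the standard normal CDF at the mean is exactly 0.5, so the ordinate at the mean splits the total area into two equal halves.

```python
import math

# The standard normal CDF, written via the error function; at the
# mean (z = 0) it equals 0.5, i.e. half the area lies on either side.
def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(normal_cdf(0.0))  # 0.5
```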
Chi-Square Test (χ² Test)
Parametric and Non-Parametric Tests
In certain test procedures, assumptions about the population distribution or parameters are made.
For example, in the t-test we assume that the samples are drawn from a population following a normal distribution. When such assumptions are made, the test is known as a parametric test.
There are situations when it is not possible to make any assumption about the distribution of the population from which samples are drawn. In such situations, we follow test procedures which are known as non-parametric tests. The χ²-test is an example of a non-parametric test.
χ²-Test
The statistical test in which the test statistic follows a χ² distribution is called the χ²-test.
Therefore, the χ²-test is a statistical test, which tests the significance of the difference between observed frequencies and the corresponding theoretical frequencies of a distribution, without any assumption about the distribution of the population.
The χ²-test is one of the simplest and most widely used non-parametric tests in statistical work. This test was developed by Prof. Karl Pearson in 1900.
Characteristics of the χ²-Test
- It is a non-parametric test. Assumptions about the form of the distribution or its parameters are not required.
- It is a distribution-free test, which can be used in any type of distribution of the population.
- It is easy to calculate the χ² test statistic.
- It analyses the difference between a set of observed frequencies and a set of corresponding expected frequencies.
Uses (Applications) of the χ²-Test
The χ²-test is one of the most useful statistical tests. It is applicable to a very large number of problems in practice. The uses of the χ²-test are explained below:
- Useful for the test of goodness of fit. The χ²-test can be used to ascertain how well theoretical distributions fit the data. We can test whether there is a goodness of fit between the observed frequencies and expected frequencies.
- Useful for the test of independence of attributes: With the help of the χ²-test we can find out whether two attributes are associated or not.
- Useful for testing homogeneity: Tests of independence are concerned with the problem of whether one attribute is independent of another, while tests of homogeneity are concerned with whether different samples come from the same population.
- Useful for testing given population variance: The χ²-test can be used for testing whether the given population variance is acceptable based on samples drawn from that population.
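The test of independence can be sketched on a made-up 2×2 table of observed frequencies: expected counts come from the row and column totals, and the statistic is χ² = Σ (O − E)² / E (no continuity correction in this sketch).

```python
# Test of independence sketch on an assumed 2x2 contingency table:
# E[i][j] = (row total i) * (column total j) / grand total,
# chi2 = sum of (O - E)^2 / E over all cells.
observed = [[30, 10],
            [20, 40]]
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
total = sum(row_tot)
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / total) ** 2
           / (row_tot[i] * col_tot[j] / total)
           for i in range(2) for j in range(2))
print(round(chi2, 2))  # 16.67, far above the 5% critical value 3.84 (1 d.f.)
```

Since the computed χ² exceeds the tabled critical value, the two attributes in this made-up table would be judged associated.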
Chi-Square Distribution (χ² Distribution)
- If Z follows a standard normal distribution, then Z² will follow the χ² distribution with one degree of freedom.
- Let ‘s’ and ‘σ’ be the standard deviations of the sample and population respectively, and let ‘n’ be the sample size. Then ns²/σ² follows a χ² distribution with n − 1 degrees of freedom.
- If z1, z2, …, zn are n standard normal variates, then z1² + z2² + … + zn² follows the χ² distribution with n degrees of freedom.
Properties of the χ² Distribution
- The χ² distribution is a sampling distribution. It is a continuous probability distribution.
- The parameter of the χ² distribution is n, the degrees of freedom.
- As the degrees of freedom increase, the χ² distribution approaches the normal distribution.
- The mean of the χ² distribution is n, the variance is 2n, and the mode is n − 2 (for n > 2), where ‘n’ is the degrees of freedom. For large values of n, the χ² distribution is approximately symmetric.
- The sum of two independent χ² variates is also a χ² variate.
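A simulation sketch of the mean and variance properties: a χ² variate with n degrees of freedom can be built as a sum of n squared standard normal variates.

```python
import numpy as np

# Chi-square sketch (assumed n): sum of n squared standard normals
# has mean n and variance 2n.
rng = np.random.default_rng(6)
n = 8
chi2 = (rng.standard_normal(size=(100_000, n)) ** 2).sum(axis=1)
print(round(chi2.mean(), 1))  # close to n  = 8
print(round(chi2.var(), 1))   # close to 2n = 16
```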
Uses of the χ² Distribution
χ² is a test statistic in tests of hypotheses. Following are the uses of χ²:
- To test the given population variance when the sample is small.
- To test the goodness of fit between observed and expected frequencies.
- To test the independence of two attributes.
- To test the homogeneity of data.
Student’s t-Distribution
Let x̅ and s be the mean and S.D of a sample drawn from a normal population, and let the sample size ‘n’ be small. Then (x̅ − μ)/(s/√(n − 1)) follows a t-distribution with n − 1 degrees of freedom.
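The statistic above, computed on a small made-up sample, with s taken as the sample S.D using divisor n (as in the text), so that t = (x̅ − μ) / (s / √(n − 1)):

```python
import math

# t-statistic sketch on an assumed small sample against a
# hypothesised population mean mu.
data = [12.1, 11.6, 12.4, 11.9, 12.0, 12.3]
mu = 11.5                      # hypothesised population mean
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / n)  # divisor n
t = (xbar - mu) / (s / math.sqrt(n - 1))
print(round(t, 2))  # compared against the t-table with n − 1 = 5 d.f.
```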
Properties of the t-Distribution
- The t-distribution is a sampling distribution.
- For large samples, the t-distribution approaches the normal distribution.
- All odd moments of the distribution are 0.
- Mean = 0 and variance = n/(n − 2) for n > 2, where n is the degrees of freedom.
- The t-curve is at its maximum at t = 0.
- The t-curve has longer tails on both the left and the right than the normal curve.
Uses of the t-Distribution
- To test the given population mean when the sample is small.
- To test whether the two samples have the same mean when the samples are small.
- To test whether there is a difference in the observations of the two dependent samples.
- To test the significance of the population correlation coefficient.
Properties of the F-Distribution
- The F-distribution is a sampling distribution.
- If F follows the F-distribution with (n1, n2) degrees of freedom then 1/F follows the F-distribution with (n2, n1) degrees of freedom.
- The mean of the F-distribution is n2/(n2 − 2) for n2 > 2, where (n1, n2) are the degrees of freedom.
- The F-curve is J-shaped when n1 ≤ 2 and bell-shaped when n1 > 2.
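A simulation sketch of the reciprocal and mean properties, with assumed degrees of freedom: an F(n1, n2) variate can be built from independent χ² variates as (χ²_n1 / n1) / (χ²_n2 / n2), and its mean is n2/(n2 − 2); the reciprocal then behaves as F(n2, n1) with mean n1/(n1 − 2).

```python
import numpy as np

# F-distribution sketch (assumed n1, n2): build F from independent
# chi-square variates and check the mean and reciprocal properties.
rng = np.random.default_rng(7)
n1, n2 = 5, 10
c1 = (rng.standard_normal(size=(50_000, n1)) ** 2).sum(axis=1)
c2 = (rng.standard_normal(size=(50_000, n2)) ** 2).sum(axis=1)
f = (c1 / n1) / (c2 / n2)
print(round(f.mean(), 2))        # close to n2/(n2 - 2) = 1.25
print(round((1 / f).mean(), 2))  # close to n1/(n1 - 2) ≈ 1.67
```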
Uses of the F-Distribution
The F-statistic is used for the test of the hypothesis. The test conducted on the basis of the ‘F’ statistic is called the F-test. The F-test can be used to:
- Test the equality of variances of two populations when samples are small.
- Test the equality of means of three or more populations.