Understanding Random Variables, Distributions, and Statistics
Random Variables and Distributions
Random variable: A variable that contains the outcomes of a chance experiment.
Discrete distribution: The random variable has a countable set of possible outcomes, even if there are many of them. It usually arises from counting something, such as the number of defective items.
Mean or expected value: The long-run average of the outcomes. If the experiment is repeated enough times, the average of the results approaches this value.
Continuous distribution: For random variables that can take any of infinitely many values within an interval. These are usually measured variables (time, height, weight).
Binomial distribution: The most widely known discrete distribution. It describes the number of successes in n trials, so the possible outcomes are countable and limited (0 to n).
Binomial Distribution Assumptions:
- The experiment involves n identical trials.
- Each trial has only two possible outcomes (success or failure).
- Each trial is independent of the previous trial.
- The probability of success p and the probability of failure q = 1 - p remain constant throughout the experiment.
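Under these assumptions, the probability of exactly x successes in n trials follows the standard binomial formula P(x) = C(n, x) p^x q^(n-x). The sketch below is only an illustration: the values n = 10 and p = 0.3 are assumed, and `binomial_pmf` is a hypothetical helper name, not something from these notes.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent, identical trials."""
    q = 1.0 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q**(n - x)

# Illustrative values (assumed, not from the notes)
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Expected value: each outcome weighted by its probability (equals n * p)
mean = sum(x * pr for x, pr in enumerate(probs))

print(f"P(X = 3) = {binomial_pmf(3, n, p):.4f}")
print(f"sum of all probabilities = {sum(probs):.4f}")
print(f"mean = {mean:.4f}")
```

The probabilities over all outcomes 0 to n sum to 1, and the mean works out to n * p, which is the long-run average described earlier.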
Poisson distribution (discrete distribution)
- Focuses on the number of discrete occurrences over some interval or continuum.
- Also called the law of improbable events.
- Widely used in queuing theory.
- It describes rare events.
- Each occurrence is independent of the other occurrences.
- The occurrences in each interval can range from 0 to infinity.
- The expected number of occurrences must hold constant throughout the experiment.
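The long-run average is λ (lambda).
Whatever unit you have for x, you should have the same unit for λ. For example, if x is counted per 10 minutes, λ must also be the average per 10 minutes.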
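Writing the long-run average as λ (the usual symbol), the Poisson formula is P(x) = λ^x e^(-λ) / x!. The sketch below illustrates the unit rule with assumed numbers: an average of 3.2 arrivals per 4 minutes, and a question asked about an 8-minute interval. `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the long-run average is lam."""
    return lam**x * exp(-lam) / factorial(x)

# Assumed example: on average 3.2 customers arrive per 4-minute interval.
lam_4min = 3.2

# The question concerns an 8-minute interval, so lambda is rescaled
# to the same unit as x before the formula is applied.
lam_8min = lam_4min * (8 / 4)

p_10_in_8min = poisson_pmf(10, lam_8min)
print(f"lambda for 8 minutes = {lam_8min}")
print(f"P(X = 10 in 8 minutes) = {p_10_in_8min:.4f}")
```

Using the 4-minute λ with an 8-minute x would give the wrong probability, which is why the units are matched first.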
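Chebyshev’s theorem applies even if we have a distribution that is not normal: for any distribution, at least 1 - 1/k² of the values lie within k standard deviations of the mean, for k > 1.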
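Chebyshev's theorem gives a lower bound of 1 - 1/k² on the proportion of values within k standard deviations of the mean (k > 1), whatever the shape of the distribution. The sketch below compares that bound with a small, made-up, right-skewed data set; the data and the name `chebyshev_bound` are assumptions for illustration only.

```python
from statistics import mean, pstdev

def chebyshev_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

# Made-up, right-skewed data (clearly not normal)
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15, 30]
mu, sigma = mean(data), pstdev(data)

for k in (1.5, 2, 3):
    # Observed share of values no farther than k standard deviations from the mean
    within = sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)
    print(f"k={k}: bound >= {chebyshev_bound(k):.3f}, observed = {within:.3f}")
```

The observed proportion is never below the bound; the bound is conservative because it has to hold for every possible distribution.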
Discrete vs. Continuous Distributions
Discrete: We have countable outcomes.
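Continuous: Infinitely many possible outcomes within an interval.
P(x=3) is 0 with a continuous distribution, because 3 is only one of infinitely many possible values (3, 3.01, 3.001, 3.002 etc.).
Instead, we take the probability over a range around 3, i.e. P(2.9 < x < b) for some upper bound b > 3.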
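The point about ranges can be checked numerically. The sketch below assumes a normal random variable with mean 3 and standard deviation 0.5 (values chosen only for illustration) and uses Python's `statistics.NormalDist`; `prob_between` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumed continuous variable: normal with mean 3 and standard deviation 0.5
dist = NormalDist(mu=3.0, sigma=0.5)

def prob_between(low: float, high: float) -> float:
    """P(low < x < high): the area under the curve between the two points."""
    return dist.cdf(high) - dist.cdf(low)

# Shrinking the interval around 3 drives the probability toward 0
for half_width in (0.1, 0.01, 0.001):
    p = prob_between(3 - half_width, 3 + half_width)
    print(f"half-width {half_width}: P = {p:.6f}")

# A single point has zero width, so the area above it is zero
p_point = prob_between(3, 3)
print(f"P(x = 3) = {p_point}")
```

For a continuous variable, probability is an area, so only intervals with nonzero width can have nonzero probability.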
Continuous Distributions
1. Uniform distribution (Rectangular distribution)
- Simple continuous distribution
2. Normal distribution
- Fits many human characteristics such as weight, height, speed, IQ.
- It is a continuous distribution.
- Symmetrical distribution about its mean.
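- Asymptotic to the horizontal axis (the tails never touch the horizontal axis; they extend to negative and positive infinity)
- It is unimodal (there is only one peak, in the middle)
- Family of curves: each combination of mean μ and standard deviation σ gives a different normal curve.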
- Area under the curve is 1.
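Some of the properties listed above (symmetry about the mean, total area of 1, a family of curves) can be verified numerically. The sketch below uses two assumed members of the normal family and a simple trapezoidal rule; the parameter values and the helper `area_under_pdf` are illustrative assumptions.

```python
from statistics import NormalDist

def area_under_pdf(dist: NormalDist, lo: float, hi: float, steps: int = 20000) -> float:
    """Approximate the area under the density with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * h)
    return total * h

# Two assumed members of the normal family (different mean and spread)
curves = [NormalDist(mu=50, sigma=5), NormalDist(mu=100, sigma=15)]

for d in curves:
    # Integrate over mean +/- 8 standard deviations; the tails beyond are negligible
    area = area_under_pdf(d, d.mean - 8 * d.stdev, d.mean + 8 * d.stdev)
    # Symmetry: equal heights at equal distances on either side of the mean
    symmetric = abs(d.pdf(d.mean - 7) - d.pdf(d.mean + 7)) < 1e-12
    print(f"mu={d.mean}, sigma={d.stdev}: area = {area:.6f}, symmetric = {symmetric}")
```

Both curves have different centers and spreads, yet each encloses an area of approximately 1 and is symmetric about its own mean.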
Standardized Normal Distribution
Using z values, we convert our normal distribution into a standard normal distribution (mean 0, standard deviation 1): z = (x - μ) / σ.
The z value for the mean μ is always 0.
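The conversion to z values uses the standard formula z = (x - μ) / σ. The sketch below assumes a normal distribution with mean 100 and standard deviation 15 (illustrative values) and shows that the mean maps to z = 0 and that probabilities agree on both curves; `z_value` is a hypothetical helper.

```python
from statistics import NormalDist

def z_value(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Assumed normal distribution: mean 100, standard deviation 15
mu, sigma = 100.0, 15.0
standard_normal = NormalDist()  # mean 0, standard deviation 1

z_at_mean = z_value(mu, mu, sigma)  # the mean itself always maps to z = 0
z_130 = z_value(130.0, mu, sigma)

# The same probability can be read from either curve
p_original = NormalDist(mu, sigma).cdf(130.0)
p_standard = standard_normal.cdf(z_130)

print(f"z at the mean = {z_at_mean}")
print(f"z for x = 130 = {z_130}")
print(f"P(x < 130) = {p_original:.4f} (original), {p_standard:.4f} (standard)")
```

Because every normal curve maps onto the same standard curve, a single z table (or one `NormalDist()` object) is enough for any mean and standard deviation.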
Statistics and Business Analytics
Statistics: Science dealing with the collection, analysis, interpretation, and presentation of numerical data.
Variable: Characteristics of any entity being studied that is capable of taking on different values.
Measurement: A standard process used to assign numbers to particular attributes or characteristics of a variable.
Business Analytics: Application of processes and techniques that transform raw data into meaningful information to improve decision making.
Types of Business Analytics:
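- Descriptive Analytics – the simplest and most commonly used type; relies on data mining and statistics to describe the data
- Predictive Analytics – makes predictions about the future
- Prescriptive Analytics – recommends decisions while taking risks into account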
Population: A collection of persons, items, or objects of interest.
Sample: A portion of the whole.
Census: When analysts gather data from the whole population for a given measurement of interest.
Main Branches of Statistics:
- Descriptive Statistics – using gathered data on a group to describe or reach conclusions about the same group.
- Inferential Statistics – gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken.
- Inferential statistics is also known as inductive statistics. It is widely used in pharmaceutical research, allows studying a wide range of phenomena without having to conduct a census, and starts from a hypothesis to make a statement about the population.
Parameter (descriptive measure of a population) – usually denoted by Greek letters (e.g. μ, σ)
Statistic (descriptive measure of a sample) – usually denoted by Roman letters (e.g. x̄, s)
Inferences about parameters are made under uncertainty.
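The difference between a parameter and a statistic can be illustrated with simulated data. In the sketch below, the population of 10,000 values, its mean of 170, and the sample size of 100 are all assumptions made up for the example; the seed is fixed only so that the run is repeatable.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumed population: 10,000 made-up measurements
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = mean(population)  # parameter: computed from the whole population (a census)

sample = random.sample(population, 100)
x_bar = mean(sample)   # statistic: computed from the sample only

print(f"population mean (parameter) = {mu:.2f}")
print(f"sample mean (statistic)     = {x_bar:.2f}")
print(f"estimation error            = {x_bar - mu:.2f}")
```

The sample mean is close to the population mean but generally not equal to it, which is the uncertainty that inferential statistics has to account for.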
Measures of Central Tendency and Variability
1. Measures of central tendency: Yield information about the center, or middle part, of a group of numbers.
- Mean
- Mode
- Median
Percentile – percentiles divide a group of data into 100 parts, so there are 99 percentiles. To find the Pth percentile of n values sorted in ascending order, compute the location i = (P/100) × n.
- if i is a whole number: the Pth percentile is the average of the values at the ith and (i+1)th locations
- if i is not a whole number: the Pth percentile is the value at the location given by the whole-number part of i + 1
- Quartile – divides a group of data into 4 subgroups or parts.
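The measures of central tendency and the percentile rule can be sketched as follows, taking the location as i = (P/100) × n on the sorted data. The data values are made up, and `percentile` is a hypothetical helper that follows the two-case rule from these notes rather than any library's default interpolation method.

```python
from statistics import mean, median, multimode

def percentile(data: list, p: int) -> float:
    """Pth percentile (1 to 99) using the location rule i = (P / 100) * n."""
    values = sorted(data)
    n = len(values)
    i = p * n / 100
    if i == int(i):
        # i is a whole number: average the values at locations i and i + 1
        i = int(i)
        return (values[i - 1] + values[i]) / 2  # locations are 1-based
    # i is not a whole number: take the whole-number part of i + 1 as the location
    return values[int(i + 1) - 1]

# Made-up data
data = [14, 12, 19, 23, 5, 13, 28, 14]

print("mean   =", mean(data))
print("median =", median(data))
print("mode   =", multimode(data))
print("30th percentile =", percentile(data, 30))
print("Q1, Q2, Q3 =", percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

The quartiles are simply the 25th, 50th, and 75th percentiles, and the second quartile coincides with the median.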
2. Measure of Variability:
- Range = max – min
- Interquartile range = Q3 – Q1
- Deviation from the mean – the sum of the deviations from the arithmetic mean is always zero, Σ(xi - μ) = 0, which is why we take absolute values or square the deviations before averaging them.
- Mean absolute deviation
- Variance
- Standard deviation
- Z scores
- Coefficient of variation
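Several of the measures of variability listed above can be computed in a few lines. The sketch below treats a small made-up data set as a whole population (so it uses the population variance and standard deviation); the data and variable names are assumptions for illustration.

```python
from statistics import mean, pstdev, pvariance

# Made-up data treated as a whole population
data = [5, 9, 16, 17, 18]
mu = mean(data)

value_range = max(data) - min(data)
deviations = [x - mu for x in data]
sum_dev = sum(deviations)                          # always 0, so it carries no information
mad = sum(abs(d) for d in deviations) / len(data)  # mean absolute deviation
variance = pvariance(data)                         # mean of the squared deviations
sigma = pstdev(data)                               # square root of the variance
z_scores = [d / sigma for d in deviations]         # deviations in standard-deviation units
cv = sigma / mu * 100                              # coefficient of variation, in percent

print(f"range = {value_range}, sum of deviations = {sum_dev}")
print(f"MAD = {mad}, variance = {variance}, standard deviation = {sigma:.3f}")
print(f"z scores = {[round(z, 2) for z in z_scores]}")
print(f"coefficient of variation = {cv:.1f}%")
```

For sample data, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be used in place of the population versions.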