Understanding Random Variables, Distributions, and Statistics

Random Variables and Distributions

Random variable: A variable that contains the outcomes of a chance experiment.

Discrete distribution: The set of possible outcomes is countable, even if it is large. The values usually come from counting something (for example, the number of defective items).

Mean or expected value: The long-run average of a random variable. If the experiment is repeated enough times, the average of the outcomes approaches this value.
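
For a discrete random variable, the expected value can also be computed directly as the sum of each outcome times its probability. A minimal sketch, assuming a fair six-sided die as a hypothetical example (the function name `expected_value` is made up for illustration):

```python
from fractions import Fraction


def expected_value(pmf):
    """Return the sum of x * P(x) for a distribution given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expected_value(die)
print(mean_die)  # 7/2
```

No single roll ever shows 3.5; it is the value the average of many rolls approaches.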

Continuous distribution: For random variables that can take any of infinitely many values within an interval. These are usually measured variables (time, height, weight).

Binomial distribution: The most widely known discrete distribution. Possible outcomes are countable and limited.

Binomial Distribution Assumptions:

  1. The experiment involves n identical trials.
  2. Each trial has only two possible outcomes (success or failure).
  3. Each trial is independent of the other trials.
  4. The terms p (probability of success) and q = 1 − p (probability of failure) remain constant throughout the experiment.
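
Under these four assumptions, the probability of exactly x successes in n trials is C(n, x) · p^x · q^(n − x). A minimal sketch, with hypothetical numbers (10 trials, p = 0.5) chosen only for illustration:

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    q = 1 - p  # probability of failure
    return comb(n, x) * p**x * q ** (n - x)


# Hypothetical example: 10 trials, p = 0.5, exactly 3 successes.
n, p = 10, 0.5
prob_3 = binomial_pmf(3, n, p)
print(prob_3)  # 0.1171875

# The probabilities over all possible counts 0..n add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
```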


Poisson distribution (discrete distribution)

  • Focuses on the number of discrete occurrences over some interval or continuum.
  • Also known as the law of improbable events.
  • Widely used in queuing theory.
  • It describes rare events.
  • Each occurrence is independent of the other events.
  • The number of occurrences in each interval can range from 0 to infinity.
  • The expected number of occurrences must hold constant throughout the experiment.

Long-run average is λ (lambda).

Whatever unit (interval) you have for x, you should have the same unit for λ.
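
The Poisson probability of exactly x occurrences is λ^x · e^(−λ) / x!. A minimal sketch of the unit rule, assuming a hypothetical long-run average of 3 arrivals per 10 minutes while the question asks about a 20-minute interval:

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the long-run average is lam."""
    return lam**x * exp(-lam) / factorial(x)


# Hypothetical example: the average is given per 10 minutes,
# but x counts arrivals in 20 minutes, so the average is rescaled first.
lam_per_10_min = 3.0
lam_per_20_min = lam_per_10_min * 2

p_five_in_20_min = poisson_pmf(5, lam_per_20_min)
print(round(p_five_in_20_min, 4))
```

Plugging in the 10-minute average without rescaling would answer a different question (5 arrivals in 10 minutes).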

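Chebyshev’s theorem applies to any distribution, including one that is not normal: at least 1 − 1/k² of the values lie within k standard deviations of the mean, for any k > 1.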
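
The Chebyshev lower bound 1 − 1/k² (valid for k > 1, whatever the shape of the distribution) can be tabulated with a short sketch; the function name is made up for illustration:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(k, round(chebyshev_lower_bound(k), 4))
```

For k = 2 the bound is 0.75, so at least 75% of the values fall within two standard deviations of the mean.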



Discrete vs. Continuous Distributions

Discrete: We have countable outcomes.

Continuous: Unlimited possible outcomes.

P(x = 3) is 0 for a continuous distribution, because x can take infinitely many values around 3 (3.001, 3.002, 3.01, etc.).

Instead, we take the probability over a range, for example P(2.9 < x < 3.1).
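
A range probability is the area under the curve between the two endpoints, which is the difference of two cumulative probabilities. A minimal sketch, assuming a hypothetical normal distribution with mean 3 and standard deviation 0.5, and an interval from 2.9 to 3.1:

```python
from statistics import NormalDist

# Hypothetical distribution, chosen only for illustration.
dist = NormalDist(mu=3.0, sigma=0.5)

# Probability over a range = area between the two endpoints.
p_range = dist.cdf(3.1) - dist.cdf(2.9)

# A "range" that collapses to the single point 3 has no area.
p_point = dist.cdf(3.0) - dist.cdf(3.0)

print(round(p_range, 4), p_point)
```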

Continuous Distributions

1. Uniform distribution (Rectangular distribution)

  • Simple continuous distribution
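
For a uniform distribution on [a, b], the density is a constant height 1/(b − a), so probabilities are rectangle areas; the mean is (a + b)/2 and the standard deviation is (b − a)/√12. A minimal sketch, with hypothetical endpoints a = 10 and b = 20:

```python
from math import sqrt


def uniform_probability(x1, x2, a, b):
    """P(x1 <= X <= x2) for X uniform on [a, b]: overlap width times height 1/(b - a)."""
    low, high = max(x1, a), min(x2, b)
    return max(high - low, 0) / (b - a)


# Hypothetical example: values spread evenly between 10 and 20.
a, b = 10, 20
p_12_to_15 = uniform_probability(12, 15, a, b)  # width 3 out of 10
mean_u = (a + b) / 2
std_u = (b - a) / sqrt(12)
print(p_12_to_15, mean_u, round(std_u, 4))
```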


2. Normal distribution

  • Fits many human characteristics such as weight, height, speed, IQ.
  • It is a continuous distribution.
  • Symmetrical distribution about its mean.
  • Asymptotic to the horizontal axis (the tails never touch the horizontal axis; they extend toward negative and positive infinity).
  • It is unimodal (there is only one peak, in the middle).
  • Family of curves
  • Area under the curve is 1.

Standardized Normal Distribution

Using z values, we convert a normal distribution into the standard normal distribution (mean 0, standard deviation 1): z = (x − μ) / σ.

The z value for the mean μ is always 0.
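
A z value measures how many standard deviations a value lies from the mean, z = (x − μ) / σ. A minimal sketch, assuming a hypothetical normal distribution with μ = 100 and σ = 15:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100, 15

z_at_mean = z_score(100, mu, sigma)  # the mean itself
z_above = z_score(130, mu, sigma)    # two standard deviations above
z_below = z_score(85, mu, sigma)     # one standard deviation below
print(z_at_mean, z_above, z_below)   # 0.0 2.0 -1.0
```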


Statistics and Business Analytics

Statistics: Science dealing with the collection, analysis, interpretation, and presentation of numerical data.

Variable: A characteristic of any entity being studied that is capable of taking on different values.

Measurement: A standard process used to assign numbers to a particular attribute of the variable being studied.

Business Analytics: Application of processes and techniques that transform raw data into meaningful information to improve decision making.

Types of Business Analytics:

  1. Descriptive Analytics – simple and commonly used; draws on data mining and statistics
  2. Predictive Analytics – makes predictions about the future
  3. Prescriptive Analytics – recommends actions by weighing possible outcomes and their risks

Population: A collection of persons, items, or objects of interest.

Sample: A portion of the whole.

Census: When analysts gather data from the whole population for a given measurement of interest.

Main Branches of Statistics:

  1. Descriptive Statistics – using gathered data on a group to describe or reach conclusions about the same group.
  2. Inferential Statistics – gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken.
    • Also known as inductive statistics.
    • Widely used in pharmaceutical research.
    • Allows studying a wide range of phenomena without having to conduct a census.
    • Starts from a hypothesis and makes a statement about the population.


Parameter (descriptive measure of the population) – denoted by Greek letters (for example μ, σ)

Statistic (descriptive measure of a sample) – denoted by Roman letters (for example x̄, s)

Inferences about parameters are made under uncertainty.
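
The difference between a parameter and a statistic can be sketched with a small simulation. Everything here is hypothetical: the population is the integers 1 to 100, and the sample is 10 values drawn with a fixed seed so the run is repeatable:

```python
import random
from statistics import mean

# Hypothetical population: in practice a census is rarely available.
population = list(range(1, 101))
mu = mean(population)  # parameter: computed from the whole population

rng = random.Random(42)              # fixed seed for a repeatable draw
sample = rng.sample(population, 10)  # 10 items drawn without replacement
x_bar = mean(sample)                 # statistic: computed from the sample only

print(f"parameter mu = {mu}, statistic x_bar = {x_bar}")
```

A different sample would give a different x̄, which is why any conclusion about μ drawn from x̄ carries uncertainty.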



Measures of Central Tendency and Variability

1. Measures of central tendency: Yield information about the center, or middle part, of a group of numbers.

  • Mean
  • Mode
  • Median
  • Percentile – divides a group of data into 100 parts, so there are 99 percentiles. For n values sorted in ascending order, the location of the Pth percentile is i = (P/100)·n.

    • If i is a whole number: the Pth percentile is the average of the values at the ith and (i+1)th locations.
    • If i is not a whole number: the Pth percentile is the value at location (whole-number part of i) + 1.
  • Quartile – divides a group of data into 4 subgroups or parts.
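
The percentile location rule can be sketched as follows, assuming the location is i = (P/100)·n on the sorted data with locations counted from 1, and P a whole number from 1 to 99. The data set is hypothetical, and library routines may use other interpolation rules and return slightly different values:

```python
def percentile(data, p):
    """Pth percentile (integer p, 1..99) using the location rule i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    values = sorted(data)
    n = len(values)
    # divmod keeps the arithmetic in integers: i = whole + remainder / 100
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at locations i and i + 1
        return (values[whole - 1] + values[whole]) / 2
    # i is not a whole number: take the value at location (whole part of i) + 1
    return values[whole]


# Hypothetical data set with n = 8 values.
data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(data, 30)  # i = 2.4, so the 3rd sorted value
q1 = percentile(data, 25)   # i = 2, so the average of the 2nd and 3rd values
q3 = percentile(data, 75)   # i = 6, so the average of the 6th and 7th values
print(p30, q1, q3)          # 13 12.5 21.0
```

Since the quartiles Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles, the same function also gives the interquartile range as q3 − q1.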

2. Measure of Variability:

  • Range = max – min
  • Interquartile range = Q3 – Q1
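  • Deviation from mean – the sum of the deviations from the arithmetic mean is always zero, Σ(xi − μ) = 0, which is why the deviations are made absolute or squared before averaging (as in the mean absolute deviation and the variance).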
  • Mean absolute deviation
  • Variance
  • Standard deviation
  • Z scores
  • Coefficient of variation
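
Most of these measures can be computed in a few lines. A minimal sketch, treating a small hypothetical data set as a whole population (so sums are divided by N rather than n − 1):

```python
from math import sqrt

# Hypothetical population of N = 5 values.
data = [5, 9, 16, 17, 18]
n = len(data)
mu = sum(data) / n

value_range = max(data) - min(data)
deviations = [x - mu for x in data]          # these always sum to zero
mad = sum(abs(d) for d in deviations) / n    # mean absolute deviation
variance = sum(d**2 for d in deviations) / n # population variance
std_dev = sqrt(variance)                     # population standard deviation
z_scores = [d / std_dev for d in deviations]
cv_percent = std_dev / mu * 100              # coefficient of variation, in percent

print(value_range, sum(deviations), mad, variance)
print(round(std_dev, 3), round(cv_percent, 1))
```

For sample data, the usual convention divides the sum of squared deviations by n − 1 instead of N.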