Key Statistical and Machine Learning Concepts in Data Science
Probability Distribution
A probability distribution gives the probability of each possible outcome of a random experiment or event. It is defined over the experiment's sample space, the set of all possible outcomes, which could be a set of real numbers, a set of vectors, or a set of any other entities. Probability distributions are a core topic in probability and statistics.
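As a small illustration (the two-dice experiment is our own example, not taken from the original), the sketch below enumerates a finite sample space and derives the probability distribution of a random variable defined on it, the sum of two fair dice. It uses exact fractions so that the probabilities visibly add up to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered outcomes of rolling two fair dice
sample_space = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each possible sum
counts = Counter(a + b for a, b in sample_space)

# Probability distribution of the sum: P(sum = s) = (favorable outcomes) / 36
distribution = {s: Fraction(c, len(sample_space)) for s, c in sorted(counts.items())}

for s, p in distribution.items():
    print(s, p)

# A valid probability distribution assigns probabilities that sum to 1
assert sum(distribution.values()) == 1
```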
Statistical Analysis: Essential Concepts and Techniques
Five Number Summary and Outliers
The five-number summary consists of the minimum, first quartile (Q1), median (second quartile, Q2), third quartile (Q3), and maximum. With the interquartile range IQR = Q3 − Q1, values above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are flagged as outliers.
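A minimal sketch of the five-number summary and the 1.5 × IQR outlier rule, using only the standard library. Quartile conventions differ between textbooks and tools, so this assumes the "inclusive" method of `statistics.quantiles` (linear interpolation between order statistics, Python 3.8+); another convention can give slightly different fences. The sample data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    values = sorted(data)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]

def iqr_outliers(data):
    """Return the values lying outside Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]   # hypothetical data
print(five_number_summary(sample))  # (2, 4.25, 6.5, 8.75, 30)
print(iqr_outliers(sample))         # [30]
```

Here IQR = 8.75 − 4.25 = 4.5, so the upper fence is 8.75 + 6.75 = 15.5 and only 30 lies beyond it.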
Shifting and Scaling Data
Shifting data by adding a constant C:
- Mean and median increase by C.
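- Spread (range, IQR, standard deviation, variance) remains unchanged.
Scaling data by multiplying by a constant C:
- Mean and median are multiplied by C; the standard deviation (SD) is multiplied by |C|, which equals C when C is positive.
- Variance is multiplied by C².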
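The shifting and scaling rules can be checked numerically. This sketch uses population SD and variance (`statistics.pstdev`, `statistics.pvariance`); the sample versions obey the same rules. The data and the constant C are arbitrary illustrative values.

```python
import math
import statistics

def summarize(values):
    """Mean, median, standard deviation, and variance (population versions)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
        "var": statistics.pvariance(values),
    }

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # illustrative data
C = 3.0                                  # illustrative constant

base = summarize(data)
shifted = summarize([x + C for x in data])   # add C to every value
scaled = summarize([x * C for x in data])    # multiply every value by C

# Shifting moves the center by C but leaves the spread unchanged
assert math.isclose(shifted["mean"], base["mean"] + C)
assert math.isclose(shifted["median"], base["median"] + C)
assert math.isclose(shifted["sd"], base["sd"])

# Scaling multiplies mean and median by C, SD by |C|, and variance by C**2
assert math.isclose(scaled["mean"], base["mean"] * C)
assert math.isclose(scaled["median"], base["median"] * C)
assert math.isclose(scaled["sd"], base["sd"] * abs(C))
assert math.isclose(scaled["var"], base["var"] * C**2)
print("all shift/scale properties hold")
```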
The 68-95-99.7 Rule
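This rule states that, for a normal distribution, approximately 68% of values fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three.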
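For a normal distribution, the 68-95-99.7 rule says roughly 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean. Those percentages are rounded values of exact normal probabilities, which this short sketch computes with the standard identity P(−k < Z < k) = erf(k / √2) for a standard normal Z.

```python
import math

def within_k_sd(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    # For Z ~ N(0, 1): P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

The printed values are approximately 0.6827, 0.9545, and 0.9973.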
Sales and Production Budget Analysis: Q1 2009
Sales Budget
The company projects 7% growth in unit sales for the first quarter of 2009 over FY2008 historical data. The net sales price per unit will increase from $320 to $380.
Month | Historical Units | Growth | Projected Units | Sale Price | Projected Sales Revenue |
---|---|---|---|---|---|
January | 3,500 | 7% | 3,745 | $380 | $1,423,100 |
February | 2,200 | 7% | 2,354 | $380 | $894,520 |
March | 1,800 | 7% | 1,926 | $380 | $731,880 |
Total | 7,500 | 7% | 8,025 | $380 | $3,049,500 |
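The sales budget follows two formulas: projected units = historical units × (1 + growth) and projected revenue = projected units × sale price. This sketch recomputes the table from the historical units, the 7% growth rate, and the $380 price given in the text; rounding projected units to whole units is our assumption (every month here divides evenly anyway).

```python
GROWTH_PCT = 7   # projected unit growth over FY2008, in percent
PRICE = 380      # new net sales price per unit, in dollars

historical_units = {"January": 3_500, "February": 2_200, "March": 1_800}

rows = []
for month, units in historical_units.items():
    projected = round(units * (100 + GROWTH_PCT) / 100)  # units after 7% growth
    rows.append((month, units, projected, projected * PRICE))

for month, units, projected, revenue in rows:
    print(f"{month:<9} {units:>6,} {projected:>6,} ${revenue:>11,}")

total_units = sum(r[2] for r in rows)
total_revenue = sum(r[3] for r in rows)
print(f"{'Total':<9} {sum(historical_units.values()):>6,} {total_units:>6,} ${total_revenue:>11,}")
```

The totals come out to 8,025 units and $3,049,500, matching the table.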
Finished Goods Production Budget
The inventory of finished goods (FG) for the quarter is as follows:
Month | Day | Units |
---|---|---|
December | 31 | 2,100 |
January | 31 | 1, |
Linear Models: Regression, Interactions, and Best Practices
Simple Regression, Inference, and Centering
Simple Linear Regression: Models the linear relationship between two continuous variables.
- Outcome (y): Variable being predicted.
- Predictor (x): Variable used for prediction.
- Intercept (b₀): Predicted y when x=0. Affected by centering.
- Slope (b₁): Predicted change in y for a one-unit increase in x.
- Residual: Observed y minus predicted y.
- OLS (Ordinary Least Squares): Method to find the best-fitting line by minimizing the sum of squared residuals.
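For a single predictor, OLS has a closed-form solution: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄. The sketch below applies these formulas in plain Python and computes the residuals; the data are made up for illustration. With an intercept in the model, the residuals always sum to zero.

```python
def ols_fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the OLS line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]                 # illustrative predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.0]      # illustrative outcomes

b0, b1 = ols_fit(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]  # observed - predicted

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```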
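Centering: Subtracting the mean of x from every x value. The slope is unchanged, but the intercept becomes the predicted y at the mean of x.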
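Centering means subtracting x̄ from each x before fitting. This sketch (hypothetical data, with x values far from 0 so that the raw intercept is an extrapolation) fits the same data with raw and centered x. The slope is identical, while the centered intercept equals ȳ, the predicted outcome at the average x.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical data: x = 0 lies far outside the observed range
x = [10, 12, 14, 16, 18]
y = [25.0, 29.5, 33.0, 37.5, 41.0]

x_bar = sum(x) / len(x)
x_centered = [xi - x_bar for xi in x]   # centering: subtract the mean of x

b0_raw, b1_raw = ols_fit(x, y)
b0_c, b1_c = ols_fit(x_centered, y)

print(f"raw:      b0 = {b0_raw:.3f}, b1 = {b1_raw:.3f}")
print(f"centered: b0 = {b0_c:.3f}, b1 = {b1_c:.3f}")
```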
Essential Python Libraries and AI Project Development
NumPy: Numerical Python
NumPy, which stands for Numerical Python, is a Python library that provides functionality to do the following:
- Create multidimensional array objects (called ndarrays or NumPy arrays).
- Provide tools for working with ndarrays.
NumPy Arrays
An array, in general, refers to a named group of homogeneous elements. A NumPy array is simply a grid that contains values of the same type.
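```python
import numpy as np

values = [1, 2, 3, 4]   # a plain Python list (naming it `list` would shadow the built-in)
a1 = np.array(values)   # convert the list into a 1-D NumPy array
print(a1)               # [1 2 3 4]
```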
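To illustrate that a NumPy array is a grid of values of one type, this sketch shows NumPy upcasting mixed ints and floats to a single float dtype, a 2-D array with a fixed shape, and element-wise arithmetic (which differs from multiplying a Python list, where `*` repeats the list). The specific values are arbitrary.

```python
import numpy as np

# Mixing ints and floats: NumPy converts everything to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)        # dtype is float64

# A 2-D array is a rectangular grid: every row has the same length
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)                # (2, 3)

# Arithmetic on an array is element-wise ...
print(grid * 2)
# ... whereas * on a Python list repeats the list
print([1, 2, 3] * 2)             # [1, 2, 3, 1, 2, 3]
```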
Although NumPy arrays look similar to
Key Concepts in Statistics: From Distributions to Data Analysis
Binomial Distribution
The binomial distribution is a common discrete distribution in statistics, as opposed to a continuous distribution such as the normal distribution. It is discrete because each trial has only two possible states, typically represented as 1 (success) or 0 (failure), and the distribution counts how many successes occur in a given number of trials. The binomial distribution thus gives the probability of x successes in n trials, given a success probability p for each trial. The underlying assumptions of binomial
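The probability of x successes in n trials with success probability p is P(X = x) = C(n, x) · pˣ · (1 − p)ⁿ⁻ˣ. A minimal sketch of this probability mass function using `math.comb` (Python 3.8+); the coin-flip example is our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0   # impossible counts have probability zero
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example: probability of exactly 3 heads in 5 fair coin flips
print(binomial_pmf(3, 5, 0.5))  # 0.3125

# The probabilities over x = 0..n sum to 1
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```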