Statistical Analysis: Essential Concepts and Techniques
Five Number Summary and Outliers
The five-number summary includes the minimum, first quartile (Q1), second quartile (Q2), third quartile (Q3), and maximum. Outliers are values above Q3 + 1.5xIQR or below Q1 – 1.5xIQR.
Shifting and Scaling Data
Shifting data by adding a constant C:
- Mean and median increase by C.
- Spread remains unchanged.
Scaling data by multiplying by a constant C:
- Mean, median, and standard deviation (SD) are multiplied by C.
- Variance is multiplied by C2.
The 68-95-99.7 Rule
This rule states
Read MoreSales and Production Budget Analysis: Q1 2009
Sales Budget
The company projects a 7% growth in unit sales for the first quarter of 2009, based on FY2008 historical data. The net sales price will increase from $320 to $380.
Month | Historical Units | Growth | Projected Units | Sale Price | Projected Sales Revenue |
---|---|---|---|---|---|
January | 3,500 | 7% | 3,745 | $380 | $1,423,100 |
February | 2,200 | 7% | 2,354 | $380 | $894,520 |
March | 1,800 | 7% | 1,926 | $380 | $731,880 |
Total | 7,500 | 8,025 | $3,049,500 |
Finished Goods Production Budget
The inventory of finished goods (FG) for the quarter is as follows:
Month | Day | Units |
---|---|---|
December | 31 | 2,100 |
January | 31 | 1, |
Linear Models: Regression, Interactions, and Best Practices
Simple Regression, Inference, and Centering
Simple Linear Regression: Models the linear relationship between two continuous variables.
- Outcome (y): Variable being predicted.
- Predictor (x): Variable used for prediction.
- Intercept (b₀): Predicted y when x=0. Affected by centering.
- Slope (b₁): Change in y for a one-unit change in x.
- Residual: Difference between observed and predicted y.
- OLS (Ordinary Least Squares): Method to find the best-fitting line by minimizing the sum of squared residuals.
Centering:
Read MoreEssential Python Libraries and AI Project Development
NumPy: Numerical Python
NumPy, which stands for Numerical Python, is a Python library that provides functionality to do the following:
- Create multidimensional array objects (called ndarrays or NumPy arrays).
- Provide tools for working with ndarrays.
NumPy Arrays
An array, in general, refers to a named group of homogeneous elements. A NumPy array is simply a grid that contains values of the same type.
import numpy as np
list = [1, 2, 3, 4]
a1 = np.array(list)
print(a1)
Although NumPy arrays look similar to
Read MoreKey Concepts in Statistics: From Distributions to Data Analysis
Binomial Distribution
Binomial distribution is a common discrete distribution used in statistics, as opposed to a continuous distribution, such as normal distribution. This is because binomial distribution only counts two states, typically represented as 1 (for a success) or 0 (for a failure), given a number of trials in the data. Binomial distribution thus represents the probability for x successes in n trials, given a success probability p for each trial. The underlying assumptions of binomial
Read MoreData Warehousing: Key Concepts and Best Practices
What is a Data Warehouse and Why is it Needed?
A Data Warehouse is a centralized repository that integrates, stores, and manages data from multiple sources, transforming it into a structured format to support decision-making and analytical processes. Unlike operational databases, which are optimized for transactional tasks, data warehouses are designed for read-intensive activities like querying, reporting, and data analysis.
Definition:
A data warehouse is a subject-oriented, integrated, time-variant,
Read More