Stock Return Analysis and Statistical Concepts
Stock Comparison: Return and Risk
Stock 1: x̄ = 9.62% and s = 23.58%
Stock 2: x̄ = 12.38% and s = 15.45%
x̄ represents the average return of a stock.
Stock 2 has a higher average return because its x̄ (12.38%) is greater than Stock 1’s x̄ (9.62%).
Stock 1 is riskier. Standard deviation (s) measures the volatility of returns: a higher value means wider fluctuations and less predictable returns. Stock 1’s s (23.58%) exceeds Stock 2’s (15.45%).
Implication: Sharpe Ratio
Stock 2 has both a higher average return and a lower standard deviation, so it has the higher Sharpe ratio for any shared risk-free rate below both average returns. This indicates a better risk-adjusted return: for each unit of risk, Stock 2 has offered more return than Stock 1, making it a potentially better investment.
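As a rough check of the comparison above, the sketch below computes the Sharpe ratio, (x̄ - risk-free rate) / s, for both stocks. The 2% risk-free rate is an assumption chosen for illustration; it is not given in the problem.

```python
def sharpe_ratio(mean_return, std_dev, risk_free_rate):
    """Excess return per unit of volatility (all inputs in percent)."""
    return (mean_return - risk_free_rate) / std_dev

# Assumed risk-free rate in percent (hypothetical; not from the problem).
RISK_FREE = 2.0

stock1 = sharpe_ratio(9.62, 23.58, RISK_FREE)
stock2 = sharpe_ratio(12.38, 15.45, RISK_FREE)

print(f"Stock 1 Sharpe ratio: {stock1:.3f}")
print(f"Stock 2 Sharpe ratio: {stock2:.3f}")
```

With these inputs, Stock 2's ratio is roughly double Stock 1's. Changing RISK_FREE shifts both values, but the ranking only flips at a rate of about 17.6%, which is above both average returns.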
Central Limit Theorem
According to the Central Limit Theorem, the sampling distribution of the mean is approximately normal when the sample size is large enough, regardless of the population distribution’s shape (e.g., bimodal as shown above), provided the population has a finite variance.
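A small simulation can make this concrete. The sketch below assumes a hypothetical bimodal population (an equal mix of two normal distributions centered at -3 and 3), a sample size of 50, and 2,000 repeated samples; none of these values come from the original question.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def draw_bimodal():
    """One draw from a two-peaked population: a 50/50 mix of N(-3, 1) and N(3, 1)."""
    center = -3.0 if random.random() < 0.5 else 3.0
    return random.gauss(center, 1.0)

def sample_means(sample_size, n_samples=2000):
    """Means of n_samples independent samples, each of size sample_size."""
    return [
        statistics.fmean(draw_bimodal() for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(sample_size=50)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean.
share_within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"std of sample means:  {spread:.3f}")
print(f"share within 1 sd:    {share_within_one_sd:.3f}")
```

The population here has mean 0 and variance 10, so theory predicts the sample means cluster around 0 with a standard deviation near sqrt(10 / 50), about 0.447, even though the population itself has two peaks.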
Correlation Strength
A correlation’s sign (positive or negative) indicates the association’s direction, not its strength. The absolute value of the correlation coefficient determines the strength.
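As a minimal illustration, the sketch below compares two hypothetical correlation coefficients by absolute value. The numbers are invented for the example.

```python
def stronger_correlation(r_a, r_b):
    """Return whichever coefficient has the larger absolute value."""
    return r_a if abs(r_a) >= abs(r_b) else r_b

# Hypothetical coefficients: a negative one and a positive one.
print(stronger_correlation(-0.80, 0.30))  # -0.8
```

Here -0.80 describes the stronger association even though it is negative, because |-0.80| > |0.30|.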
Price = -41574.57 + 116705.43 * Total # Rooms + e
Interpreting the Coefficient on Total # Rooms
For each additional room, the average predicted sale price increases by $116,705.43.
Interpreting the Intercept
The intercept (-$41,574.57) represents the average sale price of a property with zero rooms. This isn’t meaningful in real-world scenarios where properties have at least one room and prices are non-negative.
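The two interpretations above can be checked by plugging values into the fitted equation. This is a sketch using the coefficients from the regression line; the room counts passed in are hypothetical inputs.

```python
INTERCEPT = -41574.57        # predicted price at zero rooms
SLOPE_PER_ROOM = 116705.43   # change in predicted price per additional room

def predicted_price(total_rooms):
    """Predicted average sale price for a given total number of rooms."""
    return INTERCEPT + SLOPE_PER_ROOM * total_rooms

for rooms in (0, 1, 5, 6):
    print(f"{rooms} rooms: ${predicted_price(rooms):,.2f}")

# Going from 5 to 6 rooms changes the prediction by exactly the slope.
print(f"5 -> 6 rooms: ${predicted_price(6) - predicted_price(5):,.2f}")
```

The zero-room prediction is negative, which is why the intercept has no practical interpretation here.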
Happiness and Income: Average Change and Confidence Interval
Question: What is the average change in happiness for every $50,000 increase in income? What is the 95% confidence interval for this change?
$50,000 / $1,000 = 50 (measured in $1,000s)
ΔY = 0.002869 * 50 = 0.14345 (b1 * 50)
Lower Bound = 0.001669 * 50 = 0.08345
Upper Bound = 0.004069 * 50 = 0.20345
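The scaling can be reproduced in a few lines. This sketch assumes, as the calculation does, that income is measured in $1,000s, so the slope and both ends of its 95% confidence interval are multiplied by the same factor of 50.

```python
B1 = 0.002869        # estimated change in happiness per $1,000 of income
CI_LOWER = 0.001669  # lower 95% bound on the slope
CI_UPPER = 0.004069  # upper 95% bound on the slope

UNITS = 50_000 / 1_000  # a $50,000 increase expressed in $1,000s

change = B1 * UNITS
lower = CI_LOWER * UNITS
upper = CI_UPPER * UNITS

print(f"average change: {change:.5f}")
print(f"95% CI: [{lower:.5f}, {upper:.5f}]")
```

Multiplying the slope by a constant rescales the interval by that constant, so no new standard error is needed.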
Impact of Controlling for Height on Age Coefficient
The coefficient on age is likely smaller in the third regression because that regression controls for height. Age and height are correlated, so when height is left out, the age coefficient also picks up part of height’s effect on weight. Including height lets the model attribute that part of the variation in weight to height instead. The first two regressions each examine the relationship between two variables, while the third acknowledges that weight is influenced by multiple factors, resulting in a smaller age coefficient once height is considered.
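The effect can be seen in a simulation. The sketch below uses entirely hypothetical data (the ages, heights, weights, and every coefficient are invented) in which height rises with age and weight depends on both. It estimates the age coefficient with and without controlling for height; the controlled estimate is obtained by first removing the part of age explained by height, which gives the same age coefficient as a regression of weight on both variables.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Part of y left unexplained after regressing y on x."""
    b = slope(x, y)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data: height grows with age, weight depends on both.
age = [random.uniform(2, 16) for _ in range(2000)]
height = [80 + 6 * a + random.gauss(0, 5) for a in age]
weight = [-40 + 0.5 * a + 0.6 * h + random.gauss(0, 3)
          for a, h in zip(age, height)]

# Age alone: the coefficient absorbs height's effect (about 0.5 + 0.6 * 6 = 4.1).
age_only = slope(age, weight)

# Controlling for height: use only the variation in age not explained by height.
age_controlled = slope(residuals(height, age), weight)

print(f"age coefficient, no control:      {age_only:.2f}")
print(f"age coefficient, height included: {age_controlled:.2f}")
```

In this made-up setup the true direct effect of age is 0.5, and the estimate moves toward that value once height is included.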
Difference in Sleep Between Mondays and Fridays
Method 1:
Mondays: ^sleep = 575.8 – 67.6 * 1 = 508.2 minutes
Fridays: ^sleep = 575.8 – 93.1 * 1 = 482.7 minutes
Difference = 508.2 – 482.7 = 25.5 minutes
Method 2:
Difference = Difference in coefficients = (-67.6) – (-91.3) = 23.7 minutes
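Conclusion: The two methods are algebraically equivalent, so they should give the same answer. They differ here only because the Friday coefficient is written as -93.1 in Method 1 and -91.3 in Method 2, so one of the two is a transcription error. Assuming Method 2’s value is correct, the average sleep difference between Mondays and Fridays is 23.7 minutes, with Mondays having more sleep, holding other factors constant.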
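The sketch below runs both methods with each of the two Friday coefficients that appear in the calculations (-93.1 and -91.3). It does not settle which value is the correct one; it only shows that the methods agree whenever they use the same coefficient.

```python
INTERCEPT = 575.8    # predicted sleep (minutes) on the excluded day
MONDAY_COEF = -67.6  # Monday dummy coefficient

def difference_via_predictions(friday_coef):
    """Method 1: predict each day's sleep, then subtract."""
    monday = INTERCEPT + MONDAY_COEF
    friday = INTERCEPT + friday_coef
    return monday - friday

def difference_via_coefficients(friday_coef):
    """Method 2: subtract the dummy coefficients directly."""
    return MONDAY_COEF - friday_coef

for friday_coef in (-93.1, -91.3):
    m1 = difference_via_predictions(friday_coef)
    m2 = difference_via_coefficients(friday_coef)
    print(f"Friday coef {friday_coef}: Method 1 = {m1:.1f}, Method 2 = {m2:.1f}")
```

The intercept cancels in Method 1, which is why the difference depends only on the two day coefficients.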
Difference in Sleep Between Saturday and Sunday
The difference between Saturday and Sunday (the excluded category) is given directly by Saturday’s coefficient: predicted sleep on Saturdays is 38.1 minutes more than on Sundays, holding other factors constant.
Sampling Biases
1. Non-Representative Sample & Self-Selection Bias
Non-Representative Sample Bias: This bias occurs when the sample doesn’t accurately represent the population’s characteristics.
Self-Selection Bias: This occurs when participants volunteer for a study, potentially creating differences between participants and non-participants.
Both biases can distort results.
2. Survivorship Bias
Survivorship Bias: This bias occurs when observations drop out of the sample, altering the remaining data composition and potentially obscuring important information from the missing data.