Machine Learning Fundamentals: Comprehensive Lecture Notes

I. Introduction to Machine Learning (Lecture 01)

Key Concepts:

  • Definition:
    Machine Learning (ML) is a subfield of AI that uses data-driven algorithms to recognize patterns and make decisions without explicit programming.
  • Why Learn ML?
    • Automates tasks (e.g., image classification)
    • Adapts to new data
    • Provides insights from complex datasets
  • Terminology:
    • Training Example: An individual row (or data point) in your dataset.
    • Feature: A measurable property or characteristic (e.g., pixel values, square footage).
    • Target/Label: The “correct answer” or output you want to predict.
    • Loss Function: A function measuring the difference between predictions and the true target.

Example:
For image classification (e.g., cat vs. dog), traditional methods required manual feature extraction. Modern ML leverages convolutional neural networks (CNNs) that automatically learn features from raw pixels.

II. Overview of ML Algorithms (Lecture 02)

Key Algorithms Covered:

  • Logistic Regression:
    • Used for binary classification (e.g., predicting pass/fail).
  • K-Nearest Neighbors (KNN):
    • Classifies new instances based on the majority label of the nearest training examples.
  • Decision Trees:
    • Splits data by asking a series of questions (e.g., “Is income > θ?”) to form a tree structure for classification or regression.
  • Naïve Bayes:
    • A generative classifier that applies Bayes’ rule with the assumption of feature independence.
  • Random Forests:
    • An ensemble of decision trees that improves predictive accuracy by averaging (or voting) over many trees.
  • Linear Regression:
    • Models the relationship between inputs (features) and a continuous output by fitting a line or hyperplane.

The lectures emphasize a common pattern: explanation, mathematical intuition, derivation, coding implementation, and practical examples.


III. Linear Regression (Lecture 03)

Key Points:

  • Definition:
    A statistical method that models the relationship between a dependent variable y and one or more independent variables x.
  • Model Equation (Simple Linear Regression): y = w0 + w1x + ϵ where w0 is the intercept, w1 is the slope, and ϵ represents the error.
  • Multiple Linear Regression:
    Uses multiple features: y = β0 + β1x1 + β2x2 + ⋯ + βnxn + ϵ
  • Optimization:
    Parameters are typically estimated by minimizing a loss function (e.g., Mean Squared Error) using methods like gradient descent.

Example Problem – House Price Prediction:
Given a dataset of houses with square footage and corresponding prices, fit a linear regression model.

  1. Data:
    • Features: Square footage (e.g., 1000, 1500, 2000)
    • Target: Price (e.g., $200k, $275k, $340k)
  2. Task:
    • Estimate w0 and w1 so that the predictions ŷ = w0 + w1x minimize the mean squared error.
  3. Steps:
    • Write the loss function (mean squared error over the N training examples): L(w0, w1) = (1/N) Σᵢ₌₁ᴺ (yᵢ − (w0 + w1xᵢ))²
    • Use gradient descent updates: w0 := w0 − η ∂L/∂w0 and w1 := w1 − η ∂L/∂w1, where η is the learning rate (see the code sketch after this list).
  4. Interpretation:
    The fitted line will help predict house prices based on square footage.
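
A minimal sketch of these gradient descent steps in Python, using NumPy and the toy numbers above (prices kept in thousands of dollars, and the square-footage feature rescaled so a single learning rate behaves well); it is an illustration, not the lecture's reference implementation:

    import numpy as np

    # Toy data from the example: square footage and price (price in $1000s)
    x = np.array([1000.0, 1500.0, 2000.0])
    y = np.array([200.0, 275.0, 340.0])

    x_scaled = x / 1000.0        # rescale the feature so one learning rate works well
    w0, w1 = 0.0, 0.0            # intercept and slope
    eta = 0.1                    # learning rate
    N = len(x_scaled)

    for _ in range(5000):
        residual = (w0 + w1 * x_scaled) - y                 # prediction minus target
        grad_w0 = (2.0 / N) * residual.sum()                # dL/dw0
        grad_w1 = (2.0 / N) * (residual * x_scaled).sum()   # dL/dw1
        w0 -= eta * grad_w0
        w1 -= eta * grad_w1

    print(f"w0 = {w0:.1f}, w1 = {w1:.1f}")  # converges to the least-squares fit (about 61.7 and 140.0 here)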

IV. Bias-Variance Tradeoff and Under/Overfitting (Lecture 05b)

Key Concepts:

  • Bias:
    Error due to overly simplistic assumptions in the learning algorithm. High bias can cause underfitting.
  • Variance:
    Error due to excessive sensitivity to fluctuations in the training data. High variance can cause overfitting.
  • Underfitting vs. Overfitting:
    • Underfitting: Model is too simple; both training and test errors are high.
    • Overfitting: Model is too complex; low training error but high test error.
  • Techniques to Address:
    • Increase model complexity for underfitting.
    • Use regularization, early stopping, or gather more data for overfitting.

Example Problem – Diagnosing Model Fit:
Suppose a model has a high training error and high test error. What does this indicate, and how might you improve the model?

  • Answer:
    This situation indicates underfitting (high bias). Possible solutions include adding more features, increasing model complexity, or reducing excessive regularization.
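
One way to make this diagnosis concrete is to compare training and test error as model complexity changes. A minimal scikit-learn sketch, with polynomial degree as the complexity knob and purely synthetic data for illustration:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy nonlinear target

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for degree in (1, 3, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_tr, y_tr)
        tr_err = mean_squared_error(y_tr, model.predict(X_tr))
        te_err = mean_squared_error(y_te, model.predict(X_te))
        # degree 1 typically underfits (both errors high);
        # degree 15 typically overfits (training error well below test error)
        print(f"degree={degree:2d}  train MSE={tr_err:.3f}  test MSE={te_err:.3f}")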

V. Regularization and Hyperparameter Tuning (Lecture 06)

Regularization Techniques:

  • Ridge Regression (L2 Regularization):
    Adds a penalty proportional to the sum of squared coefficients: L_ridge = L_OLS + λ Σⱼ₌₁ⁿ βⱼ². Shrinks coefficients but does not set them exactly to zero.
  • LASSO Regression (L1 Regularization):
    Adds a penalty proportional to the sum of absolute values: L_lasso = L_OLS + λ Σⱼ₌₁ⁿ |βⱼ|. Can force some coefficients to zero (feature selection).
  • Elastic Net:
    Combines L1 and L2 penalties.

Hyperparameter Tuning:

  • Definition:
    Hyperparameters are settings (e.g., regularization strength λ, learning rate) that are not learned from data.
  • Common Methods:
    • Grid Search: Exhaustively searches a predefined set of hyperparameters.
    • Random Search: Randomly samples the parameter space.
    • Bayesian Optimization: Uses probabilistic models to focus on promising hyperparameter regions.
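
As an illustration of grid search from the list above, a minimal scikit-learn sketch that tunes the Ridge regularization strength (the alpha parameter corresponds to λ) by cross-validation; the data here is synthetic and only for demonstration:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV
    from sklearn.datasets import make_regression

    # Synthetic regression data purely for illustration
    X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

    # Exhaustive search over a predefined grid of regularization strengths
    param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
    search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)

    print("best alpha:", search.best_params_["alpha"])
    print("best CV MSE:", -search.best_score_)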

Example Problem – Regularized Loss Computation:
Given a linear regression model with coefficients β = [2.0, -0.5] and a regularization parameter λ = 1.0, compute the Ridge regularized loss for a single training example with error e = 3.

  1. Ordinary Squared Error: e² = 9
  2. Ridge Penalty: λ(β₁² + β₂²) = 1.0 × (2.0² + (−0.5)²) = 1.0 × (4 + 0.25) = 4.25
  3. Total Loss: 9 + 4.25 = 13.25
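
The same computation as a few lines of Python, for checking the arithmetic:

    # Verifying the worked example above
    beta = [2.0, -0.5]
    lam = 1.0
    error = 3.0

    squared_error = error ** 2                       # 9.0
    ridge_penalty = lam * sum(b ** 2 for b in beta)  # 1.0 * (4.0 + 0.25) = 4.25
    total_loss = squared_error + ridge_penalty       # 13.25
    print(total_loss)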

VI. K-Nearest Neighbors (KNN) (Based on Lecture 02 and Related Content)

Concepts:

  • Intuition:
    Classify a new data point by finding the k closest training examples and using a majority vote (for classification) or averaging (for regression).
  • Algorithm:
    1. Compute distance (e.g., Euclidean) between the new point and all training examples.
    2. Select the k nearest neighbors.
    3. For classification: assign the class that appears most frequently; for regression: average the neighbors’ outputs.

Example Problem – Fruit Classification with KNN:
Given a dataset of fruits with features [color intensity, size] and labels (apple, orange), classify a new fruit with features [0.7, 5.0] using k = 3.

  • Steps:
    1. Compute distances to all labeled points.
    2. Identify the 3 closest fruits.
    3. Determine the majority label among these neighbors.
  • Answer:
    Based on the computed distances, if 2 out of 3 neighbors are apples, then classify the new fruit as an apple.
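
A minimal from-scratch sketch of these steps; the labeled fruit data below is made up for illustration:

    import numpy as np
    from collections import Counter

    # Hypothetical training data: [color intensity, size] -> label
    X_train = np.array([[0.8, 5.2], [0.7, 4.8], [0.3, 7.1], [0.2, 6.8], [0.9, 5.5]])
    y_train = ["apple", "apple", "orange", "orange", "apple"]

    def knn_predict(x_new, X, y, k=3):
        # 1. Euclidean distance from the new point to every training example
        dists = np.linalg.norm(X - x_new, axis=1)
        # 2. Indices of the k nearest neighbors
        nearest = np.argsort(dists)[:k]
        # 3. Majority vote among the neighbors' labels
        votes = Counter(y[i] for i in nearest)
        return votes.most_common(1)[0][0]

    print(knn_predict(np.array([0.7, 5.0]), X_train, y_train, k=3))  # -> "apple" for this toy data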

VII. Decision Trees and Regression Trees (Lecture 07)

Key Points:

  • Decision Trees:
    • Build a tree structure by splitting data on features that best separate the target classes (for classification) or predict continuous outcomes (for regression).
    • Impurity Measures:
      • Entropy/Information Gain: Measures disorder; used to decide the best split.
      • Gini Impurity: Another metric for classification splits.
  • Regression Trees:
    • Instead of class labels, these trees predict continuous values by partitioning the feature space.

Example Problem – Building a Simple Decision Tree Regressor:
You are given a small dataset of car ages and prices. Outline how to construct a decision tree that predicts car price based on age.

  1. Split Criterion:
    Find the age threshold that minimizes the variance in car prices within the resulting groups.
  2. Recursive Splitting:
    Continue splitting each subset until a stopping criterion is met (e.g., minimum number of samples per leaf).
  3. Prediction:
    The predicted price for a new car is the average price in the leaf node where it falls.
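
A minimal sketch of the same outline using scikit-learn's DecisionTreeRegressor; the car age and price numbers are invented for illustration:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Hypothetical data: car age in years -> price in $1000s
    ages = np.array([[1], [2], [3], [5], [7], [10], [12], [15]])
    prices = np.array([30.0, 27.0, 24.0, 19.0, 15.0, 10.0, 8.0, 5.0])

    # The default split criterion minimizes squared error (variance) within each group;
    # min_samples_leaf is the stopping rule mentioned in step 2.
    tree = DecisionTreeRegressor(min_samples_leaf=2, random_state=0)
    tree.fit(ages, prices)

    # The prediction is the average price in the leaf a new car falls into
    print(tree.predict([[4], [11]]))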

VIII. Ensemble Learning: Bagging, Random Forests, and Boosting (Lecture 08)

Concepts:

  • Bagging (Bootstrap Aggregating):
    • Build multiple models (often decision trees) on different bootstrapped samples of the training data.
    • Aggregate the predictions (by majority vote or averaging) to reduce variance.
  • Random Forests:
    • An extension of bagging where each tree considers a random subset of features at each split.
    • This further decorrelates trees and improves generalization.
  • Boosting:
    • Sequentially train weak learners (e.g., decision stumps), each focusing on the errors of the previous one.
    • Examples include AdaBoost and Gradient Boosting.
  • Comparison:
    • Bagging/Random Forest: Mainly reduces variance.
    • Boosting: Reduces bias and can achieve higher accuracy but may overfit if overdone.

Example Problem – Ensemble Prediction:
Given predictions from 3 decision trees for a classification problem (Tree1: Class A, Tree2: Class B, Tree3: Class A), determine the ensemble output using majority vote.

  • Solution:
    Majority vote yields Class A since it appears twice.
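
The majority vote itself takes only a few lines; a sketch using the three tree predictions from the example:

    from collections import Counter

    tree_predictions = ["A", "B", "A"]   # Tree1, Tree2, Tree3
    ensemble_vote = Counter(tree_predictions).most_common(1)[0][0]
    print(ensemble_vote)                 # -> "A"

In practice you would rarely vote by hand: scikit-learn's RandomForestClassifier (bagging plus random feature subsets) and GradientBoostingClassifier (boosting) perform the aggregation internally.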

IX. Practice Problems (Based on Handsolved ML Problems – Lecture 09)

Practice Problems Overview:

  • Problem 1:
    Optimize a linear regression model by deriving the gradient descent update rule and implementing one iteration on a toy dataset.
  • Problem 2:
    Given training and test errors for a model, determine whether the model is overfitting, underfitting, or well-balanced. Propose improvements accordingly.
  • Problem 3:
    For a classification task using KNN, calculate the Euclidean distances for a set of test points and determine their predicted classes using k=5.

Work through these problems by writing down the steps and calculations, then verify your answers against known results.


X. Probability and Distributions (Lecture 10)

Fundamental Topics:

  • Probability Interpretations:
    • Frequentist: Probability as long-run frequency (e.g., P(Heads) = 0.5 for a fair coin).
    • Bayesian: Probability as a degree of belief, updated as new evidence arrives (the view behind Bayes’ rule below).
  • Random Variables:
    • Discrete:
      • Uniform, Bernoulli, Binomial, and Degenerate distributions.
    • Continuous:
      • Gaussian (Normal) distribution, characterized by its mean μ and variance σ².
  • Important Concepts:
    • Union and Joint Probability:
      • Union: P(A∪B)
      • Joint: P(A∩B); for independent events, the product rule gives P(A∩B) = P(A)P(B).
    • Conditional Probability and Bayes’ Rule:
      • Bayes’ theorem: P(A|B) = P(B|A)P(A)/P(B)
  • Categorical and Multinomial Distributions:
    • Categorical Distribution: One trial with outcomes represented via one-hot encoding.
    • Multinomial Distribution: Extends categorical to multiple trials, counting occurrences in each category.
  • Covariance and Multivariate Gaussian:
    • Covariance: Measures the degree to which two variables change together.
    • Multivariate Gaussian: Generalization of the normal distribution to multiple dimensions.

Example Problem – Applying Bayes’ Rule:
Suppose a medical test is 95% accurate for both diseased and healthy patients (95% sensitivity, 5% false-positive rate). If the prevalence of the disease is 1% in the population and a patient tests positive, calculate the probability they actually have the disease.

  • Solution Outline:
    1. Define:
      • P(Disease) = 0.01
      • P(Positive|Disease) = 0.95
      • P(Positive|No Disease) = 0.05
    2. Compute P(Positive) using the law of total probability.
    3. Apply Bayes’ theorem to find P(Disease|Positive).
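
Carrying the outline through numerically, as a quick check in Python:

    p_disease = 0.01
    p_pos_given_disease = 0.95
    p_pos_given_healthy = 0.05

    # Law of total probability
    p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

    # Bayes' theorem
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_disease_given_pos, 3))   # ≈ 0.161 — only about 16%, despite the "95% accurate" test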

XI. Generative Models and Naïve Bayes (Lecture 11)

Key Points:

  • Generative Models:
    • Model the joint probability P(x, y) and then use Bayes’ rule to compute the posterior P(y|x).
  • Naïve Bayes Classifier:
    • Assumes conditional independence between features given the class label.
    • Calculation: P(y|x1, x2, …, xn) ∝ P(y) ∏ᵢ₌₁ⁿ P(xi|y)
  • Application Areas:
    • Text classification (e.g., spam filtering)
    • Medical diagnosis
    • Any domain where the “naïve” independence assumption is acceptable.

Example Problem – Naïve Bayes Text Classification:
Given a simplified dataset where emails are labeled as “spam” or “not spam” and feature probabilities P(word|spam) are provided, compute the posterior probability for a new email containing specific words.

  • Steps:
    1. Compute the prior probabilities P(spam) and P(not spam).
    2. Multiply by the likelihoods P(wordi|spam) for each word in the email.
    3. Normalize to obtain the posterior probabilities.
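
A minimal sketch of these steps with made-up priors and word likelihoods (all numbers are hypothetical, chosen only to show the mechanics):

    # Hypothetical priors and per-word likelihoods P(word | class)
    priors = {"spam": 0.4, "not spam": 0.6}
    likelihoods = {
        "spam":     {"free": 0.30, "winner": 0.20, "meeting": 0.01},
        "not spam": {"free": 0.05, "winner": 0.01, "meeting": 0.20},
    }

    email_words = ["free", "winner"]

    # Unnormalized posteriors: P(y) * product of P(word | y)
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in email_words:
            score *= likelihoods[label][word]
        scores[label] = score

    # Normalize so the posteriors sum to 1
    total = sum(scores.values())
    posteriors = {label: score / total for label, score in scores.items()}
    print(posteriors)   # "spam" dominates for these made-up numbers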

XII. Summary and Study Tips

Key Takeaways:

  • Foundational Concepts:
    Understand core definitions (features, labels, loss functions) and the general ML workflow.
  • Model Selection:
    Recognize the strengths and weaknesses of different algorithms (e.g., linear vs. decision trees vs. ensemble methods).
  • Practical Considerations:
    Focus on bias–variance tradeoff, regularization techniques, and the importance of hyperparameter tuning.
  • Probability:
    Be comfortable with basic probability theory and its application in generative models.

Study Tips:

  • Practice Problems:
    Work through the example problems provided, and try variations on them.
  • Visualization:
    Draw diagrams (e.g., decision trees, regression lines) to solidify your understanding.
  • Coding Exercises:
    Implement simple models in Python (using libraries like scikit-learn) to observe theoretical concepts in practice.
  • Review Hand-Solved Problems:
    Revisit the worked examples from Lecture 09 to understand common pitfalls and solution strategies.