Key Concepts in Probability Distributions and Statistical Analysis

Continuous Probability Distributions

A continuous distribution is a probability distribution in which the random variable can take any value within a given range or interval. Unlike discrete distributions, which deal with countable outcomes, continuous distributions describe quantities measured on a continuum, such as height, weight, temperature, or time.
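A minimal sketch of what this means in practice, assuming a variable that is uniformly distributed over an interval (the uniform choice, the function name, and the numbers are illustrative assumptions, not part of the original text). For such a variable, the probability of landing in a sub-interval is the length of that sub-interval divided by the length of the whole range, and the probability of hitting any single exact value is zero.

```python
def uniform_interval_probability(low, high, a, b):
    """Return P(a <= X <= b) for X uniformly distributed on [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Only the part of [a, b] that overlaps the support [low, high] counts.
    left = max(a, low)
    right = min(b, high)
    if right <= left:
        return 0.0
    return (right - left) / (high - low)


# Hypothetical example: a waiting time uniform between 0 and 10 minutes.
p_interval = uniform_interval_probability(0.0, 10.0, 2.0, 5.0)
# A single exact value is an interval of length zero.
p_point = uniform_interval_probability(0.0, 10.0, 3.0, 3.0)

print(f"P(2 <= X <= 5) = {p_interval:.2f}")  # 0.30
print(f"P(X = 3)       = {p_point:.2f}")     # 0.00
```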

These distributions are represented using a Probability Density Function (PDF). Probabilities are calculated over intervals, since the probability


Ensemble Methods Comparison: Bagging, Boosting, and Stacking Techniques

Bagging Classifier Implementation

Base Model Performance

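from sklearn.metrics import recall_score
from sklearn.tree import DecisionTreeClassifier

# Assumes X_train, X_test, y_train and y_test already exist (e.g. from train_test_split).
# Fit a single decision tree as the baseline to compare against bagging.
base_model = DecisionTreeClassifier(random_state=42)
base_model.fit(X_train, y_train)

# Recall on the held-out test set (the default settings assume a binary target).
y_pred_base = base_model.predict(X_test)
base_recall = recall_score(y_test, y_pred_base)
print("Base model recall: {:.4f}".format(base_recall))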

Hyperparameter Tuning (Grid Search)

param_grid = {
    "n_estimators": [10, 50, 100],
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
    "bootstrap": [True]
}
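To get a feel for the cost of this grid, the sketch below enumerates every combination using only the standard library. It illustrates what an exhaustive grid search has to cover and is not the search call itself; the number of cross-validation folds (`cv_folds = 5`) is an assumed value, not taken from the original.

```python
from itertools import product

param_grid = {
    "n_estimators": [10, 50, 100],
    "max_samples": [0.5, 0.8, 1.0],
    "max_features": [0.5, 0.8, 1.0],
    "bootstrap": [True],
}

# Cartesian product of all value lists: one dict per candidate configuration.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_combinations = len(combinations)

# Assumption for illustration: 5-fold cross-validation, one fit per fold.
cv_folds = 5
n_fits = n_combinations * cv_folds

print(n_combinations)  # 27
print(n_fits)          # 135
```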

bagging = BaggingClassifier(

Fundamentals of Statistical Graphics and Data Analysis

Understanding Statistical Graphics

A statistical graphic represents statistical data visually so that the reader gets an overall impression of the material and can understand it quickly. Graphics are an alternative to tables for representing frequency distributions. Recommended requirements for building a graph include simplicity, avoiding exaggerated scale distortions, and choosing a chart type appropriate to the objectives and the measurement level of the


Statistical Foundations for Data Analysis

PPDAC Cycle: Data Problem-Solving

  • Problem: Clearly define your research question.

  • Plan: Choose a sampling method and variables.

  • Data: Collect and clean data (e.g., remove errors, handle missing values).

  • Analysis: Use EDA (plots & statistics) and model relationships (e.g., regression).

  • Conclusion: Answer your research question. Be cautious about generalizing!
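The Data and Analysis steps above can be sketched in a few lines. The measurements, the plausibility bounds, and the helper name `clean` are hypothetical, and dropping missing values is only one of several ways to handle them.

```python
from statistics import mean, median

# Hypothetical raw measurements (heights in cm): None marks a missing value
# and -1.0 stands in for an obvious recording error.
raw_heights_cm = [172.0, None, 165.5, -1.0, 180.2, 168.0, None, 175.3]


def clean(values, lower, upper):
    """Drop missing values and values outside the plausible range [lower, upper]."""
    return [v for v in values if v is not None and lower <= v <= upper]


# Data step: remove errors and missing values.
cleaned = clean(raw_heights_cm, lower=50.0, upper=250.0)

# Analysis step: a first numerical summary as part of EDA.
summary = {
    "n": len(cleaned),
    "mean": mean(cleaned),
    "median": median(cleaned),
}
print(summary)
```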


Essential Sampling Methods

Method | Description | Pros | Cons
Simple Random | Each unit has equal chance (like a lucky draw) | Unbiased | May need full list of population
Systematic | Pick
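As a minimal sketch of simple random sampling, the snippet below draws units without replacement from a complete list of the population, which is why this method needs such a list. The sampling frame, the sample size, and the function name are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct units, each with the same chance of being selected."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return rng.sample(population, k)


# Hypothetical sampling frame: a full list of 100 population units.
sampling_frame = [f"unit_{i:03d}" for i in range(1, 101)]

sample = simple_random_sample(sampling_frame, k=10, seed=42)
print(sample)
```

Fixing the seed is only for reproducibility of the illustration; a real draw would normally leave it unset.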

Data Analysis & Measurement in Psychology: Scientific Method Foundations

Data Analysis and Measurement in Psychology: The Scientific Method

Studies that follow the scientific method aim to use procedures that are systematic (following established steps) and verifiable (producing data that any researcher can replicate or refute). However, the scientific method is just one component of the scientific research process, which consists of three levels (Arnaud):

  1. Theoretical and Conceptual Level

    1. Defining the problem and hypotheses
    2. Deduction of testable predictions

  2. Theoretical-


Data Analysis Fundamentals: Central Tendency & Variability

Descriptive Statistics: Central Tendency & Dispersion

Measures of Central Tendency

Understanding the Mean

The mean of the weights is the sum of all weights in the table divided by the number of weights.

Remarks on the Mean
  • Very easy to compute.
  • Takes into consideration all values in the dataset.
  • Highly sensitive to extreme values among the data (outliers).
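The sensitivity to outliers can be seen in a small sketch. The weights below are hypothetical values chosen for illustration, not the table from the course.

```python
from statistics import mean

# Hypothetical weights in kilograms.
weights_kg = [61.0, 64.5, 58.0, 70.5, 66.0]
mean_without_outlier = mean(weights_kg)

# Add one extreme value, such as a data-entry error.
weights_with_outlier = weights_kg + [250.0]
mean_with_outlier = mean(weights_with_outlier)

print(mean_without_outlier)  # 64.0
print(mean_with_outlier)     # 95.0
```

A single extreme value moves the mean from 64.0 to 95.0, well above every one of the original weights.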

There are some variations of the mean (harmonic mean, geometric mean…) which we will not study in this course.

Understanding the Median

The median is the number in the middle
