Association Rules and Decision Trees in Machine Learning

Unsupervised Machine Learning

Market Basket Analysis (MBA)

  • Definition: Identifies relationships between items frequently bought together, aiding businesses in decisions like product placement and cross-selling.
  • Output: Generates association rules that describe patterns in transactions.
    • Example: {butter, jam} → {bread} implies that buying butter and jam often leads to buying bread.

Association Rules

  • LHS (Left-Hand Side): Items frequently purchased together (e.g., {butter, jam}).
  • RHS (Right-Hand Side): Related items likely to be purchased with LHS (e.g., {bread}).
  • Goal: Understand patterns in a transaction database, not prediction.

Apriori Algorithm

  • Identifies frequent itemsets using the Apriori property: every subset of a frequent itemset must itself be frequent, so candidates containing an infrequent subset can be pruned.
  • Reduces computational complexity by focusing only on combinations with sufficient frequency.
  • The process of creating associations occurs in two phases:
  1. Identify itemsets that meet a minimum support threshold (frequency).
  2. Create association rules that meet a minimum confidence threshold (certainty of co-occurrence); a short sketch of both phases follows below.
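
A minimal Python sketch of the two phases on an invented toy transaction database (item names, thresholds, and the restriction to item pairs are assumptions made purely for illustration; libraries such as mlxtend in Python or arules in R implement the full algorithm):

```python
from itertools import combinations

# Toy transaction database (invented for illustration).
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "jam", "milk"},
]

def support(itemset):
    """Proportion of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.7

# Phase 1: find frequent itemsets, building larger candidates only from
# smaller frequent ones (the Apriori pruning idea), here up to pairs.
frequent_1 = [{item} for item in set().union(*transactions)
              if support({item}) >= MIN_SUPPORT]
candidate_pairs = [a | b for a, b in combinations(frequent_1, 2)]
frequent_2 = [s for s in candidate_pairs if support(s) >= MIN_SUPPORT]

# Phase 2: turn frequent pairs into rules LHS -> RHS and keep the confident ones.
for itemset in frequent_2:
    for lhs_item in itemset:
        lhs, rhs = {lhs_item}, itemset - {lhs_item}
        confidence = support(itemset) / support(lhs)
        if confidence >= MIN_CONFIDENCE:
            print(f"{lhs} -> {rhs}  confidence={confidence:.2f}")
```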

Evaluating Association Rules

  1. Support: Proportion of transactions containing a specific subset.
    • $\text{Support}(X) = \frac{\text{Transactions with } X}{\text{Total Transactions}}$
  2. Confidence: Likelihood of RHS given LHS.
    • $\text{Confidence}(X \to Y) = \frac{\text{Transactions with } X \text{ and } Y}{\text{Transactions with } X}$
  3. Lift: Measures the strength of association beyond random co-occurrence.
    • $\text{Lift}(X \to Y) = \frac{\text{Confidence}(X \to Y)}{\text{Support}(Y)}$
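  • Worked example (invented numbers, matching the toy transactions sketched above): out of 5 transactions, 3 contain {butter, jam}, 2 contain {butter, jam, bread}, and 4 contain {bread}. Then Support of the rule = 2/5 = 0.40, Confidence({butter, jam} → {bread}) = 0.40 / (3/5) ≈ 0.67, and Lift ≈ 0.67 / (4/5) ≈ 0.83. A lift below 1 means the items co-occur less often than independence would predict; a lift above 1 indicates a positive association.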

Categorizing Rules

  • Actionable: Useful insights for decision-making (e.g., cross-selling opportunities).
  • Trivial: Obvious and not valuable.
  • Inexplicable: No clear cause-effect relationship.

Advantages and Disadvantages of Association Rule Mining

Advantages:

  1. Able to work with large transactional databases.
  2. Results in easy-to-understand rules.
  3. Useful for discovering unexpected patterns in databases.

Disadvantages:

  1. Not very useful for small databases.
  2. It takes effort to draw valid conclusions.
  3. Relatively easy to misinterpret random patterns as meaningful.


Supervised Machine Learning

Decision Trees

  • Definition:
    Decision trees predict outcomes by splitting data based on variable values (numerical or categorical), providing a path of decisions leading to a final prediction.
  • Example:
    When deciding whether to accept a job offer, you may consider factors like:
    • “Travel time greater than 1 hour?”
    • “Salary greater than €50,000?”
    • “Company aligns with personal values?”
      Based on these, the tree leads to a decision: ACCEPT or DECLINE.
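
As a minimal sketch, a tree like this can be fitted with scikit-learn's DecisionTreeClassifier; the encoded job-offer data below is invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented training data: [travel_time_hours, salary_k_eur, values_aligned (0/1)]
X = [
    [0.5, 55, 1],
    [1.5, 60, 1],
    [2.0, 45, 0],
    [0.8, 40, 1],
    [1.2, 70, 0],
    [0.3, 52, 0],
]
y = ["ACCEPT", "ACCEPT", "DECLINE", "DECLINE", "ACCEPT", "DECLINE"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned splits: internal nodes test a threshold, leaves give the class.
print(export_text(tree, feature_names=["travel_time", "salary", "values_aligned"]))

# Classify a new offer by following the path of decisions.
print(tree.predict([[1.0, 58, 1]]))
```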

Advantages of Decision Trees

  1. Easy to visualize and interpret, even for non-experts.
  2. Transparent: Useful when understanding the decision-making process is crucial (e.g., credit approval, hiring).
  3. Works with both categorical and numerical data.

Disadvantages of Decision Trees

  1. Risk of overfitting: May perform well on training data but poorly on new data.
  2. Sensitive to data changes: Small variations can lead to a completely different tree.
  3. Large trees may become complex and hard to interpret.

How Decision Trees Work

  1. Divide and Conquer Principle:
  • Start with the entire dataset (root node).
  • Split data into branches based on the variable that provides the best separation of outcomes (high purity).

Node Types:

  • Internal Node: Splits data further.
  • Leaf Node: Represents the final decision/class.


Stopping Conditions

A tree stops growing when:

  1. All or most examples in a node belong to the same class (pure node).
  2. No more variables to split on.
  3. The tree reaches a predefined maximum depth.
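
To make the divide-and-conquer process and the stopping conditions concrete, here is a simplified Python sketch of recursive tree building (the toy data and function names are invented, and it uses Gini impurity as the splitting criterion for brevity rather than C5.0's entropy-based gain):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 0 means the node is pure (one class only)."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Divide and conquer: split on the feature/threshold that leaves the
    purest children, until a stopping condition is reached."""
    # Stopping conditions: pure node, no features, or maximum depth reached.
    if gini(labels) == 0 or not features or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]   # leaf node: majority class

    # Try every binary split and keep the one with the lowest weighted impurity.
    best = None
    for f in features:
        for threshold in {row[f] for row in rows}:
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i in range(len(rows)) if i not in left]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)

    if best is None:                                  # no valid split (rows identical)
        return Counter(labels).most_common(1)[0][0]

    _, f, threshold, left, right = best
    return {"split": (f, threshold),                  # internal node
            "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                               features, depth + 1, max_depth),
            "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                                features, depth + 1, max_depth)}

# Invented job-offer data for illustration.
rows = [{"travel": 0.5, "salary": 55, "values": 1},
        {"travel": 2.0, "salary": 45, "values": 0},
        {"travel": 1.2, "salary": 70, "values": 0},
        {"travel": 0.8, "salary": 40, "values": 1}]
labels = ["ACCEPT", "DECLINE", "ACCEPT", "DECLINE"]
print(build_tree(rows, labels, features=["travel", "salary", "values"]))
```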

Key Concepts

  1. Purity:
  • Purity measures the homogeneity of a subset. A pure node contains data of only one class.
  2. Entropy (Shannon's Entropy):
  • Metric for measuring impurity. Lower entropy means higher purity.
  3. Information Gain (IG):
  • Measures the reduction in entropy after a split.
  • High IG indicates a good split.
  4. Alternative Metrics:
  • Gini Index: Measures impurity; a common alternative to entropy.
  • Chi-Square Statistic: Tests independence between the split variable and the class.
  • Gain Ratio: Adjusts IG for its bias toward splits with many distinct values.
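
A small Python sketch of these metrics (class labels invented for illustration); note how a good split reduces entropy far more than a poor one:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a set of class labels; 0 for a pure node."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gini(labels):
    """Gini impurity; also 0 for a pure node."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy achieved by splitting `parent` into `children`."""
    weighted = sum(len(c) / len(parent) * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["ACCEPT", "ACCEPT", "ACCEPT", "DECLINE", "DECLINE", "DECLINE"]
good_split = [["ACCEPT", "ACCEPT", "ACCEPT"], ["DECLINE", "DECLINE", "DECLINE"]]
poor_split = [["ACCEPT", "DECLINE", "ACCEPT"], ["DECLINE", "ACCEPT", "DECLINE"]]

print(entropy(parent), gini(parent))          # 1.0 and 0.5 for a 50/50 node
print(information_gain(parent, good_split))   # 1.0: children are pure
print(information_gain(parent, poor_split))   # ~0.08: children nearly as mixed as parent
```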

Common Algorithms

  1. C5.0 Algorithm:
  • Developed by J. Ross Quinlan.
  • An improvement over earlier ID3 and C4.5 algorithms.
  • Focuses on maximizing purity through entropy and IG.


Tree Complexity and Pruning

  • Overfitting: Trees that grow too large perform poorly on new data.
  • Pruning Techniques:
  1. Pre-Pruning: Limit growth (e.g., maximum depth or minimum node size).
  • Disadvantage: May miss subtle but meaningful patterns.
  2. Post-Pruning: Let the tree grow fully, then remove unnecessary branches.
  • Improves generalization ability (see the sketch below).
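
A hedged scikit-learn sketch of both ideas: max_depth and min_samples_leaf act as pre-pruning limits, while ccp_alpha applies cost-complexity pruning after growth (one common post-pruning approach; C5.0 uses its own pruning procedure). The dataset is a standard scikit-learn example, not one from these notes:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with depth and leaf-size limits.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow fully, then collapse branches whose contribution
# does not justify their complexity (controlled by ccp_alpha).
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post.fit(X_train, y_train)

print("pre-pruned depth:", pre.get_depth(), " test accuracy:", pre.score(X_test, y_test))
print("post-pruned depth:", post.get_depth(), " test accuracy:", post.score(X_test, y_test))
```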

Evaluating Decision Trees

  1. Confusion Matrix:
  • Compares predicted vs. actual outcomes to assess accuracy.
  • Key metrics include:
    • True Positive (TP): Correctly predicted positives.
    • False Positive (FP): Incorrectly predicted as positive.
    • False Negative (FN): Missed positives.
    • True Negative (TN): Correctly predicted negatives.

  2. Cost Matrix:
  • Weighs certain errors (e.g., FP, FN) higher if their consequences are more severe.
  • Example: In medical diagnosis, missing a disease (FN) is costlier than a false alarm (FP).
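
A short sketch of computing the confusion matrix with scikit-learn and applying a hypothetical cost matrix that penalizes false negatives more heavily (the labels and cost values are invented for illustration):

```python
from sklearn.metrics import confusion_matrix

# Invented predictions from a diagnostic classifier (1 = disease present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# For binary labels {0, 1}, ravel() returns counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")

# Cost matrix: a missed disease (FN) is assumed ten times costlier than a false alarm (FP).
COST_FP, COST_FN = 1, 10
print("total misclassification cost:", fp * COST_FP + fn * COST_FN)
```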