> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Decision Trees

> Make decisions like a flowchart - if this, then that

# Decision Trees

<Frame>
  <img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/ml-mastery/decision-tree-concept.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=fe8a9040ada84577de0f72c6ddd87cd1" alt="Decision Tree Structure" width="1080" height="1080" data-path="images/courses/ml-mastery/decision-tree-concept.svg" />
</Frame>

## You Already Think in Trees

Every day, you make decisions using "if-then" logic:

**Should I bring an umbrella?**

1. Is it raining? → Yes → Bring umbrella
2. Is it cloudy? → Yes → Bring umbrella
3. Is forecast rain > 50%? → Yes → Bring umbrella
4. Otherwise → Don't bring umbrella

**That's a decision tree!**

<Frame>
  <img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/ml-mastery/decision-tree-real-world.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=b3996171f3327b348ba107c51027cac0" alt="Loan Approval Decision Tree" width="1080" height="1080" data-path="images/courses/ml-mastery/decision-tree-real-world.svg" />
</Frame>

***

## The Loan Approval Problem

Imagine you're a bank deciding whether to approve loans:

```python theme={null}
import numpy as np
import pandas as pd

# Loan applicant data
data = {
    'age': [25, 35, 45, 20, 35, 52, 23, 40, 60, 48],
    'income': [50000, 120000, 80000, 20000, 60000, 150000, 30000, 95000, 70000, 85000],
    'credit_score': [650, 750, 680, 500, 720, 780, 580, 710, 690, 740],
    'has_job': [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    'approved': [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]  # Target
}

df = pd.DataFrame(data)
print(df)
```

### How Would a Human Decide?

A loan officer might think:

```
Is credit score > 700?
├── Yes: Probably approve
│   └── Is income > 50000?
│       ├── Yes: APPROVE
│       └── No: Check job status...
└── No: Risky
    └── Is income > 100000?
        ├── Yes: Maybe approve
        └── No: REJECT
```

**A decision tree learns these rules from data!**

***

## Building a Decision Tree from Scratch

### The Key Question: How Do We Choose Splits?

At each step, we need to decide:

1. Which feature to split on?
2. What value to split at?

We want splits that **separate classes well**.

### Measuring "Purity" with Gini Impurity

Imagine reaching into a bag of colored balls. **Gini Impurity** measures: "If I pick two balls at random, how likely are they to be different colors?"

$$
Gini = 1 - \sum_{i=1}^{C} p_i^2
$$

Where $p_i$ is the proportion of class $i$.

* **Gini = 0**: Perfect purity -- every ball is the same color. The split created a "pure" group.
* **Gini = 0.5**: Maximum impurity for 2 classes -- a 50/50 mix. Picking two balls at random gives you different colors half the time.
* **The goal**: Each split should create child nodes with *lower* Gini than the parent -- we're sorting the balls into separate bags.

```python theme={null}
def gini_impurity(y):
    """
    Calculate Gini impurity of a set.
    """
    if len(y) == 0:
        return 0
    
    # Count each class
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / len(y)
    
    # Gini = 1 - sum(p^2)
    return 1 - np.sum(probabilities ** 2)

# Examples
pure = [1, 1, 1, 1]           # All same class
print(f"Pure:  {gini_impurity(pure):.3f}")    # 0.0

mixed = [1, 1, 0, 0]          # 50-50 split
print(f"Mixed: {gini_impurity(mixed):.3f}")   # 0.5

mostly_one = [1, 1, 1, 0]     # 75-25 split
print(f"Mostly one: {gini_impurity(mostly_one):.3f}")  # 0.375
```

### Information Gain

We want the split that reduces impurity the most:

$$
Gain = Gini_{parent} - \frac{n_{left}}{n_{total}} Gini_{left} - \frac{n_{right}}{n_{total}} Gini_{right}
$$

```python theme={null}
def information_gain(y, left_indices, right_indices):
    """
    Calculate information gain from a split.
    """
    parent_gini = gini_impurity(y)
    
    n = len(y)
    n_left = len(left_indices)
    n_right = len(right_indices)
    
    if n_left == 0 or n_right == 0:
        return 0
    
    # Weighted average of child impurities
    child_gini = (
        (n_left / n) * gini_impurity(y[left_indices]) +
        (n_right / n) * gini_impurity(y[right_indices])
    )
    
    return parent_gini - child_gini
```

***

## Building the Tree

```python theme={null}
class DecisionNode:
    """A node in the decision tree."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature        # Feature index to split on
        self.threshold = threshold    # Value to split at
        self.left = left              # Left subtree
        self.right = right            # Right subtree
        self.value = value            # Predicted class (for leaf nodes)

class SimpleDecisionTree:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.root = None
    
    def fit(self, X, y):
        self.root = self._build_tree(X, y, depth=0)
    
    def _build_tree(self, X, y, depth):
        n_samples, n_features = X.shape
        n_classes = len(np.unique(y))
        
        # Stopping conditions
        if (depth >= self.max_depth or 
            n_samples < self.min_samples_split or 
            n_classes == 1):
            return DecisionNode(value=self._most_common_class(y))
        
        # Find best split
        best_gain = 0
        best_feature = None
        best_threshold = None
        
        for feature in range(n_features):
            thresholds = np.unique(X[:, feature])
            
            for threshold in thresholds:
                left_mask = X[:, feature] <= threshold
                right_mask = ~left_mask
                
                if np.sum(left_mask) == 0 or np.sum(right_mask) == 0:
                    continue
                
                gain = information_gain(y, np.where(left_mask)[0], np.where(right_mask)[0])
                
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature
                    best_threshold = threshold
        
        # No good split found
        if best_gain == 0:
            return DecisionNode(value=self._most_common_class(y))
        
        # Create child nodes
        left_mask = X[:, best_feature] <= best_threshold
        left = self._build_tree(X[left_mask], y[left_mask], depth + 1)
        right = self._build_tree(X[~left_mask], y[~left_mask], depth + 1)
        
        return DecisionNode(
            feature=best_feature,
            threshold=best_threshold,
            left=left,
            right=right
        )
    
    def _most_common_class(self, y):
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    
    def predict(self, X):
        return np.array([self._traverse(x, self.root) for x in X])
    
    def _traverse(self, x, node):
        if node.value is not None:
            return node.value
        
        if x[node.feature] <= node.threshold:
            return self._traverse(x, node.left)
        return self._traverse(x, node.right)

# Test our tree
X = df[['age', 'income', 'credit_score', 'has_job']].values
y = df['approved'].values

tree = SimpleDecisionTree(max_depth=3)
tree.fit(X, y)
predictions = tree.predict(X)
print("Predictions:", predictions)
print("Actual:     ", y)
print(f"Accuracy:    {np.mean(predictions == y):.2%}")
```

***

## Using scikit-learn

```python theme={null}
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Prepare data
X = df[['age', 'income', 'credit_score', 'has_job']].values
y = df['approved'].values
feature_names = ['age', 'income', 'credit_score', 'has_job']

# Train
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Visualize the tree!
plt.figure(figsize=(20, 10))
plot_tree(model, 
          feature_names=feature_names,
          class_names=['Rejected', 'Approved'],
          filled=True,
          rounded=True,
          fontsize=10)
plt.title("Loan Approval Decision Tree")
plt.tight_layout()
plt.show()
```

***

## Real Example: Iris Classification

```python theme={null}
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Feature importance
print("\nFeature Importances:")
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"  {name}: {importance:.3f}")
```

***

## The Problem: Overfitting

Decision trees can get too specific:

```python theme={null}
# Overfit tree - memorizes training data
overfit_model = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1)
overfit_model.fit(X_train, y_train)

print(f"Training accuracy: {overfit_model.score(X_train, y_train):.2%}")  # 100%!
print(f"Test accuracy:     {overfit_model.score(X_test, y_test):.2%}")    # Lower

# Well-tuned tree - generalizes better
good_model = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5)
good_model.fit(X_train, y_train)

print(f"\nTuned Training accuracy: {good_model.score(X_train, y_train):.2%}")
print(f"Tuned Test accuracy:     {good_model.score(X_test, y_test):.2%}")
```

<Warning>
  **Overfitting Signs:**

  * Training accuracy much higher than test accuracy (e.g., 100% train vs 70% test)
  * Very deep trees (many levels) -- a tree with depth=50 has likely memorized individual data points
  * Leaves with very few samples -- if a leaf has 1-2 samples, the tree is "remembering" specific examples instead of learning general patterns

  **Why trees overfit so easily**: An unrestricted tree can always achieve 100% training accuracy by creating one leaf per training sample. It's like answering a quiz by memorizing every question-answer pair -- you ace the practice test but fail any new question. This is why we always limit `max_depth` or `min_samples_leaf`.
</Warning>

***

## Controlling Tree Complexity

```python theme={null}
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters to control overfitting -- think of these as
# "guardrails" that prevent the tree from getting too specific.
model = DecisionTreeClassifier(
    max_depth=5,           # Maximum depth of tree -- the single most important control
    min_samples_split=10,  # Min samples to split a node -- prevents splitting tiny groups
    min_samples_leaf=5,    # Min samples in a leaf -- ensures each prediction has enough evidence
    max_features='sqrt',   # Max features per split -- adds randomness, reduces correlation between trees (useful for ensembles)
    random_state=42
)
# Practical tip: Start with max_depth=5 and increase gradually
# while monitoring test accuracy. When test accuracy stops improving
# (or starts dropping), you've found your sweet spot.
```

***

## Regression Trees

Trees can also predict numbers!

```python theme={null}
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np

# Create data
np.random.seed(42)
X = np.random.uniform(0, 10, (100, 1))
y = np.sin(X).ravel() + np.random.normal(0, 0.2, 100)

# Train regression tree
model = DecisionTreeRegressor(max_depth=4)
model.fit(X, y)

# Predict on smooth range
X_test = np.linspace(0, 10, 200).reshape(-1, 1)
y_pred = model.predict(X_test)

# Plot
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.5, label='Data')
plt.plot(X_test, y_pred, 'r-', linewidth=2, label='Tree Prediction')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Decision Tree Regression')
plt.legend()
plt.show()
```

Notice the "staircase" pattern -- trees predict constant values in each region! This is a fundamental limitation of decision trees for regression: they can only output discrete values (the mean of training points in each leaf), creating a piecewise-constant approximation. For smooth, continuous relationships, you'll need many leaves (risking overfitting) or an ensemble of trees (like Gradient Boosting, Module 6) that sums many small staircases into a smooth curve.

***

## Advantages and Disadvantages

<CardGroup cols={2}>
  <Card title="Advantages" icon="circle-check">
    * Easy to understand and visualize
    * No feature scaling needed
    * Handles both numeric and categorical
    * Feature importance built-in
    * Fast predictions
  </Card>

  <Card title="Disadvantages" icon="circle-xmark">
    * Prone to overfitting
    * Unstable (small data changes = different tree)
    * Axis-aligned splits only
    * Not as accurate as ensemble methods
    * Can be biased toward features with many levels
  </Card>
</CardGroup>

***

## 🚀 Mini Projects

<CardGroup cols={2}>
  <Card title="Project 1" icon="credit-card" color="#3B82F6">
    Build a loan approval classifier
  </Card>

  <Card title="Project 2" icon="ship" color="#10B981">
    Titanic survival prediction
  </Card>

  <Card title="Project 3" icon="sitemap" color="#8B5CF6">
    Visualize and interpret decision rules
  </Card>
</CardGroup>

<details>
  <summary>**Project 1: Loan Approval Classifier** - Build a business decision system</summary>

  **Objective**: Create a decision tree to automatically approve/reject loan applications.

  ```python theme={null}
  import numpy as np
  import pandas as pd
  from sklearn.tree import DecisionTreeClassifier, plot_tree
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import classification_report, confusion_matrix
  import matplotlib.pyplot as plt

  # Generate loan application data
  np.random.seed(42)
  n = 500

  age = np.random.normal(35, 10, n).clip(18, 70)
  income = np.random.exponential(50000, n) + 20000
  credit_score = np.random.normal(680, 80, n).clip(300, 850)
  debt_ratio = np.random.uniform(0.1, 0.6, n)
  employment_years = np.random.exponential(5, n).clip(0, 30)

  # Approval logic (what tree should learn)
  approved = (
      (credit_score > 650) & 
      (income > 40000) & 
      (debt_ratio < 0.4) |
      (credit_score > 750) & (income > 30000) |
      (employment_years > 5) & (credit_score > 600)
  ).astype(int)

  # Add some noise
  noise_idx = np.random.choice(n, size=int(n*0.05), replace=False)
  approved[noise_idx] = 1 - approved[noise_idx]

  df = pd.DataFrame({
      'age': age, 'income': income, 'credit_score': credit_score,
      'debt_ratio': debt_ratio, 'employment_years': employment_years,
      'approved': approved
  })

  print(f"Approval rate: {approved.mean():.1%}")

  # Split data
  X = df.drop('approved', axis=1)
  y = df['approved']
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  # Train decision tree
  tree = DecisionTreeClassifier(max_depth=4, random_state=42)
  tree.fit(X_train, y_train)

  # Evaluate
  y_pred = tree.predict(X_test)
  print("\n=== Classification Report ===")
  print(classification_report(y_test, y_pred, target_names=['Rejected', 'Approved']))

  # Feature importance
  print("\n=== Feature Importance ===")
  for name, imp in sorted(zip(X.columns, tree.feature_importances_), key=lambda x: -x[1]):
      print(f"{name}: {imp:.3f}")

  # Visualize tree
  plt.figure(figsize=(20, 10))
  plot_tree(tree, feature_names=X.columns.tolist(), class_names=['Reject', 'Approve'],
            filled=True, rounded=True, fontsize=10)
  plt.title('Loan Approval Decision Tree')
  plt.tight_layout()
  plt.savefig('loan_decision_tree.png', dpi=150)
  print("\nSaved loan_decision_tree.png")

  # Interpret rules
  print("\n=== Key Decision Rules ===")
  print("The tree learned these patterns:")
  print("1. Credit score is the most important factor")
  print("2. Income thresholds vary by credit score")
  print("3. Low debt ratio can compensate for moderate credit")
  ```
</details>

<details>
  <summary>**Project 2: Titanic Survival Prediction** - Classic ML challenge</summary>

  **Objective**: Predict passenger survival on the Titanic.

  ```python theme={null}
  import numpy as np
  import pandas as pd
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.model_selection import train_test_split, cross_val_score
  from sklearn.metrics import accuracy_score

  # Simulated Titanic-like data
  np.random.seed(42)
  n = 800

  pclass = np.random.choice([1, 2, 3], n, p=[0.25, 0.25, 0.5])
  sex = np.random.choice([0, 1], n, p=[0.35, 0.65])  # 0=female, 1=male
  age = np.random.normal(30, 15, n).clip(1, 80)
  sibsp = np.random.poisson(0.5, n)
  fare = np.where(pclass == 1, np.random.exponential(80, n),
                 np.where(pclass == 2, np.random.exponential(20, n),
                         np.random.exponential(10, n)))

  # Survival probability (women & children first, first class priority)
  survival_prob = (
      0.1 +
      0.5 * (sex == 0) +  # Female
      0.2 * (age < 15) +  # Child
      0.15 * (pclass == 1) +
      0.05 * (pclass == 2) -
      0.1 * (age > 50)
  ).clip(0, 1)
  survived = (np.random.random(n) < survival_prob).astype(int)

  df = pd.DataFrame({
      'pclass': pclass, 'sex': sex, 'age': age,
      'sibsp': sibsp, 'fare': fare, 'survived': survived
  })

  print(f"Survival rate: {survived.mean():.1%}")

  # Prepare data
  X = df.drop('survived', axis=1)
  y = df['survived']
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  # Compare different tree depths
  print("\n=== Depth Comparison ===")
  for depth in [2, 4, 6, 8, None]:
      tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
      cv_scores = cross_val_score(tree, X_train, y_train, cv=5)
      tree.fit(X_train, y_train)
      test_acc = accuracy_score(y_test, tree.predict(X_test))
      print(f"Depth {str(depth):>4}: CV={cv_scores.mean():.3f}±{cv_scores.std():.3f}, Test={test_acc:.3f}")

  # Best model
  best_tree = DecisionTreeClassifier(max_depth=4, random_state=42)
  best_tree.fit(X_train, y_train)

  print("\n=== Feature Importance ===")
  for name, imp in sorted(zip(X.columns, best_tree.feature_importances_), key=lambda x: -x[1]):
      if imp > 0.01:
          print(f"{name}: {imp:.3f}")

  # Predict for specific passengers
  print("\n=== Survival Predictions ===")
  passengers = [
      {"pclass": 1, "sex": 0, "age": 25, "sibsp": 1, "fare": 100},  # 1st class woman
      {"pclass": 3, "sex": 1, "age": 30, "sibsp": 0, "fare": 8},    # 3rd class man
      {"pclass": 2, "sex": 0, "age": 5, "sibsp": 2, "fare": 20},    # 2nd class child
  ]
  for p in passengers:
      X_new = pd.DataFrame([p])
      prob = best_tree.predict_proba(X_new)[0][1]
      pred = "Survived" if prob > 0.5 else "Died"
      print(f"Class {p['pclass']}, {'Female' if p['sex']==0 else 'Male'}, Age {p['age']:.0f}: {pred} ({prob:.0%})")
  ```
</details>

<details>
  <summary>**Project 3: Rule Extraction & Interpretation** - Understand model decisions</summary>

  **Objective**: Extract and visualize decision rules from a trained tree.

  ```python theme={null}
  import numpy as np
  from sklearn.tree import DecisionTreeClassifier, export_text
  from sklearn.datasets import load_iris

  # Load iris for demonstration
  iris = load_iris()
  X, y = iris.data, iris.target
  feature_names = iris.feature_names
  class_names = iris.target_names

  # Train a shallow, interpretable tree
  tree = DecisionTreeClassifier(max_depth=3, random_state=42)
  tree.fit(X, y)

  # Export text rules
  print("=== Decision Rules (Text) ===")
  rules = export_text(tree, feature_names=feature_names)
  print(rules)

  # Extract rules programmatically
  def extract_rules(tree, feature_names, class_names):
      """Extract decision rules from tree."""
      tree_ = tree.tree_
      feature_name = [
          feature_names[i] if i != -2 else "undefined"
          for i in tree_.feature
      ]
      
      rules = []
      
      def recurse(node, path):
          if tree_.feature[node] != -2:  # Not a leaf
              name = feature_name[node]
              threshold = tree_.threshold[node]
              
              # Left branch (<=)
              recurse(tree_.children_left[node], path + [f"{name} <= {threshold:.2f}"])
              # Right branch (>)
              recurse(tree_.children_right[node], path + [f"{name} > {threshold:.2f}"])
          else:  # Leaf
              class_idx = np.argmax(tree_.value[node])
              class_name = class_names[class_idx]
              samples = int(tree_.n_node_samples[node])
              rules.append({
                  'conditions': path,
                  'class': class_name,
                  'samples': samples,
                  'confidence': tree_.value[node].max() / tree_.value[node].sum()
              })
      
      recurse(0, [])
      return rules

  rules = extract_rules(tree, feature_names, list(class_names))

  print("\n=== Human-Readable Rules ===")
  for i, rule in enumerate(rules, 1):
      conditions = " AND ".join(rule['conditions'])
      print(f"Rule {i}: IF {conditions}")
      print(f"         THEN {rule['class']} (samples={rule['samples']}, confidence={rule['confidence']:.0%})\n")

  # Simplify to actionable insights
  print("=== Actionable Insights ===")
  print("1. Petal length ≤ 2.45 → Always Setosa (100% accuracy)")
  print("2. Petal length > 2.45 AND petal width ≤ 1.75 → Likely Versicolor")
  print("3. Petal length > 2.45 AND petal width > 1.75 → Likely Virginica")
  print("\nKey discriminating features: petal_length, petal_width")
  ```
</details>

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="If-Then Rules" icon="code-branch">
    Trees learn rules from data automatically
  </Card>

  <Card title="Gini Impurity" icon="scale-balanced">
    Measures how mixed a group is (lower = purer)
  </Card>

  <Card title="Information Gain" icon="chart-line">
    Choose splits that reduce impurity most
  </Card>

  <Card title="Depth Control" icon="ruler-vertical">
    Limit depth to prevent overfitting
  </Card>
</CardGroup>

***

## What's Next?

Before moving to ensemble methods, let's explore two more powerful classifiers:

<CardGroup cols={2}>
  <Card title="Continue to SVM" icon="arrow-right" href="/courses/ml-mastery/05a-svm">
    Support Vector Machines - find the optimal boundary between classes
  </Card>

  <Card title="Skip to Ensembles" icon="layer-group" href="/courses/ml-mastery/06-ensemble-methods">
    Or jump ahead to Random Forests and Gradient Boosting
  </Card>
</CardGroup>
