
Decision Trees

Decision Tree Structure

You Already Think in Trees

Every day, you make decisions using “if-then” logic: Should I bring an umbrella?
  1. Is it raining? → Yes → Bring umbrella
  2. Is it cloudy? → Yes → Bring umbrella
  3. Is forecast rain > 50%? → Yes → Bring umbrella
  4. Otherwise → Don’t bring umbrella
That’s a decision tree!
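Written as plain if-then code, that same chain of questions looks like this (a quick sketch; the function name and inputs are just for illustration):

def bring_umbrella(is_raining, is_cloudy, rain_chance):
    """Each question is one branch in a tiny decision tree."""
    if is_raining:
        return True          # 1. raining -> bring umbrella
    if is_cloudy:
        return True          # 2. cloudy -> bring umbrella
    if rain_chance > 0.5:
        return True          # 3. forecast rain > 50% -> bring umbrella
    return False             # 4. otherwise -> leave it at home

print(bring_umbrella(is_raining=False, is_cloudy=False, rain_chance=0.7))  # True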
[Figure: Loan approval decision tree]

The Loan Approval Problem

Imagine you’re a bank deciding whether to approve loans:
import numpy as np
import pandas as pd

# Loan applicant data
data = {
    'age': [25, 35, 45, 20, 35, 52, 23, 40, 60, 48],
    'income': [50000, 120000, 80000, 20000, 60000, 150000, 30000, 95000, 70000, 85000],
    'credit_score': [650, 750, 680, 500, 720, 780, 580, 710, 690, 740],
    'has_job': [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    'approved': [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]  # Target
}

df = pd.DataFrame(data)
print(df)

How Would a Human Decide?

A loan officer might think:
Is credit score > 700?
├── Yes: Probably approve
│   └── Is income > 50000?
│       ├── Yes: APPROVE
│       └── No: Check job status...
└── No: Risky
    └── Is income > 100000?
        ├── Yes: Maybe approve
        └── No: REJECT
A decision tree learns these rules from data!
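Purely for illustration, here is what those hand-written rules might look like as code (the thresholds are guessed by the officer, not learned from data):

def loan_officer_rules(credit_score, income, has_job):
    """Hand-written rules mirroring the sketch above; thresholds are illustrative."""
    if credit_score > 700:
        if income > 50000:
            return "APPROVE"
        return "APPROVE" if has_job else "REJECT"  # "check job status..."
    if income > 100000:
        return "APPROVE"  # "maybe approve"
    return "REJECT"

print(loan_officer_rules(credit_score=750, income=120000, has_job=1))  # APPROVE

A decision tree's job is to discover thresholds like 700 and 50000 automatically, by examining the training data.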

Building a Decision Tree from Scratch

The Key Question: How Do We Choose Splits?

At each step, we need to decide:
  1. Which feature to split on?
  2. What value to split at?
We want splits that separate classes well.
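As a rough check (assuming the df DataFrame from the loan example is still in scope), here is how one candidate split, credit_score <= 700, separates the two classes:

# Assumes `df` from the loan example above
left = df[df['credit_score'] <= 700]['approved'].tolist()
right = df[df['credit_score'] > 700]['approved'].tolist()

print("credit_score <= 700:", left)   # [0, 1, 0, 0, 0] - mostly rejected
print("credit_score  > 700:", right)  # [1, 1, 1, 1, 1] - all approved

The next two ideas, Gini impurity and information gain, simply put a number on how clean a separation like this is.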

Measuring “Purity” with Gini Impurity

Gini impurity measures how mixed a group is:

$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of class $i$ and $C$ is the number of classes.
def gini_impurity(y):
    """
    Calculate Gini impurity of a set.
    """
    if len(y) == 0:
        return 0
    
    # Count each class
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / len(y)
    
    # Gini = 1 - sum(p^2)
    return 1 - np.sum(probabilities ** 2)

# Examples
pure = [1, 1, 1, 1]           # All same class
print(f"Pure:  {gini_impurity(pure):.3f}")    # 0.0

mixed = [1, 1, 0, 0]          # 50-50 split
print(f"Mixed: {gini_impurity(mixed):.3f}")   # 0.5

mostly_one = [1, 1, 1, 0]     # 75-25 split
print(f"Mostly one: {gini_impurity(mostly_one):.3f}")  # 0.375

Information Gain

We want the split that reduces impurity the most:

$$Gain = Gini_{parent} - \frac{n_{left}}{n_{total}}\,Gini_{left} - \frac{n_{right}}{n_{total}}\,Gini_{right}$$
def information_gain(y, left_indices, right_indices):
    """
    Calculate information gain from a split.
    """
    parent_gini = gini_impurity(y)
    
    n = len(y)
    n_left = len(left_indices)
    n_right = len(right_indices)
    
    if n_left == 0 or n_right == 0:
        return 0
    
    # Weighted average of child impurities
    child_gini = (
        (n_left / n) * gini_impurity(y[left_indices]) +
        (n_right / n) * gini_impurity(y[right_indices])
    )
    
    return parent_gini - child_gini

Building the Tree

class DecisionNode:
    """A node in the decision tree."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature        # Feature index to split on
        self.threshold = threshold    # Value to split at
        self.left = left              # Left subtree
        self.right = right            # Right subtree
        self.value = value            # Predicted class (for leaf nodes)

class SimpleDecisionTree:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.root = None
    
    def fit(self, X, y):
        self.root = self._build_tree(X, y, depth=0)
    
    def _build_tree(self, X, y, depth):
        n_samples, n_features = X.shape
        n_classes = len(np.unique(y))
        
        # Stopping conditions
        if (depth >= self.max_depth or 
            n_samples < self.min_samples_split or 
            n_classes == 1):
            return DecisionNode(value=self._most_common_class(y))
        
        # Find best split
        best_gain = 0
        best_feature = None
        best_threshold = None
        
        for feature in range(n_features):
            thresholds = np.unique(X[:, feature])
            
            for threshold in thresholds:
                left_mask = X[:, feature] <= threshold
                right_mask = ~left_mask
                
                if np.sum(left_mask) == 0 or np.sum(right_mask) == 0:
                    continue
                
                gain = information_gain(y, np.where(left_mask)[0], np.where(right_mask)[0])
                
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature
                    best_threshold = threshold
        
        # No good split found
        if best_gain == 0:
            return DecisionNode(value=self._most_common_class(y))
        
        # Create child nodes
        left_mask = X[:, best_feature] <= best_threshold
        left = self._build_tree(X[left_mask], y[left_mask], depth + 1)
        right = self._build_tree(X[~left_mask], y[~left_mask], depth + 1)
        
        return DecisionNode(
            feature=best_feature,
            threshold=best_threshold,
            left=left,
            right=right
        )
    
    def _most_common_class(self, y):
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
    
    def predict(self, X):
        return np.array([self._traverse(x, self.root) for x in X])
    
    def _traverse(self, x, node):
        if node.value is not None:
            return node.value
        
        if x[node.feature] <= node.threshold:
            return self._traverse(x, node.left)
        return self._traverse(x, node.right)

# Test our tree
X = df[['age', 'income', 'credit_score', 'has_job']].values
y = df['approved'].values

tree = SimpleDecisionTree(max_depth=3)
tree.fit(X, y)
predictions = tree.predict(X)
print("Predictions:", predictions)
print("Actual:     ", y)
print(f"Accuracy:    {np.mean(predictions == y):.2%}")

Using scikit-learn

from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Prepare data
X = df[['age', 'income', 'credit_score', 'has_job']].values
y = df['approved'].values
feature_names = ['age', 'income', 'credit_score', 'has_job']

# Train
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Visualize the tree!
plt.figure(figsize=(20, 10))
plot_tree(model, 
          feature_names=feature_names,
          class_names=['Rejected', 'Approved'],
          filled=True,
          rounded=True,
          fontsize=10)
plt.title("Loan Approval Decision Tree")
plt.tight_layout()
plt.show()

Real Example: Iris Classification

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Feature importance
print("\nFeature Importances:")
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"  {name}: {importance:.3f}")

The Problem: Overfitting

Decision trees can get too specific:
# Overfit tree - memorizes training data
overfit_model = DecisionTreeClassifier(max_depth=None, min_samples_leaf=1)
overfit_model.fit(X_train, y_train)

print(f"Training accuracy: {overfit_model.score(X_train, y_train):.2%}")  # 100%!
print(f"Test accuracy:     {overfit_model.score(X_test, y_test):.2%}")    # Lower

# Well-tuned tree - generalizes better
good_model = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5)
good_model.fit(X_train, y_train)

print(f"\nTuned Training accuracy: {good_model.score(X_train, y_train):.2%}")
print(f"Tuned Test accuracy:     {good_model.score(X_test, y_test):.2%}")
Overfitting Signs:
  • Training accuracy much higher than test accuracy
  • Very deep trees (many levels)
  • Leaves with very few samples
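A quick way to see the first sign in practice is to sweep max_depth and compare training vs. test accuracy; a minimal sketch, assuming the iris split from above is still in scope:

# Assumes X_train, X_test, y_train, y_test from the iris example above
for depth in [1, 2, 3, 5, 10, None]:
    m = DecisionTreeClassifier(max_depth=depth, random_state=42)
    m.fit(X_train, y_train)
    print(f"max_depth={str(depth):>4}  "
          f"train={m.score(X_train, y_train):.2%}  "
          f"test={m.score(X_test, y_test):.2%}")

As depth grows, training accuracy climbs toward 100% while test accuracy levels off or drops, which is the overfitting gap described above.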

Controlling Tree Complexity

from sklearn.tree import DecisionTreeClassifier

# Hyperparameters to control overfitting
model = DecisionTreeClassifier(
    max_depth=5,           # Maximum depth of tree
    min_samples_split=10,  # Min samples to split a node
    min_samples_leaf=5,    # Min samples in a leaf
    max_features='sqrt',   # Max features to consider per split
    random_state=42
)
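Rather than guessing these values, you would typically tune them with cross-validation. Here is a minimal sketch using GridSearchCV; the grid values are just examples, and it assumes X_train, y_train from the iris section:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    'max_depth': [2, 3, 4, 5, None],     # example values, not a recommendation
    'min_samples_leaf': [1, 2, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring='accuracy',
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.2%}")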

Regression Trees

Trees can also predict numbers!
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np

# Create data
np.random.seed(42)
X = np.random.uniform(0, 10, (100, 1))
y = np.sin(X).ravel() + np.random.normal(0, 0.2, 100)

# Train regression tree
model = DecisionTreeRegressor(max_depth=4)
model.fit(X, y)

# Predict on smooth range
X_test = np.linspace(0, 10, 200).reshape(-1, 1)
y_pred = model.predict(X_test)

# Plot
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.5, label='Data')
plt.plot(X_test, y_pred, 'r-', linewidth=2, label='Tree Prediction')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Decision Tree Regression')
plt.legend()
plt.show()
Notice the “staircase” pattern - trees predict constant values in each region!
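One way to confirm the piecewise-constant behavior: a tree with max_depth=4 has at most 2**4 = 16 leaves, so the predictions above can take at most 16 distinct values (continuing the regression example):

# Continuing the regression example above
print(f"Distinct predicted values: {len(np.unique(y_pred))} (at most 2**4 = 16)")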

Advantages and Disadvantages

Advantages

  • Easy to understand and visualize
  • No feature scaling needed
  • Handles both numeric and categorical
  • Feature importance built-in
  • Fast predictions

Disadvantages

  • Prone to overfitting
  • Unstable (small data changes = different tree)
  • Axis-aligned splits only
  • Not as accurate as ensemble methods
  • Can be biased toward features with many levels

🚀 Mini Projects

Project 1

Build a loan approval classifier

Project 2

Titanic survival prediction

Project 3

Visualize and interpret decision rules

Key Takeaways

If-Then Rules

Trees learn rules from data automatically

Gini Impurity

Measures how mixed a group is (lower = purer)

Information Gain

Choose splits that reduce impurity most

Depth Control

Limit depth to prevent overfitting

What’s Next?

Before moving to ensemble methods, let’s explore two more powerful classifiers: