Decision Trees
You Already Think in Trees
Every day, you make decisions using “if-then” logic: Should I bring an umbrella?
- Is it raining? → Yes → Bring umbrella
- Is it cloudy? → Yes → Bring umbrella
- Is forecast rain > 50%? → Yes → Bring umbrella
- Otherwise → Don’t bring umbrella
The Loan Approval Problem
Imagine you’re a bank deciding whether to approve loans: for each applicant you know features such as income, credit history, and existing debt, and you must output a yes-or-no decision.
How Would a Human Decide?
A loan officer might think: “If income is high and existing debt is low, approve. If the credit history is poor, reject. Otherwise, look more closely.” A decision tree captures exactly this style of reasoning, except it learns the rules from data rather than from intuition.
Building a Decision Tree from Scratch
The Key Question: How Do We Choose Splits?
At each step, we need to decide:
- Which feature to split on?
- What value to split at?
Measuring “Purity” with Gini Impurity
Gini Impurity measures how mixed a group is:

$$G = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of class $i$ among the $C$ classes in the group. A pure group scores $G = 0$; a 50/50 two-class mix scores $G = 0.5$, the worst case for two classes.
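As a quick sanity check, here is a minimal Gini function (the helper name gini_impurity is ours):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0 (pure group)
print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5 (maximally mixed, 2 classes)
```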
Information Gain
We want the split that reduces impurity the most:

$$\text{Gain} = G_{\text{parent}} - \frac{n_{\text{left}}}{n} G_{\text{left}} - \frac{n_{\text{right}}}{n} G_{\text{right}}$$

where $n_{\text{left}}$ and $n_{\text{right}}$ count the samples sent to each child and $n = n_{\text{left}} + n_{\text{right}}$. At every node we evaluate all candidate splits and keep the one with the highest gain.
Building the Tree
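To make this concrete, here is a minimal from-scratch sketch of the split search. It reuses the gini_impurity helper above; weighted_gini and best_split are our own illustrative names, and a full tree would apply best_split recursively to each child until leaves are pure or a stopping rule fires.

```python
def weighted_gini(left, right):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

def best_split(X, y):
    """Try every (feature, threshold) pair and keep the highest-gain one."""
    parent = gini_impurity(y)
    best = None  # (gain, feature_index, threshold)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # skip splits that leave a child empty
            gain = parent - weighted_gini(left, right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

# Tiny loan-style example: feature 0 = income (thousands), feature 1 = has_debt
X = [[30, 1], [45, 0], [25, 1], [50, 0]]
y = ["reject", "approve", "reject", "approve"]
print(best_split(X, y))  # (0.5, 0, 30): splitting on income <= 30 separates the classes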
Using scikit-learn
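In practice you rarely hand-roll this. A minimal scikit-learn version might look like the following; the toy loan data and 0/1 labels are made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy loan data: [income_in_thousands, years_of_credit_history]
X = [[30, 2], [80, 10], [25, 1], [95, 15], [40, 3], [70, 8]]
y = [0, 1, 0, 1, 0, 1]  # 0 = reject, 1 = approve

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X, y)

print(clf.predict([[60, 5]]))    # predicted class for a new applicant
print(clf.feature_importances_)  # how much each feature drove the splits
```

The criterion="gini" argument is the default; it selects splits exactly as described above.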
Real Example: Iris Classification
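As a sketch on the built-in Iris dataset (assuming scikit-learn is available), export_text turns the fitted tree back into readable if-then rules:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
# Print the learned if-then rules as plain text
print(export_text(clf, feature_names=iris.feature_names))
```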
The Problem: Overfitting
Decision trees can get too specific: left unchecked, a tree keeps splitting until every leaf is pure, memorizing the training data, noise and all. The result is perfect training accuracy but poor performance on new data.
Controlling Tree Complexity
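scikit-learn's main pruning knobs are max_depth, min_samples_split, min_samples_leaf, and ccp_alpha (cost-complexity pruning). A small comparison sketch, with parameter values picked arbitrarily for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare an unconstrained tree against a pruned one
for params in [{}, {"max_depth": 3, "min_samples_leaf": 5}]:
    clf = DecisionTreeClassifier(random_state=42, **params)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(params or "no limits", f"-> CV accuracy {score:.3f}")
```

Cross-validated accuracy, rather than training accuracy, is the honest way to compare the two: the unconstrained tree always wins on training data.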
Regression Trees
Trees can also predict numbers! A regression tree splits just like a classification tree, but each leaf predicts the mean target value of its training samples, and splits are chosen to minimize mean squared error instead of Gini impurity.
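A minimal sketch with DecisionTreeRegressor, fitting a noisy sine curve made up for illustration; note that the predictions form a step function over x:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve: a classic shape for showing piecewise-constant fits
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X, y)

# Each leaf predicts the mean target of its training samples,
# so nearby inputs falling in the same leaf get identical outputs.
print(reg.predict([[1.0], [2.5], [4.0]]))
```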
Advantages and Disadvantages
Advantages
- Easy to understand and visualize
- No feature scaling needed
- Handles both numeric and categorical features
- Feature importance built-in
- Fast predictions
Disadvantages
- Prone to overfitting
- Unstable (small data changes = different tree)
- Axis-aligned splits only
- Not as accurate as ensemble methods
- Can be biased toward features with many levels
🚀 Mini Projects
Project 1
Build a loan approval classifier
Project 2
Titanic survival prediction
Project 3
Visualize and interpret decision rules
Key Takeaways
If-Then Rules
Trees learn rules from data automatically
Gini Impurity
Measures how mixed a group is (lower = purer)
Information Gain
Choose splits that reduce impurity most
Depth Control
Limit depth to prevent overfitting