Ensemble Methods
The Wisdom of Crowds
Question: Who’s smarter - one expert or 100 average people? Surprisingly, the crowd often wins!
A Real Experiment
In 1906, statistician Francis Galton visited a county fair where 787 people guessed the weight of an ox:
- Individual guesses ranged wildly
- Average of all guesses: 1,197 pounds
- Actual weight: 1,198 pounds
Many weak learners combined can outperform a single strong learner
Why Ensembles Work
Imagine 5 decision trees, each 70% accurate.
The Math: For majority voting with independent 70% accurate models, the ensemble is correct whenever at least 3 of the 5 trees are correct: P = C(5,3)(0.7)^3(0.3)^2 + C(5,4)(0.7)^4(0.3) + C(5,5)(0.7)^5 ≈ 0.84. Five mediocre trees vote their way to roughly 84% accuracy.
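To make the number concrete, here is a minimal sketch that evaluates the majority-vote probability from the formula above. It assumes the five trees make fully independent errors, which real trees never quite do, so the practical gain is smaller but still real.

```python
from math import comb

p, n = 0.7, 5  # accuracy of each tree, number of trees

# Probability that at least 3 of the 5 independent trees are correct
majority_correct = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n // 2 + 1, n + 1)
)
print(f"Majority-vote accuracy: {majority_correct:.3f}")  # ~0.837
```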
Bagging: Bootstrap Aggregating
Idea: Train multiple models on different random samples of the data.
How Bagging Works
- Create N random samples (with replacement) from training data
- Train a model on each sample
- Average predictions (regression) or vote (classification), as in the sketch below
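A minimal sketch of these steps with scikit-learn's BaggingClassifier, on a toy synthetic dataset (the dataset and parameter values are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Toy dataset just to make the example self-contained
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 base models (decision trees by default), each trained on a bootstrap
# sample drawn with replacement; predictions are combined by voting
bagging = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", round(bagging.score(X_test, y_test), 3))
```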
Random Forest: Bagging + Feature Randomness
Random Forest = Bagging + Random Feature Selection. At each split, only consider a random subset of features! This makes trees more diverse, improving ensemble performance.
Feature Importance
Random Forests tell you which features matter most:
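A minimal sketch using scikit-learn's RandomForestClassifier on a built-in dataset; feature_importances_ holds impurity-based importances that sum to 1 (the dataset and settings here are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()

# max_features="sqrt": each split only considers a random subset of features
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(data.data, data.target)

# Print the five most important features
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```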
Boosting: Learning from Mistakes
Key Idea: Train models sequentially, each focusing on what previous models got wrong.
AdaBoost (Adaptive Boosting)
- Train a model
- Increase weights of misclassified samples
- Train next model (focuses on hard examples)
- Combine all models with weighted voting, as in the sketch below
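A minimal AdaBoost sketch with scikit-learn; by default the base learner is a depth-1 tree (a "decision stump"), and the values below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each round up-weights the samples the previous stumps misclassified,
# then all stumps are combined with weighted voting
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", round(ada.score(X_test, y_test), 3))
```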
Gradient Boosting
Instead of reweighting samples, fit each tree to the residual errors:
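A toy sketch of the idea for squared-error regression (for this loss, the residuals are exactly the negative gradients). In practice you would use GradientBoostingRegressor or XGBoost rather than writing the loop yourself; the data below is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant prediction
trees = []

for _ in range(100):
    residuals = y - prediction               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                   # each new tree models the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE:", round(float(np.mean((y - prediction) ** 2)), 4))
```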
XGBoost: The Competition Winner
XGBoost (Extreme Gradient Boosting) is often the best choice for tabular data; a usage sketch follows the list below.
Why XGBoost Wins
- Regularization: Built-in L1/L2 regularization
- Parallel training: Uses all CPU cores
- Missing values: Handles them automatically
- Optimized: Carefully engineered for speed
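A minimal usage sketch, assuming the xgboost package is installed (pip install xgboost); the hyperparameter values are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,   # built-in L2 regularization
    n_jobs=-1,        # use all CPU cores
)
model.fit(X_train, y_train)
print("XGBoost accuracy:", round(model.score(X_test, y_test), 3))
```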
Comparison: When to Use What?
Bagging vs Boosting
Bagging (Random Forest)
- Train in parallel
- Reduce variance (overfitting)
- Works with high-variance models
- More robust to outliers
- Harder to overfit
Boosting (XGBoost)
- Train sequentially
- Reduce bias (underfitting)
- Learns from mistakes
- Usually more accurate
- Can overfit if not tuned
Hyperparameter Tuning
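Tree ensembles are usually tuned with a cross-validated search over a handful of key knobs: number of trees, tree depth, features per split, and (for boosting) the learning rate. A minimal sketch with scikit-learn's RandomizedSearchCV on a Random Forest, using an illustrative search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)

# Illustrative search space; adjust to your dataset
param_distributions = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 5, 10, 20],
    "max_features": ["sqrt", "log2", 0.5],
    "min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=3),
    param_distributions=param_distributions,
    n_iter=20,       # try 20 random combinations
    cv=5,            # 5-fold cross-validation for each
    n_jobs=-1,
    random_state=3,
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```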
Voting Classifier: Mix Different Models
Combine different types of models:
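A minimal sketch with scikit-learn's VotingClassifier, combining three different model families on a synthetic dataset (the choice of models is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=5)

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=5)),
        ("svm", SVC(probability=True, random_state=5)),
    ],
    voting="soft",   # average predicted probabilities instead of hard votes
)
print("Voting CV accuracy:", cross_val_score(voting, X, y, cv=5).mean().round(3))
```

Soft voting averages predicted probabilities; hard voting simply counts class votes.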
Stacking: Models Learn from Models
Train a meta-model on the predictions of base models:
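A minimal sketch with scikit-learn's StackingClassifier; the base-model predictions fed to the meta-model come from internal cross-validation, so the meta-model never sees leaked training fits (the models and data here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=9)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=9)),
        ("svm", SVC(probability=True, random_state=9)),
    ],
    final_estimator=LogisticRegression(),  # meta-model trained on base predictions
    cv=5,  # base predictions are generated with 5-fold cross-validation
)
print("Stacking CV accuracy:", cross_val_score(stack, X, y, cv=5).mean().round(3))
```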
🚀 Mini Projects
Project 1
Build and tune a Random Forest classifier
Project 2
Gradient Boosting for regression
Project 3
Ensemble comparison on a real dataset
Key Takeaways
Crowd Wisdom
Many weak models can beat one strong model
Bagging = Parallel
Train on different data samples
Boosting = Sequential
Each model fixes previous mistakes
Random Forest
Best starting point for tabular data
What’s Next?
Now that you understand the main ML algorithms, let’s learn how to properly evaluate and compare models!
Continue to Module 7: Model Evaluation
Learn cross-validation, metrics, and how to avoid common mistakes