Common ML Mistakes
The ML Hall of Shame
Every data scientist has made these mistakes. Learn from them so you don’t have to!
Mistake 1: Training on the Test Set
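A minimal sketch of the difference, assuming scikit-learn and a synthetic dataset from `make_classification` (any labelled data would do). The wrong version scores the model on rows it was fit on; the correct version splits first and only ever scores on the untouched test rows.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# ❌ Wrong: fit on every row, then "evaluate" on rows the model has already seen.
leaky_model = DecisionTreeClassifier(random_state=0).fit(X, y)
leaky_score = leaky_model.score(X, y)  # an unpruned tree memorizes its training data

# ✅ Correct: carve out the test set first and never fit anything on it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print(f"score on seen rows: {leaky_score:.3f}, held-out score: {test_score:.3f}")
```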
Mistake 2: Using Accuracy for Imbalanced Data
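To see why accuracy misleads, here is a sketch with hypothetical labels (95 negatives, 5 positives) and scikit-learn's `DummyClassifier`, which always predicts the majority class. Accuracy rewards this useless model; balanced accuracy and F1 expose it.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant to a majority-class predictor

clf = DummyClassifier(strategy="most_frequent").fit(X, y_true)
y_pred = clf.predict(X)  # predicts 0 for every row

# ❌ Wrong: accuracy says 95%, even though no positive is ever found.
acc = accuracy_score(y_true, y_pred)  # 0.95

# ✅ Correct: metrics that account for the minority class reveal the failure.
bal_acc = balanced_accuracy_score(y_true, y_pred)  # 0.5 (mean of per-class recall)
f1 = f1_score(y_true, y_pred, zero_division=0)     # 0.0 for the positive class
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}, F1={f1:.2f}")
```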
Mistake 3: Random Split for Time Series
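A sketch using a hypothetical 20-day series where row i is observed on day i. Shuffled `KFold` ends up training on later days than it tests on; scikit-learn's `TimeSeriesSplit` always trains on the past and tests on the future.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Hypothetical daily series: row i is observed on day i.
X = np.arange(20).reshape(-1, 1)

# ❌ Wrong: shuffled K-fold lets the model train on days after the ones it is tested on.
shuffled = KFold(n_splits=4, shuffle=True, random_state=0)
leaks = any(train.max() > test.min() for train, test in shuffled.split(X))
print(f"shuffled KFold trains on the future: {leaks}")

# ✅ Correct: every training window ends before its test window starts.
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()
    print(f"train days 0-{train_idx.max()}, test days {test_idx.min()}-{test_idx.max()}")
```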
Mistake 4: Ignoring Feature Scaling
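A sketch assuming scikit-learn's bundled wine dataset and a k-nearest-neighbours classifier, chosen because distance-based models are especially sensitive to feature scale (tree-based models largely are not). Putting `StandardScaler` inside the pipeline also means each cross-validation fold fits its own scaler, so no scaling statistics leak from the validation rows.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wine features live on very different scales (e.g. proline vs. hue).
X, y = load_wine(return_X_y=True)

# ❌ Wrong: raw features, so distances are dominated by the largest-scale columns.
raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()

# ✅ Correct: standardize inside a pipeline so scaling is refit on each training fold.
scaled_pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
scaled = cross_val_score(scaled_pipe, X, y, cv=5).mean()
print(f"KNN accuracy, raw: {raw:.3f}, scaled: {scaled:.3f}")
```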
Mistake 5: Feature Leakage from Target
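Target leakage means a feature only exists, or only takes its value, because of the outcome you are trying to predict. In this sketch the columns and the churn rule are hypothetical: `cancellation_fee_paid` is recorded only after a customer churns, so it hands the answer to the model. The inflated score disappears once the model is restricted to features available at prediction time.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n),
    "monthly_charges": rng.normal(70, 20, n),
})
# Hypothetical target: churn is only weakly related to short tenure.
df["churned"] = (rng.random(n) < 0.5 - df["tenure_months"] / 150).astype(int)
# Hypothetical leaky column: non-zero only for customers who already churned.
df["cancellation_fee_paid"] = df["churned"] * rng.uniform(10, 50, n)
y = df["churned"]

# ❌ Wrong: the model "predicts" churn from information created by churn.
X_leaky = df.drop(columns="churned")
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# ✅ Correct: keep only features that exist at the moment of prediction.
safe_features = ["tenure_months", "monthly_charges"]
honest = cross_val_score(LogisticRegression(max_iter=1000), df[safe_features], y, cv=5).mean()
print(f"with leaky feature: {leaky:.3f}, without: {honest:.3f}")
```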
Mistake 6: Dropping Missing Values Carelessly
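A sketch with synthetic data (1,000 rows, 5 features, about 10% of values removed at random, an assumption for illustration). `dropna()` throws away every row with a single gap, roughly 40% of rows here, while scikit-learn's `SimpleImputer` keeps them and, with `add_indicator=True`, records where values were missing. In a real workflow, fit the imputer on the training split only, ideally inside a pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 5)), columns=[f"f{i}" for i in range(5)])
# Blank out ~10% of values in each column independently.
df = df.mask(rng.random(df.shape) < 0.10)

# ❌ Wrong: drop any row with a missing value (expected loss is 1 - 0.9**5, about 41%).
dropped = df.dropna()
print(f"rows kept after dropna: {len(dropped)} / {len(df)}")

# ✅ Correct: impute, and add indicator columns so the model can use "was missing".
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_imputed = imputer.fit_transform(df)
print(X_imputed.shape)  # 5 imputed columns + 5 missing-indicator columns
```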
Mistake 7: Overfitting to Validation Set
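The fix is to tune with cross-validation on the training data and look at the test set only once, at the end. A sketch assuming scikit-learn, its bundled breast-cancer dataset, and an SVM; the grid values are arbitrary. Wrapping the search in `cross_val_score` (nested cross-validation) gives a performance estimate that is not scored on the same folds used to pick the hyperparameters.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# ❌ Wrong (pattern to avoid): try config after config against one fixed
# validation split, or worse the test set, and keep whichever scores best.
# The winning score is optimistic because that split steered the choice.

# ✅ Correct: tune with cross-validation on the training set only...
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=5)
search.fit(X_train, y_train)

# ...estimate generalization with nested CV (the search is re-run inside each outer fold)...
nested = cross_val_score(search, X_train, y_train, cv=5)

# ...and touch the test set exactly once, after every choice is frozen.
test_score = search.score(X_test, y_test)
print(f"nested CV: {nested.mean():.3f}, final test: {test_score:.3f}", search.best_params_)
```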
Mistake 8: Not Checking for Data Drift
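One common drift check is a per-feature two-sample Kolmogorov-Smirnov test comparing training data with recent production data. This sketch uses SciPy's `ks_2samp` on simulated data with an assumed mean shift of 0.3; the 0.01 alert threshold is also an assumption. Other drift metrics, such as the population stability index, are equally valid choices.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Simulated data: the production feature's mean has drifted by 0.3 (an assumption).
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
live_feature = rng.normal(loc=0.3, scale=1.0, size=5_000)

# ❌ Wrong: deploy once and assume live data keeps matching the training data.

# ✅ Correct: compare distributions on a schedule and alert on significant change.
result = ks_2samp(train_feature, live_feature)
drift_detected = result.pvalue < 0.01  # alert threshold is an assumption
print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.2e}, drift={drift_detected}")
```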
Mistake 9: One-Hot Encoding High-Cardinality Features
Mistake 10: Ignoring Class Imbalance in CV
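A sketch with hypothetical labels sorted by class (90 negatives, then 10 positives), as often happens when data is exported class by class. Unshuffled `KFold` leaves four folds with no positives at all; `StratifiedKFold` preserves the 10% positive rate in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical labels: 90 negatives followed by 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))

# ❌ Wrong: plain KFold on sorted data puts every positive in the last fold.
kf_pos = [int(y[test].sum()) for _, test in KFold(n_splits=5).split(X)]
print(kf_pos)  # [0, 0, 0, 0, 10]

# ✅ Correct: StratifiedKFold keeps the class ratio identical across folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
skf_pos = [int(y[test].sum()) for _, test in skf.split(X, y)]
print(skf_pos)  # [2, 2, 2, 2, 2]
```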
Mistake 11: Not Setting Random Seeds
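A sketch assuming scikit-learn: passing an explicit `random_state` to every data generator, splitter, and model is what makes the run repeatable. The global `random.seed` and `np.random.seed` calls cover any other code that relies on Python's or NumPy's global random state. `SEED = 42` and the helper `run` are illustrative names, not part of any library.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42
random.seed(SEED)     # Python's global RNG
np.random.seed(SEED)  # NumPy's legacy global RNG, still used by some libraries

def run(seed):
    """Hypothetical experiment: every random component receives the same seed."""
    X, y = make_classification(n_samples=300, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
    return model.score(X_te, y_te)

# ❌ Wrong: omit random_state, and splits and forests change from run to run.
# ✅ Correct: with seeds fixed, two runs give exactly the same score.
assert run(SEED) == run(SEED)
print(f"reproducible score: {run(SEED):.3f}")
```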
Mistake 12: Selecting Features Before the Train-Test Split
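Selecting features on the full dataset, before any split, lets the selector see the test rows. This sketch, assuming scikit-learn, uses pure noise (5,000 random features and random labels), so the true accuracy is 50%. Selecting first and then cross-validating reports a score well above chance; putting `SelectKBest` inside a pipeline refits it on each training fold and gives an honest estimate.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5_000))  # pure noise: no feature is truly informative
y = rng.integers(0, 2, size=100)   # random labels

# ❌ Wrong: pick the 20 "best" features using all rows, then cross-validate.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_selected, y, cv=5).mean()

# ✅ Correct: selection lives inside the pipeline, so it only sees training folds.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest = cross_val_score(pipe, X, y, cv=5).mean()
print(f"leaky estimate: {leaky:.2f}, honest estimate: {honest:.2f}")
```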
Mistake 13: Using Mean for Skewed Data
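For a right-skewed feature, a few very large values pull the mean far above the typical value, so mean imputation fills gaps with unrepresentative numbers. A sketch with a hypothetical log-normally distributed `income` feature, comparing scikit-learn's `SimpleImputer` with `strategy="mean"` and `strategy="median"`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(0)
# Hypothetical right-skewed "income" feature (log-normal), with 50 missing entries.
income = rng.lognormal(mean=10.5, sigma=1.0, size=(1_000, 1))
income[:50] = np.nan

# ❌ Wrong: the mean is inflated by the long right tail.
mean_fill = SimpleImputer(strategy="mean").fit(income).statistics_[0]

# ✅ Correct: the median stays close to a typical observation.
median_fill = SimpleImputer(strategy="median").fit(income).statistics_[0]
print(f"mean fill: {mean_fill:,.0f}, median fill: {median_fill:,.0f}")
```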
Mistake 14: Trusting Default Hyperparameters
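A sketch assuming scikit-learn, its breast-cancer dataset, and a decision tree, whose defaults grow the tree until every leaf is pure. The grid is small and arbitrary; because it includes the default setting (`max_depth=None`, `min_samples_leaf=1`), the tuned cross-validation score can only match or beat the default. Keep in mind that `best_score_` is itself slightly optimistic (see Mistake 7).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# ❌ Wrong: accept the defaults and hope they suit this dataset.
default_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

# ✅ Correct: search a small grid with cross-validation.
grid = {"max_depth": [2, 3, 5, 8, None], "min_samples_leaf": [1, 5, 10, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5).fit(X, y)
print(f"default CV accuracy: {default_score:.3f}")
print(f"tuned CV accuracy:   {search.best_score_:.3f} with {search.best_params_}")
```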
Mistake 15: Complex Model Without Baseline
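Before reaching for a complex model, measure what trivial and simple models achieve. A sketch assuming scikit-learn and its breast-cancer dataset: a majority-class `DummyClassifier`, a scaled logistic regression, and gradient boosting are compared with the same 5-fold cross-validation. A complex model is worth its extra cost only if it clearly beats the simple ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# ✅ Correct: evaluate a ladder from trivial to complex under identical CV.
models = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name:>24}: {score:.3f}")
```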
Quick Reference Checklist
Before Training
- Split data before any preprocessing
- Set random seeds for reproducibility
- Check class balance
- Handle missing values appropriately
- Scale features if the algorithm requires it
During Training
- Use pipelines to prevent leakage
- Use stratified CV for imbalanced data
- Use temporal splits for time series
- Compare to baseline models
- Tune hyperparameters systematically
After Training
- Evaluate on the held-out test set
- Use appropriate metrics (not just accuracy)
- Check for overfitting (train vs test gap)
- Check that feature importances make sense
- Document everything
In Production
- Monitor for data drift
- Track prediction distributions
- Set up alerts for performance degradation
- Plan for model retraining
Key Takeaways
Split First
Always separate test data before any processing
Use Pipelines
Prevent leakage with sklearn pipelines
Right Metrics
Match metrics to your problem
Start Simple
Baseline first, complexity later
Congratulations! 🎉
You’ve completed the ML Mastery course! You now have comprehensive knowledge of:
- ML fundamentals and algorithms
- Feature engineering and data preprocessing
- Model evaluation and selection
- Advanced topics (time series, deep learning, deployment)
- Professional practices (pipelines, explainability, common mistakes)