Naive Bayes
The Probability Perspective
Most algorithms we’ve seen ask: “Which side of the boundary is this point on?” Naive Bayes asks: “Given the evidence, what’s the probability of each class?”

The Doctor’s Diagnosis Problem
A patient walks in with symptoms:
- Fever: Yes
- Cough: Yes
- Fatigue: Yes

Does this patient have the flu? Naive Bayes answers with a probability for each diagnosis.
Bayes’ Theorem
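P(Disease | Symptoms) = P(Symptoms | Disease) × P(Disease) / P(Symptoms)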
In English:
- P(Disease|Symptoms): Probability of disease given symptoms (what we want)
- P(Symptoms|Disease): How likely these symptoms are if you have the disease
- P(Disease): How common the disease is (prior probability)
- P(Symptoms): How common these symptoms are overall
Math Connection: This is Bayes’ Theorem from probability theory. See Probability for the full derivation.
Why “Naive”?
The “naive” assumption: all features are independent given the class. For our flu example:

P(Fever AND Cough AND Fatigue | Flu) ≈ P(Fever|Flu) × P(Cough|Flu) × P(Fatigue|Flu)

Fever, cough, and fatigue are clearly not independent in reality, but the approximation works well enough in practice.
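To make the factorization concrete, here is a tiny sketch; every probability below is made up for illustration:

```python
# Toy Naive Bayes calculation with hypothetical numbers (not from any dataset).
priors = {"flu": 0.05, "no_flu": 0.95}
likelihoods = {
    "flu":    {"fever": 0.90, "cough": 0.80, "fatigue": 0.70},
    "no_flu": {"fever": 0.05, "cough": 0.10, "fatigue": 0.20},
}

# Naive assumption: P(symptoms | class) = product of per-symptom probabilities
scores = {
    c: priors[c]
       * likelihoods[c]["fever"]
       * likelihoods[c]["cough"]
       * likelihoods[c]["fatigue"]
    for c in priors
}

# Normalize so the scores sum to 1 (the shared P(Symptoms) cancels out)
total = sum(scores.values())
for c, s in scores.items():
    print(c, round(s / total, 3))  # flu ≈ 0.964, no_flu ≈ 0.036
```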
Building Naive Bayes From Scratch
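Here is a minimal from-scratch sketch of the Gaussian variant (the class and its details are illustrative, not a production implementation). It estimates a prior, a per-feature mean, and a per-feature variance for each class, then scores classes in log space:

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal from-scratch sketch of Gaussian Naive Bayes."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)      # P(class)
            self.means_[c] = Xc.mean(axis=0)        # per-feature mean
            self.vars_[c] = Xc.var(axis=0) + 1e-9   # per-feature variance (+ epsilon)
        return self

    def _log_likelihood(self, X, c):
        # Log Gaussian density, summed over features: this sum IS the naive step
        mean, var = self.means_[c], self.vars_[c]
        log_pdf = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var)
        return log_pdf.sum(axis=1)

    def predict(self, X):
        # Pick the class with the highest log prior + log likelihood
        scores = np.column_stack([
            np.log(self.priors_[c]) + self._log_likelihood(X, c)
            for c in self.classes_
        ])
        return self.classes_[np.argmax(scores, axis=1)]
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied together.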
Types of Naive Bayes
1. Gaussian Naive Bayes
For continuous features (assumes each feature follows a normal distribution within each class):
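A quick sketch with scikit-learn (Iris is used here only as a stand-in for any continuous-feature dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GaussianNB().fit(X_train, y_train)
print(model.score(X_test, y_test))       # accuracy on held-out data
print(model.predict_proba(X_test[:1]))   # per-class probabilities
```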
2. Multinomial Naive Bayes
For count data (word frequencies, document classification):
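A sketch on word counts; the four-document corpus is made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free money now", "meeting at noon", "free offer click now", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)        # rows = documents, columns = word counts
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["free lunch now"])))
```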
3. Bernoulli Naive Bayes
For binary features (word presence/absence):
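The same idea with presence/absence features; `binary=True` makes the vectorizer emit 0/1 instead of counts:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["free money now", "meeting at noon", "free offer click now", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer(binary=True)  # 1 if the word appears in the doc, else 0
X = vectorizer.fit_transform(docs)
model = BernoulliNB().fit(X, labels)

print(model.predict(vectorizer.transform(["free money meeting"])))
```

Real Example: Spam Classification

Putting the pieces together, a sketch of an end-to-end spam filter (the pipeline shape is standard; the toy messages are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training messages, made up for illustration
messages = [
    "win a free prize now", "urgent offer claim your money",
    "project update attached", "are we still on for lunch",
]
labels = ["spam", "spam", "ham", "ham"]

spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

print(spam_filter.predict(["claim your free prize"]))       # likely 'spam'
print(spam_filter.predict_proba(["lunch project update"]))  # class probabilities
```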
When Naive Bayes Shines
1. Text Classification
High-dimensional, sparse word features are exactly where count-based likelihoods work well: spam filtering, sentiment analysis, topic tagging.
2. Fast Baseline Model
Training is a single pass over the data to collect counts (or means and variances), so Naive Bayes gives you a respectable baseline in seconds, as the sketch below shows, before you invest in heavier models.
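A rough sketch of how cheap training is, on synthetic data (timings vary by machine):

```python
import time
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))           # synthetic continuous features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # synthetic labels

start = time.perf_counter()
GaussianNB().fit(X, y)
print(f"trained in {time.perf_counter() - start:.3f}s")  # typically a fraction of a second
```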
Laplace Smoothing
What if a word never appeared in the training data for a class? Its estimated probability would be zero, and because Naive Bayes multiplies probabilities together, a single zero wipes out the entire product. Laplace (add-one) smoothing fixes this by adding 1 to every count:

P(word | class) = (count(word, class) + 1) / (total words in class + vocabulary size)
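In scikit-learn this is the `alpha` parameter of the naive Bayes estimators, and add-one smoothing is the default:

```python
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace (add-one) smoothing, the default
# Smaller values (e.g. alpha=0.1) smooth less; larger values smooth more.
```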
Naive Bayes vs Other Algorithms

| Aspect | Naive Bayes | Logistic Regression | Random Forest |
|---|---|---|---|
| Speed | Very Fast | Fast | Slow |
| Training data needed | Little | Moderate | Lots |
| Handles text | Excellent | Good | Poor |
| Feature independence | Assumed (naive) | Not assumed | Not assumed |
| Interpretability | Good | Good | Poor |
| Probability calibration | Often poor | Good | Moderate |
Probability Calibration
Naive Bayes probabilities are often overconfident: correlated features get counted as if they were independent pieces of evidence, so the predicted probabilities are pushed toward 0 or 1. The predicted class is usually fine, but treat the raw probabilities with caution or recalibrate them.
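When calibrated probabilities matter, scikit-learn's CalibratedClassifierCV can rescale them. A minimal sketch (the choice of method="sigmoid" here is illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Wrap the Naive Bayes model; calibration is fit via internal cross-validation
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
print(calibrated.predict_proba(X_test[:1]))  # probabilities after recalibration
```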
Key Takeaways

Probability-Based
Predicts class probabilities using Bayes’ theorem
Independence Assumption
Assumes features are independent (often wrong, still works!)
Fast & Simple
Trains instantly, great for baselines
Text Champion
Excels at document classification and spam filtering
What’s Next?
Now let’s learn about ensemble methods - combining multiple models for better predictions!

Continue to Module 6: Ensemble Methods
The wisdom of crowds - Random Forests and Gradient Boosting