K-Nearest Neighbors (KNN)
The Most Intuitive Algorithm
Imagine you move to a new city and want to know if a neighborhood is safe. What do you do? You look at nearby neighborhoods. If 4 out of 5 nearby neighborhoods are safe → probably safe. If 4 out of 5 nearby neighborhoods are unsafe → probably unsafe. That’s KNN. To predict something, find the K most similar examples and use their labels.

The Movie Recommendation Problem
You just watched “The Matrix” and loved it. What should you watch next?

Finding Similar Movies
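One way to make "similar" concrete is to describe each movie as a feature vector and rank candidates by their distance to the movie you just watched. The sketch below uses made-up feature values - the titles, genres, and numbers are purely illustrative, not from any real dataset:

```python
import numpy as np

# Hypothetical feature vectors: [action, sci-fi, romance, recency]
movies = {
    "The Matrix":   np.array([0.9, 1.0, 0.1, 0.5]),
    "John Wick":    np.array([1.0, 0.1, 0.1, 0.8]),
    "Blade Runner": np.array([0.6, 1.0, 0.3, 0.1]),
    "The Notebook": np.array([0.0, 0.0, 1.0, 0.6]),
    "Inception":    np.array([0.8, 0.9, 0.2, 0.7]),
}

target = movies["The Matrix"]

# Rank every other movie by Euclidean distance to "The Matrix"
distances = sorted(
    (np.linalg.norm(vec - target), title)
    for title, vec in movies.items()
    if title != "The Matrix"
)

k = 3
for dist, title in distances[:k]:
    print(f"{title}: distance = {dist:.2f}")
```

The K closest movies are the recommendations.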
The KNN Algorithm
For Classification
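Find the K closest training points, then take a majority vote of their labels. Here is a minimal from-scratch sketch of that rule (it assumes NumPy arrays for the training data; the function name is just illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=5):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```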
For Regression
Instead of voting, average the target values of the K nearest neighbors.

Real Example: Iris Classification
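A sketch of how this example might look with scikit-learn's `KNeighborsClassifier` (the 70/30 split and K=5 are arbitrary choices here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris: 150 flowers, 4 measurements each, 3 species
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Test accuracy: {knn.score(X_test, y_test):.2f}")
```

For regression, `KNeighborsRegressor` follows the same recipe but returns the average of the neighbors' target values.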
Choosing K: The Magic Number
K=1: Use only the closest neighbor
- Very sensitive to noise
- Can overfit

Large K: Use many neighbors
- Smoother predictions
- Can underfit

A good K balances the two; a common way to find it is cross-validation, as in the sketch below.
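This sketch uses scikit-learn's `cross_val_score` to compare candidate values of K on Iris (the candidate range and the 5-fold setup are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of K (odd K avoids ties in two-class votes) and keep the best
scores = {}
for k in range(1, 22, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best K: {best_k} (CV accuracy {scores[best_k]:.3f})")
```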
The Scaling Problem
KNN uses distance. If features have different scales, large-scale features dominate:
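For example, if one feature is income in dollars and another is age in years, income alone decides who the "nearest" neighbors are. A sketch using scikit-learn's `StandardScaler` (the numbers below are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: [age in years, income in dollars]
X = np.array([
    [25, 40_000],
    [47, 85_000],
    [31, 52_000],
])

# Raw distance between the first two people is dominated by income
print(np.linalg.norm(X[0] - X[1]))   # ~45,000 - the 22-year age gap barely registers

# After standardizing, both features contribute comparably
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))
```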
Distance Metrics

Euclidean isn’t the only option:

| Metric | Formula | Best For |
|---|---|---|
| Euclidean | $\sqrt{\sum_i (x_i - y_i)^2}$ | Most cases, continuous features |
| Manhattan | $\sum_i \lvert x_i - y_i \rvert$ | Grid-like movement, high dimensions |
| Chebyshev | $\max_i \lvert x_i - y_i \rvert$ | When max difference matters |
| Cosine | $1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$ | Text, when magnitude doesn’t matter |
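As a quick check of these formulas, here is a small NumPy sketch computing all four metrics for a pair of vectors (SciPy's `scipy.spatial.distance` module offers the same metrics ready-made):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))   # straight-line distance
manhattan = np.sum(np.abs(x - y))           # sum of absolute differences
chebyshev = np.max(np.abs(x - y))           # largest single-feature difference
cosine    = 1 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))  # 1 - cosine similarity

print(euclidean, manhattan, chebyshev, cosine)
```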
Math Connection: Distance metrics come from linear algebra concepts. See Vectors for more on measuring similarity.
Weighted KNN
Not all neighbors are equal! Closer neighbors should have more influence:
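A common scheme is inverse-distance weighting: each neighbor's vote counts in proportion to 1/distance. scikit-learn exposes this via `weights="distance"`; a minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# weights="distance" makes each neighbor's vote proportional to 1 / distance,
# so closer neighbors count for more than farther ones
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X, y)
print(knn.predict(X[:3]))
```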
Pros and Cons

Advantages
- Simple and intuitive
- No training phase (lazy learner)
- Works with any number of classes
- Naturally handles multi-label
- Non-parametric (no assumptions about data)
Disadvantages
- Slow prediction (checks all training data)
- Sensitive to irrelevant features
- Sensitive to feature scaling
- Struggles in high dimensions (curse of dimensionality)
- Memory intensive (stores all data)
When to Use KNN
Good for:
- Small to medium datasets
- When you need interpretability (“these are the similar cases”)
- Recommendation systems
- Baseline model to beat
Not good for:
- Large datasets (slow prediction)
- High-dimensional data (100+ features)
- When fast prediction is critical
Key Takeaways
Find Neighbors
Predict based on the K most similar examples
Vote or Average
Classification = majority vote, Regression = average
Scale Features
Distance-based algorithms need scaled data
Choose K Wisely
Use cross-validation to find the best K
What’s Next?
Now that you understand classification with both logistic regression and KNN, let’s learn about decision trees - a completely different approach!

Continue to Module 5: Decision Trees
Learn how trees make decisions - just like you do