K-Nearest Neighbors (KNN)
The Most Intuitive Algorithm
Imagine you move to a new city and want to know if a neighborhood is safe. What do you do? You look at nearby neighborhoods. If 4 out of 5 nearby neighborhoods are safe → probably safe. If 4 out of 5 nearby neighborhoods are unsafe → probably unsafe.
That’s KNN. To predict something, find the K most similar examples and use their labels.
KNN is often called a “lazy learner”, not because it’s poorly designed, but because it does zero work during training. It just memorizes all the data and waits. All the computation happens at prediction time, when it searches for neighbors. This is the opposite of models like linear regression, which do all their work upfront during training and then predict instantly.
The Movie Recommendation Problem
You just watched “The Matrix” and loved it. What should you watch next?
Finding Similar Movies
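One way to make "similar" concrete is to describe each movie as a vector of feature scores and rank the others by distance to the one you liked. The sketch below assumes invented 0-10 scores for action, sci-fi, and romance; the `movies` dictionary and the `most_similar` helper are hypothetical names for illustration, not part of any library.

```python
import numpy as np

# Hypothetical feature scores (0-10), made up for illustration:
# [action, sci_fi, romance]
movies = {
    "The Matrix": [9, 10, 2],
    "Inception": [8, 9, 3],
    "Blade Runner": [6, 10, 3],
    "John Wick": [10, 1, 1],
    "The Notebook": [1, 0, 10],
    "Titanic": [3, 0, 9],
}

def most_similar(title, k=2):
    """Return the k titles closest to `title` by Euclidean distance."""
    query = np.array(movies[title], dtype=float)
    distances = []
    for other, features in movies.items():
        if other == title:
            continue  # do not recommend the movie itself
        dist = float(np.linalg.norm(query - np.array(features, dtype=float)))
        distances.append((dist, other))
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:k]]

print(most_similar("The Matrix", k=2))  # → ['Inception', 'Blade Runner']
```

With these made-up scores, the two sci-fi action films end up closest, which is the "K most similar examples" idea applied to recommendations.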
The KNN Algorithm
For Classification
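For classification, the prediction is the most common label among the K nearest training points. The following is a minimal NumPy sketch of that idea, assuming Euclidean distance; the function name `knn_classify` and the toy data are made up for illustration.

```python
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # 1. Distance from the query to every training point (Euclidean, by assumption)
    distances = np.linalg.norm(X_train - query, axis=1)
    # 2. Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration: two clusters in 2-D
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["safe", "safe", "safe", "unsafe", "unsafe", "unsafe"]

print(knn_classify(X, y, query=[2, 2], k=3))  # → safe
```

Note that there is no training step: every call scans the full training set, which is the "lazy learner" behavior in practice.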
For Regression
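The regression variant reuses the same neighbor search and only changes the final step. This is a minimal sketch under the same assumptions as before (Euclidean distance, brute-force search); `knn_regress` and the house-price numbers are hypothetical.

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Predict a number for `query` as the mean target of its k nearest points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    # The only change from classification: take the mean rather than a vote
    return float(y_train[nearest].mean())

# Invented data: house size in square meters -> price in thousands
X = [[50], [60], [70], [120], [130]]
y = [150, 180, 210, 360, 390]

print(knn_regress(X, y, query=[65], k=3))  # → 180.0
```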
Instead of voting, average the values of the K nearest neighbors.
Real Example: Iris Classification
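The original walkthrough for this example is not reproduced here, so the following is a sketch of how it might look with scikit-learn. The split size, `random_state`, and `n_neighbors=5` are arbitrary assumptions, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the 150-sample Iris dataset (4 features, 3 species)
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation; stratify keeps class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# "Fitting" stores the training data (plus an optional search index)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Iris features are all measured in centimeters on similar ranges, so scaling matters less here than it would for features with very different units.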
Choosing K: The Magic Number
K=1: Use only the closest neighbor
- Very sensitive to noise
- Can overfit
Large K: Use many neighbors
- Smoother predictions
- Can underfit
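A common way to pick K between these extremes is to compare cross-validated accuracy over a handful of candidates. The sketch below uses scikit-learn on the Iris data; the candidate list and the 5-fold setting are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11, 15, 21]  # an arbitrary grid of values to try
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: average accuracy over 5 train/validation splits
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"K={k:>2}: mean accuracy {score:.3f}")
print("Best K:", best_k)
```

On a dataset this small, several values of K often score within noise of each other, so the "best" K is better read as a reasonable range than a single magic number.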
The Scaling Problem
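To see the effect of scale on a distance-based method, consider a sketch with two hypothetical features, age in years and income in dollars. The data and the helper `nearest_to_first` are invented for illustration; standardization is done by hand with NumPy here, though a library scaler would serve the same purpose.

```python
import numpy as np

# Hypothetical people: [age in years, income in dollars]; row 0 is the query
X = np.array([
    [25, 50_000],
    [60, 50_500],
    [27, 52_000],
    [45, 80_000],
    [35, 30_000],
], dtype=float)

def nearest_to_first(data):
    """Index of the row closest to row 0, excluding row 0 itself."""
    distances = np.linalg.norm(data[1:] - data[0], axis=1)
    return int(np.argmin(distances)) + 1

# Raw features: a 35-year age gap barely registers next to income differences
print("Nearest before scaling:", nearest_to_first(X))  # → 1 (the 60-year-old)

# Standardize each column to mean 0 and standard deviation 1
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print("Nearest after scaling:", nearest_to_first(X_scaled))  # → 2 (the 27-year-old)
```

Before scaling, the nearest neighbor of the 25-year-old is a 60-year-old with a near-identical income; after scaling, it is the 27-year-old with a slightly higher income.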
KNN uses distance. If features have different scales, large-scale features dominate.
Distance Metrics
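Euclidean isn’t the only option:
| Metric | Formula | Best For |
|---|---|---|
| Euclidean | $\sqrt{\sum_i (x_i - y_i)^2}$ | Most cases, continuous features |
| Manhattan | $\sum_i \lvert x_i - y_i \rvert$ | Grid-like movement, high dimensions |
| Chebyshev | $\max_i \lvert x_i - y_i \rvert$ | When max difference matters |
| Cosine | $1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$ | Text, when magnitude doesn’t matter |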
Math Connection: Distance metrics come from linear algebra concepts. See Vectors for more on measuring similarity.
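The four metrics in the table can be computed directly with NumPy. This is a small sketch on two made-up vectors `a` and `b`, with cosine written as a distance (one minus cosine similarity).

```python
import numpy as np

# Two made-up feature vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

diff = np.abs(a - b)  # per-feature absolute differences: [3, 4, 0]

euclidean = np.sqrt(np.sum(diff ** 2))  # straight-line distance
manhattan = np.sum(diff)                # sum of per-feature differences
chebyshev = np.max(diff)                # largest single-feature difference
# Cosine distance compares direction only, ignoring vector length
cosine = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Euclidean: {euclidean}")  # → 5.0
print(f"Manhattan: {manhattan}")  # → 7.0
print(f"Chebyshev: {chebyshev}")  # → 4.0
print(f"Cosine:    {cosine:.3f}")
```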
Weighted KNN
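One common weighting scheme gives each neighbor a vote proportional to the inverse of its distance. The sketch below assumes that scheme; `weighted_knn_classify`, the `eps` guard, and the data are hypothetical. In scikit-learn, `KNeighborsClassifier(weights="distance")` applies a similar inverse-distance weighting.

```python
from collections import defaultdict

import numpy as np

def weighted_knn_classify(X_train, y_train, query, k=3, eps=1e-9):
    """Vote among the k nearest neighbors, weighting each vote by 1 / distance."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    scores = defaultdict(float)
    for i in nearest:
        # eps avoids division by zero when the query equals a training point
        scores[y_train[i]] += 1.0 / (distances[i] + eps)
    return max(scores, key=scores.get)

# Invented 1-D data: one very close "A" and two farther "B" points
X = [[1.0], [3.0], [3.5]]
y = ["A", "B", "B"]

print(weighted_knn_classify(X, y, query=[1.2], k=3))  # → A
```

A plain majority vote with K=3 would return "B" here (two votes to one); the weighted vote returns "A" because the single "A" point is far closer to the query.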
Not all neighbors are equal! Closer neighbors should have more influence.
Pros and Cons
Advantages
- Simple and intuitive
- No training phase (lazy learner)
- Works with any number of classes
- Naturally handles multi-label
- Non-parametric (no assumptions about data)
Disadvantages
- Slow prediction (checks all training data)
- Sensitive to irrelevant features
- Sensitive to feature scaling
- Struggles in high dimensions (curse of dimensionality)
- Memory intensive (stores all data)
When to Use KNN
Good for:
- Small to medium datasets
- When you need example-based interpretability (“these are the similar cases”)
- Recommendation systems
- Baseline model to beat
Not good for:
- Large datasets (slow prediction, since a brute-force search must scan every training point for each query)
- High-dimensional data (100+ features): the “curse of dimensionality” makes all points roughly equidistant, destroying the notion of “nearest”
- When fast prediction is critical (consider tree-based models instead)
- When you need feature-level explanations (KNN says “these neighbors voted” but not which feature patterns drive the prediction)
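The high-dimensional case in the list above can be checked numerically. This sketch draws uniformly random points (an assumption, since real data is rarely uniform) and compares the nearest and farthest distances from one random query; a ratio near 1 means "nearest" carries little information. The helper name and the sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

def nearest_to_farthest_ratio(n_points, n_dims):
    """Nearest distance divided by farthest distance from a random query."""
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()

for dims in [2, 10, 100, 1000]:
    ratio = nearest_to_farthest_ratio(500, dims)
    print(f"{dims:>4} dims: nearest/farthest = {ratio:.2f}")
```

As the number of dimensions grows, the ratio climbs toward 1, so the closest point is only marginally closer than the farthest one.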
Key Takeaways
Find Neighbors
Predict based on the K most similar examples
Vote or Average
Classification = majority vote, Regression = average
Scale Features
Distance-based algorithms need scaled data
Choose K Wisely
Use cross-validation to find the best K
What’s Next?
Now that you understand classification with both logistic regression and KNN, let’s learn about decision trees - a completely different approach!
Continue to Module 5: Decision Trees
Learn how trees make decisions - just like you do