Classification
A Different Kind of Prediction
In regression, we predict numbers: “This house costs $450,000.”
In classification, we predict categories: “This email is SPAM.”
Real-world classification problems:
- Is this transaction fraudulent? (Yes/No)
- What digit is in this image? (0-9)
- Will this customer buy? (Yes/No)
- What disease does this patient have? (A, B, C, D)
- Is this review positive or negative? (Positive/Negative)
The Email Spam Problem
Let’s build a spam detector from scratch.
The Data
Imagine each email is represented by features:
- Number of exclamation marks
- Contains word “FREE”
- Contains word “WINNER”
- Sender in contacts
- Length of email
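As a concrete illustration, each email becomes one row of numbers: yes/no features are encoded as 1/0, and the label is 1 for spam and 0 for not spam. The values below are entirely made up, not real data:

```python
import numpy as np

# Hypothetical toy data. Columns: exclamation marks, contains "FREE",
# contains "WINNER", sender in contacts, length of email (characters)
X = np.array([
    [7, 1, 1, 0, 120],   # looks like spam
    [0, 0, 0, 1, 850],   # ordinary message from a contact
    [3, 1, 0, 0, 300],
    [1, 0, 0, 1, 540],
])
y = np.array([1, 0, 1, 0])  # labels: 1 = spam, 0 = not spam

print(X.shape)  # (4, 5): 4 emails, 5 features each
```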
Why Not Just Use Linear Regression?
If we fit ordinary linear regression to labels coded as 1 (spam) and 0 (not spam), we run into problems:
- Predictions can be > 1 or < 0 (what does 1.12 “spam” mean?)
- We want probabilities (0 to 1), not arbitrary numbers
- We want a clear decision: spam or not spam
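A quick way to see the first problem is to fit a straight line to 0/1 labels. This sketch uses a single hypothetical feature (exclamation-mark count) and `np.polyfit` for an ordinary least-squares line:

```python
import numpy as np

# Hypothetical data: exclamation marks per email and whether it was spam
exclamations = np.arange(10)                       # 0, 1, ..., 9
is_spam = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Ordinary least-squares line: is_spam ≈ slope * exclamations + intercept
slope, intercept = np.polyfit(exclamations, is_spam, deg=1)
predictions = slope * exclamations + intercept

print(predictions.min())       # below 0 for the email with no "!"
print(predictions.max())       # above 1 for the email with 9 "!"
print(slope * 20 + intercept)  # an email with 20 "!" scores far above 1
```

None of these out-of-range outputs can be read as a probability, which is exactly what the sigmoid fixes.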
The Sigmoid Function: Squashing to Probabilities
We need a function that:
- Takes any number (from -∞ to +∞)
- Outputs a value between 0 and 1
- Acts like a probability
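The sigmoid (logistic) function σ(z) = 1 / (1 + e^(-z)) meets all three requirements: large negative inputs map close to 0, large positive inputs map close to 1, and z = 0 maps to exactly 0.5. A minimal NumPy version:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))  # rises smoothly from near 0 to near 1, passing 0.5 at z = 0
```

For very large negative z, `np.exp(-z)` overflows and NumPy emits a warning (the result still comes out as 0); `scipy.special.expit` is a numerically safer alternative if that matters.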
Logistic Regression
Combine linear regression with sigmoid:
- Compute a weighted sum (like linear regression)
- Pass through sigmoid to get a probability
- If probability > 0.5, predict “spam”
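Put together, the model computes z = w·x + b, then p = sigmoid(z), then thresholds p at 0.5. In this sketch the weights are hypothetical, picked by hand for illustration rather than learned. Because the sigmoid exceeds 0.5 exactly when z > 0, the 0.5 threshold is the same as checking the sign of the weighted sum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of spam for each row of X: sigmoid of the weighted sum."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Hard decision: 1 = spam, 0 = not spam."""
    return (predict_proba(X, w, b) > threshold).astype(int)

# Hypothetical hand-picked weights for [exclamation marks, contains "FREE", sender in contacts]
w = np.array([0.8, 1.5, -2.0])
b = -1.0

X = np.array([
    [5, 1, 0],   # many "!", says FREE, unknown sender
    [0, 0, 1],   # plain email from a contact
])
print(predict_proba(X, w, b))  # first email close to 1, second close to 0
print(predict(X, w, b))        # [1 0]
```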
Training Logistic Regression
The Loss Function
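For classification, we use Binary Cross-Entropy (log loss), averaged over n training examples with true labels y_i ∈ {0, 1} and predicted probabilities ŷ_i:
$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
Why not use MSE like in regression? Combined with the sigmoid, MSE gives a non-convex loss whose gradient nearly vanishes when the sigmoid saturates, so a confidently wrong prediction produces only a tiny update and gradient descent crawls. Cross-entropy is convex for logistic regression, and its gradient with respect to the weighted sum is simply ŷ - y, which stays large when the model is confidently wrong, so those mistakes get fixed aggressively.
Intuition: think of it as a “surprise” score.
- If actual is 1 and we predict 0.9: small loss (not surprised, good prediction!)
- If actual is 1 and we predict 0.1: large loss (very surprised, terrible prediction!)
- If actual is 1 and we predict 0.001: enormous loss (the log function explodes as predictions approach 0, heavily penalizing confident wrong answers)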
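Binary cross-entropy averages -[y log(ŷ) + (1 - y) log(1 - ŷ)] over the examples, which translates directly into NumPy. Clipping predictions away from exactly 0 and 1 avoids taking log(0); the `eps` value here is an arbitrary choice. The three single-example cases from the list above:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    """Mean log loss; y_true holds 0/1 labels, y_pred holds predicted P(y = 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

for p in [0.9, 0.1, 0.001]:
    print(p, binary_cross_entropy([1], [p]))
# 0.9   -> about 0.105 (small)
# 0.1   -> about 2.303 (large)
# 0.001 -> about 6.908 (enormous)
```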
Gradient Descent for Logistic Regression
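The gradient of the mean cross-entropy loss has a compact form: with predictions ŷ = sigmoid(Xw + b), the gradient with respect to w is X^T (ŷ - y) / n and with respect to b it is the mean of (ŷ - y). A from-scratch batch gradient descent loop, sketched on synthetic, linearly separable data (the learning rate and epoch count are arbitrary, untuned choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, y_hat, eps=1e-15):
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean binary cross-entropy."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses = []
    for _ in range(epochs):
        y_hat = sigmoid(X @ w + b)       # current predicted probabilities
        error = y_hat - y                # dLoss/dz for every example
        w -= lr * (X.T @ error) / n      # gradient step for the weights
        b -= lr * error.mean()           # gradient step for the bias
        losses.append(binary_cross_entropy(y, y_hat))
    return w, b, losses

# Synthetic data: label is 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, losses = train_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

The first recorded loss is log(2) ≈ 0.693, because all-zero weights predict 0.5 for every email.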
Using scikit-learn
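In practice, scikit-learn's `LogisticRegression` handles the optimization for you. A minimal sketch on the same kind of synthetic data; note that scikit-learn applies L2 regularization by default (strength controlled by `C`), so its weights will not exactly match an unregularized from-scratch fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: label is 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression()   # defaults: L2 penalty, C=1.0
model.fit(X, y)

print(model.coef_, model.intercept_)   # learned w and b
print(model.predict_proba(X[:3]))      # columns: P(class 0), P(class 1)
print(model.predict(X[:3]))            # hard 0/1 labels
print(model.score(X, y))               # mean accuracy on the given data
```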
Real Example: Breast Cancer Detection
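scikit-learn ships the Wisconsin breast cancer dataset (569 tumors, 30 numeric features), so a complete pipeline fits in a few lines. One detail worth knowing: in this dataset label 0 means malignant and 1 means benign. The split ratio and `random_state` below are arbitrary choices, and the features are standardized because the solver converges more reliably on scaled inputs:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # y: 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features, then fit logistic regression; the pipeline reuses the
# scaling learned on the training set when it predicts on the test set
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```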
Understanding the Confusion Matrix
- True Positive (TP): Predicted spam, was spam
- True Negative (TN): Predicted not spam, was not spam
- False Positive (FP): Predicted spam, was not spam (annoying!)
- False Negative (FN): Predicted not spam, was spam (dangerous!)
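Counting the four cells by hand makes the definitions concrete; scikit-learn's `confusion_matrix` returns the same counts, laid out as [[TN, FP], [FN, TP]] for 0/1 labels. The labels below are made up (1 = spam):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # actual: 1 = spam
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])  # model's predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # flagged spam, was spam
tn = np.sum((y_pred == 0) & (y_true == 0))  # let through, was not spam
fp = np.sum((y_pred == 1) & (y_true == 0))  # flagged a real email
fn = np.sum((y_pred == 0) & (y_true == 1))  # missed spam
print(tp, tn, fp, fn)  # 2 5 1 2

# Same counts from scikit-learn; .ravel() flattens [[TN, FP], [FN, TP]]
tn2, fp2, fn2, tp2 = confusion_matrix(y_true, y_pred).ravel()
```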
Key Metrics
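In terms of the four confusion-matrix counts, the standard metrics are:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\qquad
\text{Precision} = \frac{TP}{TP + FP}
\qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```

Precision asks “of everything we flagged as spam, how much really was spam?”; recall asks “of all the real spam, how much did we catch?”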
When to prioritize which metric?
Think of it as a cost-of-mistakes analysis:
- High Precision needed: Spam filter — if you mark a real email as spam, your user misses an important message. The cost of a false positive is high.
- High Recall needed: Disease detection — if you miss a sick patient and send them home, the consequences could be fatal. The cost of a false negative is high.
- F1 Score: When you need balance between both, or when you’re not sure which type of mistake is worse. F1 is the harmonic mean, which means it punishes you if either precision or recall is low.
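A sketch with made-up spam labels (1 = spam) that computes each metric from the raw counts and compares it with scikit-learn's `sklearn.metrics` functions. The last line shows why the harmonic mean matters: with precision 1.0 and recall 0.1, an arithmetic mean would report a flattering 0.55, while F1 is only about 0.18:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # actual: 1 = spam
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])  # model's predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # 2
tn = np.sum((y_pred == 0) & (y_true == 0))  # 5
fp = np.sum((y_pred == 1) & (y_true == 0))  # 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # 2

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

accuracy = (tp + tn) / len(y_true)   # 7 / 10 = 0.7
precision = tp / (tp + fp)           # 2 / 3, about 0.667
recall = tp / (tp + fn)              # 2 / 4 = 0.5
f1 = f1_from(precision, recall)      # 4 / 7, about 0.571

# scikit-learn gives the same values (positive class = 1 by default)
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))

print(f1_from(1.0, 0.1))  # about 0.18: one low score drags F1 down
```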
Multi-Class Classification
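What if there are more than 2 classes? Logistic regression extends in two common ways: one-vs-rest trains one binary classifier per class and picks the class with the highest probability, while multinomial (softmax) logistic regression replaces the sigmoid with a softmax that turns one score per class into probabilities summing to 1. scikit-learn's LogisticRegression accepts multi-class labels directly.
The Decision Boundary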
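Logistic regression creates a linear decision boundary: the sigmoid exceeds 0.5 exactly when its input z = w·x + b is positive, so the two classes are separated by the points where w·x + b = 0. With two features that boundary is a straight line; with more features it is a flat hyperplane.
Key Takeaways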
Classification = Categories
Predict discrete labels, not numbers
Sigmoid = Probability
Squash outputs to 0-1 range
Threshold = Decision
P > 0.5 means positive class
Metrics Matter
Accuracy isn’t always enough
🚀 Mini Projects
Project 1
Build a spam detector from scratch
Project 2
Medical diagnosis classifier with metrics analysis
Project 3
Customer churn prediction system
What’s Next?
Before moving to more complex algorithms, let’s learn K-Nearest Neighbors, an even more intuitive approach to classification!
Continue to Module 4a: K-Nearest Neighbors
Classify by finding similar examples - the simplest ML algorithm