Classification

A Different Kind of Prediction

In regression, we predict numbers: “This house costs $450,000.”
In classification, we predict categories: “This email is SPAM.”

Real-world classification problems:

Is this transaction fraudulent? (Yes/No)
What digit is in this image? (0-9)
Will this customer buy? (Yes/No)
What disease does this patient have? (A, B, C, D)
Is this review positive or negative? (Positive/Negative)

The Email Spam Problem

Let’s build a spam detector from scratch.

The Data

Imagine each email is represented by features:

Number of exclamation marks
Contains word “FREE”
Contains word “WINNER”
Sender in contacts
Length of email

import numpy as np

# Email features: [exclamation_count, has_free, has_winner, in_contacts, length_bucket]
# Labels: 0 = not spam, 1 = spam

emails = np.array([
    [5, 1, 1, 0, 1],   # Short, has FREE and WINNER, lots of !!! -> likely spam
    [0, 0, 0, 1, 3],   # Long, from contact, no sketchy words -> not spam
    [3, 1, 0, 0, 1],   # Has FREE, some !!! -> maybe spam
    [0, 0, 0, 1, 2],   # From contact -> not spam
    [10, 1, 1, 0, 1],  # Very spammy
    [1, 0, 0, 1, 3],   # Normal email from contact
    [8, 1, 1, 0, 1],   # Spammy
    [0, 0, 0, 0, 2],   # Normal email
])

labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1=spam, 0=not spam

Why Not Just Use Linear Regression?

Let’s try:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(emails, labels)

# Predict
predictions = model.predict(emails)
print("Predictions:", predictions)
# Illustrative output (your exact values may differ):
# [0.89, 0.12, 0.67, 0.15, 1.12, 0.18, 0.95, 0.22]

Problems:

Predictions can be > 1 or < 0 (what does 1.12 “spam” mean?)
We want probabilities (0 to 1), not arbitrary numbers
We want a clear decision: spam or not spam

The Sigmoid Function: Squashing to Probabilities

We need a function that:

Takes any number (from -∞ to +∞)
Outputs a value between 0 and 1
Acts like a probability

Enter the sigmoid function:

\sigma(z) = \frac{1}{1 + e^{-z}}

def sigmoid(z):
    """Squash any number to range (0, 1)"""
    return 1 / (1 + np.exp(-z))

# Test it
for z in [-10, -2, 0, 2, 10]:
    print(f"sigmoid({z:3d}) = {sigmoid(z):.4f}")

Output:

sigmoid(-10) = 0.0000  # Very negative -> close to 0
sigmoid( -2) = 0.1192  # Negative -> small
sigmoid(  0) = 0.5000  # Zero -> 0.5 (uncertain)
sigmoid(  2) = 0.8808  # Positive -> close to 1
sigmoid( 10) = 1.0000  # Very positive -> close to 1

Logistic Regression

Combine linear regression with sigmoid:

P(spam) = \sigma(w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n)

Compute a weighted sum (like linear regression)
Pass through sigmoid to get a probability
If the probability is at least 0.5 (the default threshold), predict “spam”; otherwise predict “not spam”

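def logistic_regression_predict_proba(X, w):
    """
    Predict probability of class 1.

    If w[0] is the bias weight, X must already include a leading
    column of ones (as the training code does with X_bias).
    """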
    z = X @ w  # Linear combination
    return sigmoid(z)  # Squash to probability

def logistic_regression_predict(X, w, threshold=0.5):
    """
    Predict class labels (0 or 1).
    """
    probabilities = logistic_regression_predict_proba(X, w)
    return (probabilities >= threshold).astype(int)
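As a quick illustration of the three steps above, the sketch below scores two of the example emails by hand. The weight values in `w_demo` are assumptions picked for illustration, not trained values, and each row carries a leading 1 so that the first weight acts as the bias.

```python
import numpy as np

def sigmoid(z):
    """Squash any number to the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Two emails from the dataset above, each with a leading 1 for the bias term
X_demo = np.array([
    [1, 5, 1, 1, 0, 1],   # 5 exclamation marks, FREE, WINNER, not a contact, short
    [1, 0, 0, 0, 1, 3],   # no sketchy words, from a contact, long
])

# Hypothetical weights (illustrative only):
# [bias, exclamation_count, has_free, has_winner, in_contacts, length_bucket]
w_demo = np.array([-1.0, 0.5, 1.0, 1.0, -2.0, -0.5])

z_demo = X_demo @ w_demo                   # Step 1: weighted sum
p_demo = sigmoid(z_demo)                   # Step 2: squash to a probability
label_demo = (p_demo >= 0.5).astype(int)   # Step 3: apply the threshold

print("Weighted sums:", z_demo)            # 3.0 and -4.5
print("P(spam):      ", p_demo.round(3))
print("Predictions:  ", label_demo)        # [1 0]
```

With these made-up weights, exclamation marks and trigger words push the score up, while being in the contact list and having a longer body pull it down.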

Training Logistic Regression

The Loss Function

For classification, we use Binary Cross-Entropy (log loss):

L = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]

Intuition:

If actual is 1 and we predict 0.9 → small loss (good!)
If actual is 1 and we predict 0.1 → large loss (bad!)
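To put numbers on that intuition, here is a small sketch that evaluates the loss for a single example. The helper name `single_example_loss` is made up for this illustration. It also covers the mirror-image cases where the actual label is 0.

```python
import math

def single_example_loss(y_true, y_pred):
    """Cross-entropy contribution of one example (hypothetical helper)."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

cases = [
    (1, 0.9),  # actual 1, confident and right
    (1, 0.1),  # actual 1, confident and wrong
    (0, 0.1),  # actual 0, confident and right
    (0, 0.9),  # actual 0, confident and wrong
]

for y_true, y_pred in cases:
    loss = single_example_loss(y_true, y_pred)
    print(f"actual={y_true}, predicted={y_pred}: loss={loss:.3f}")

# actual=1, predicted=0.9: loss=0.105
# actual=1, predicted=0.1: loss=2.303
# actual=0, predicted=0.1: loss=0.105
# actual=0, predicted=0.9: loss=2.303
```

A confident wrong answer costs roughly 20 times more than a confident right one, which is what pushes the model toward well-calibrated probabilities.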

def binary_cross_entropy(y_true, y_pred):
    """
    Compute binary cross-entropy loss.
    """
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    loss = -np.mean(
        y_true * np.log(y_pred) + 
        (1 - y_true) * np.log(1 - y_pred)
    )
    return loss

Gradient Descent for Logistic Regression

def train_logistic_regression(X, y, learning_rate=0.1, num_epochs=1000):
    """
    Train logistic regression using gradient descent.
    """
    # Add bias column
    X_bias = np.column_stack([np.ones(len(X)), X])
    
    # Initialize weights
    w = np.zeros(X_bias.shape[1])
    
    for epoch in range(num_epochs):
        # Forward pass
        z = X_bias @ w
        predictions = sigmoid(z)
        
        # Compute loss
        loss = binary_cross_entropy(y, predictions)
        
        # Compute gradient
        errors = predictions - y
        gradient = X_bias.T @ errors / len(y)
        
        # Update weights
        w = w - learning_rate * gradient
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: Loss = {loss:.4f}")
    
    return w

# Train on our email data
weights = train_logistic_regression(emails, labels)

# Make predictions
X_bias = np.column_stack([np.ones(len(emails)), emails])
probs = sigmoid(X_bias @ weights)
preds = (probs >= 0.5).astype(int)

print("\nPredictions vs Actual:")
for i in range(len(emails)):
    print(f"Email {i}: P(spam)={probs[i]:.2f}, Predicted={preds[i]}, Actual={labels[i]}")
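One way to gain confidence in the gradient used in `train_logistic_regression` is to compare it against a finite-difference estimate of the loss slope. The sketch below does this on a small random dataset. The helper names (`bce`, `analytic_gradient`, `numerical_gradient`) and the random data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(w, X, y):
    """Binary cross-entropy of weights w on data (X, y)."""
    p = np.clip(sigmoid(X @ w), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, X, y):
    """Same formula as in the training loop: X^T (predictions - y) / n."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def numerical_gradient(w, X, y, h=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (bce(w + step, X, y) - bce(w - step, X, y)) / (2 * h)
    return grad

# Small random problem: 6 samples, bias column plus 2 features
rng = np.random.default_rng(0)
X_check = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
y_check = np.array([0, 1, 0, 1, 1, 0])
w_check = rng.normal(size=3)

max_diff = np.max(np.abs(
    analytic_gradient(w_check, X_check, y_check)
    - numerical_gradient(w_check, X_check, y_check)
))
print(f"Largest difference between the two gradients: {max_diff:.2e}")
```

If the two gradients agree to many decimal places, the update rule is moving the weights in the direction that reduces the loss.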

Using scikit-learn

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create and train model
model = LogisticRegression()
model.fit(emails, labels)

# Predict
predictions = model.predict(emails)
probabilities = model.predict_proba(emails)[:, 1]  # P(spam)

print("Predictions:", predictions)
print("Probabilities:", probabilities)
# Note: this is accuracy on the training data, so it is an optimistic estimate
print(f"Accuracy: {accuracy_score(labels, predictions):.2%}")

Real Example: Breast Cancer Detection

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Load data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
print("Features:", cancer.feature_names[:5], "...")
print("Classes:", cancer.target_names)  # ['malignant' 'benign']

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train
model = LogisticRegression(max_iter=5000)
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = model.predict(X_test_scaled)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Understanding the Confusion Matrix

                  Predicted
                  Neg   Pos
Actual  Neg  [  TN    FP  ]
        Pos  [  FN    TP  ]

True Positive (TP): Predicted spam, was spam
True Negative (TN): Predicted not spam, was not spam
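False Positive (FP): Predicted spam, was not spam (a legitimate email gets filtered out)
False Negative (FN): Predicted not spam, was spam (spam reaches the inbox)

Which error is worse depends on the application: a lost email can be costly in a spam filter, while a missed case is dangerous in disease detection.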
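The four counts can be tallied by hand from the true and predicted labels. The labels below are made up for illustration (1 = spam, 0 = not spam).

```python
# Hypothetical labels for eight emails (1 = spam, 0 = not spam)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true_demo, y_pred_demo))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted spam, was spam
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted not spam, was not spam
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted spam, was not spam
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted not spam, was spam

# Same layout as the diagram above: rows = actual, columns = predicted
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```

This is the same row and column ordering that `confusion_matrix(y_true, y_pred)` uses for binary labels 0 and 1.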

Key Metrics

from sklearn.metrics import precision_score, recall_score, f1_score

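# y_test and y_pred come from the breast cancer example above.
# By default scikit-learn treats label 1 as the positive class, which in
# this dataset is "benign" (0 = malignant). Pass pos_label=0 to score
# malignant as the positive class instead.

# Precision: Of all positive predictions, how many were correct?
# "When we say positive, how often are we right?"
precision = precision_score(y_test, y_pred)

# Recall: Of all actual positives, how many did we catch?
# "What % of positives did we catch?"
recall = recall_score(y_test, y_pred)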

# F1: Harmonic mean of precision and recall
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 Score:  {f1:.2%}")

When to prioritize which metric?

High Precision needed: Spam filter (don’t want important emails flagged as spam)
High Recall needed: Disease detection (don’t want to miss sick patients)
F1 Score: When you need balance between both
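One common lever for trading precision against recall is the decision threshold. The sketch below sweeps three thresholds over made-up probabilities. The data and the helper `precision_recall` are assumptions for illustration.

```python
def precision_recall(y_true, probs, threshold):
    """Precision and recall when predicting positive for probs >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical true labels and predicted probabilities of the positive class
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
probs_demo = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.05]

for threshold in (0.25, 0.5, 0.75):
    precision, recall = precision_recall(y_true_demo, probs_demo, threshold)
    print(f"threshold={threshold:.2f}: precision={precision:.2f}, recall={recall:.2f}")

# threshold=0.25: precision=0.67, recall=1.00
# threshold=0.50: precision=0.75, recall=0.75
# threshold=0.75: precision=1.00, recall=0.50
```

On this toy data, raising the threshold makes positive predictions rarer but more reliable (higher precision), while lowering it catches more true positives at the cost of more false alarms (higher recall).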

Multi-Class Classification

What if there are more than 2 classes?

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load iris data (3 classes of flowers)
iris = load_iris()
X, y = iris.data, iris.target
print("Classes:", iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed so results are reproducible
)

# Train (scikit-learn handles multi-class automatically!)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Get probabilities for each class
probs = model.predict_proba(X_test[:3])
print("\nProbabilities for first 3 samples:")
for i, p in enumerate(probs):
    print(f"Sample {i}: {dict(zip(iris.target_names, p.round(3)))}")
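A common way to extend the sigmoid idea to several classes is the softmax function, which turns one score per class into probabilities that sum to 1. The sketch below is a conceptual illustration with made-up scores. How scikit-learn computes its multi-class probabilities depends on the solver and version, so treat this as the general idea and not a description of its internals.

```python
import numpy as np

def softmax(scores):
    """Convert each row of class scores into probabilities that sum to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # improves numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical per-class scores for two samples
# (columns: setosa, versicolor, virginica)
scores_demo = np.array([
    [4.0, 1.0, -2.0],
    [-1.0, 2.0, 1.5],
])

probs_demo = softmax(scores_demo)
print("Probabilities:\n", probs_demo.round(3))
print("Row sums:", probs_demo.sum(axis=1))               # each row sums to 1
print("Predicted class index:", probs_demo.argmax(axis=1))  # [0 1]
```

The predicted class is the one with the highest probability, which is also the one with the highest raw score.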

The Decision Boundary

Logistic regression creates a linear decision boundary:

import matplotlib.pyplot as plt
import numpy as np

# Reuses iris and LogisticRegression from the previous example

# Use just 2 features for visualization
X_2d = iris.data[:, :2]  # sepal length and width
y = iris.target

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_2d, y)

# Create a mesh grid for decision boundary
x_min, x_max = X_2d[:, 0].min() - 0.5, X_2d[:, 0].max() + 0.5
y_min, y_max = X_2d[:, 1].min() - 0.5, X_2d[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis', edgecolors='black')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Logistic Regression Decision Boundary')
plt.show()
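In the simpler two-class case, the boundary is the set of points where the weighted sum equals zero, which is where the predicted probability is exactly 0.5. The sketch below checks this with hypothetical weights `w1`, `w2` and bias `b`, chosen for illustration and not taken from the trained iris model.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical weights for a two-feature, two-class model (illustrative only)
w1, w2, b = 2.0, -1.0, -3.0

def boundary_x2(x1):
    """Solve w1*x1 + w2*x2 + b = 0 for x2: a straight line in the (x1, x2) plane."""
    return -(w1 * x1 + b) / w2

boundary_points = []
for x1 in (0.0, 1.0, 2.5):
    x2 = boundary_x2(x1)
    p = sigmoid(w1 * x1 + w2 * x2 + b)
    boundary_points.append((x1, x2, p))
    print(f"x1={x1:.1f}, x2={x2:.1f}, P(class 1)={p:.2f}")

# x1=0.0, x2=-3.0, P(class 1)=0.50
# x1=1.0, x2=-1.0, P(class 1)=0.50
# x1=2.5, x2=2.0, P(class 1)=0.50
```

Because the equation is linear in the features, the boundary is a straight line in two dimensions. With three classes, as in the iris plot, the regions are separated by several such straight segments.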

Key Takeaways

Classification = Categories

Predict discrete labels, not numbers

Sigmoid = Probability

Squash outputs to 0-1 range

Threshold = Decision

P > 0.5 means positive class

Metrics Matter

Accuracy isn’t always enough
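To see why accuracy alone can mislead, consider a made-up imbalanced dataset and a "model" that always predicts the majority class. The numbers below are hypothetical.

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true_demo = [0] * 95 + [1] * 5
y_pred_demo = [0] * 100  # always predict the negative class

correct = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == p)
accuracy = correct / len(y_true_demo)

tp = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true_demo, y_pred_demo) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.0%}")  # 95%
print(f"Recall:   {recall:.0%}")    # 0%
```

The classifier looks strong on accuracy yet never finds a single positive case, which is why precision, recall, and F1 are worth checking alongside it.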

🚀 Mini Projects

Project 1

Build a spam detector from scratch

Project 2

Medical diagnosis classifier with metrics analysis

Project 3

Customer churn prediction system

What’s Next?

Before moving to more complex algorithms, let’s learn K-Nearest Neighbors - an even more intuitive approach to classification!

Continue to Module 4a: K-Nearest Neighbors

Classify by finding similar examples - the simplest ML algorithm
