
Classification

[Figure: Classification decision boundary]

A Different Kind of Prediction

In regression, we predict numbers: “This house costs $450,000.” In classification, we predict categories: “This email is SPAM.” Real-world classification problems include:
  • Is this transaction fraudulent? (Yes/No)
  • What digit is in this image? (0-9)
  • Will this customer buy? (Yes/No)
  • What disease does this patient have? (A, B, C, D)
  • Is this review positive or negative? (Positive/Negative)
[Figure: Medical diagnosis as a classification problem]

The Email Spam Problem

Let’s build a spam detector from scratch.

The Data

Imagine each email is represented by features:
  • Number of exclamation marks
  • Contains word “FREE”
  • Contains word “WINNER”
  • Sender in contacts
  • Length of email
import numpy as np

# Email features: [exclamation_count, has_free, has_winner, in_contacts, length_bucket]
# Labels: 0 = not spam, 1 = spam

emails = np.array([
    [5, 1, 1, 0, 1],   # Short, has FREE and WINNER, lots of !!! -> likely spam
    [0, 0, 0, 1, 3],   # Long, from contact, no sketchy words -> not spam
    [3, 1, 0, 0, 1],   # Has FREE, some !!! -> maybe spam
    [0, 0, 0, 1, 2],   # From contact -> not spam
    [10, 1, 1, 0, 1],  # Very spammy
    [1, 0, 0, 1, 3],   # Normal email from contact
    [8, 1, 1, 0, 1],   # Spammy
    [0, 0, 0, 0, 2],   # Normal email
])

labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1=spam, 0=not spam

Why Not Just Use Linear Regression?

Let’s try:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(emails, labels)

# Predict
predictions = model.predict(emails)
print("Predictions:", predictions)
# Output: [0.89, 0.12, 0.67, 0.15, 1.12, 0.18, 0.95, 0.22]
Problems:
  1. Predictions can be > 1 or < 0 (what does 1.12 “spam” mean?)
  2. We want probabilities (0 to 1), not arbitrary numbers
  3. We want a clear decision: spam or not spam

The Sigmoid Function: Squashing to Probabilities

We need a function that:
  • Takes any number (from -∞ to +∞)
  • Outputs a value between 0 and 1
  • Acts like a probability
Enter the sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
def sigmoid(z):
    """Squash any number to range (0, 1)"""
    return 1 / (1 + np.exp(-z))

# Test it
for z in [-10, -2, 0, 2, 10]:
    print(f"sigmoid({z:3d}) = {sigmoid(z):.4f}")
Output:
sigmoid(-10) = 0.0000  # Very negative -> close to 0
sigmoid( -2) = 0.1192  # Negative -> small
sigmoid(  0) = 0.5000  # Zero -> 0.5 (uncertain)
sigmoid(  2) = 0.8808  # Positive -> close to 1
sigmoid( 10) = 1.0000  # Very positive -> close to 1

Logistic Regression

Combine linear regression with the sigmoid: $P(\text{spam}) = \sigma(w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n)$
  1. Compute a weighted sum (like linear regression)
  2. Pass through sigmoid to get a probability
  3. If probability > 0.5, predict “spam”
def logistic_regression_predict_proba(X, w):
    """
    Predict probability of class 1.
    """
    z = X @ w  # Linear combination
    return sigmoid(z)  # Squash to probability

def logistic_regression_predict(X, w, threshold=0.5):
    """
    Predict class labels (0 or 1).
    """
    probabilities = logistic_regression_predict_proba(X, w)
    return (probabilities >= threshold).astype(int)
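A quick sanity check of these two functions with a small hand-picked weight vector (the values below are purely illustrative, not learned):
# Hypothetical weights for [exclamation_count, has_free, has_winner, in_contacts, length_bucket]
w_example = np.array([0.3, 1.5, 1.5, -2.0, -0.5])

print("P(spam):", logistic_regression_predict_proba(emails, w_example).round(2))
print("Labels: ", logistic_regression_predict(emails, w_example))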

Training Logistic Regression

The Loss Function

For classification, we use Binary Cross-Entropy (log loss): $L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
Intuition:
  • If actual is 1 and we predict 0.9 → small loss (good!)
  • If actual is 1 and we predict 0.1 → large loss (bad!)
def binary_cross_entropy(y_true, y_pred):
    """
    Compute binary cross-entropy loss.
    """
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    loss = -np.mean(
        y_true * np.log(y_pred) + 
        (1 - y_true) * np.log(1 - y_pred)
    )
    return loss
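To put numbers on the intuition above (true label 1 in both cases): a confident correct prediction costs almost nothing, a confident wrong one costs a lot.
print(binary_cross_entropy(np.array([1]), np.array([0.9])))  # ~0.105 (small loss, good)
print(binary_cross_entropy(np.array([1]), np.array([0.1])))  # ~2.303 (large loss, bad)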

Gradient Descent for Logistic Regression
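A convenient fact: with the sigmoid and cross-entropy combined, the gradient of the loss with respect to the weights simplifies to $\nabla_w L = \frac{1}{n} X^\top (\hat{y} - y)$, the same form as in linear regression. That is exactly what the code below computes.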

def train_logistic_regression(X, y, learning_rate=0.1, num_epochs=1000):
    """
    Train logistic regression using gradient descent.
    """
    # Add bias column
    X_bias = np.column_stack([np.ones(len(X)), X])
    
    # Initialize weights
    w = np.zeros(X_bias.shape[1])
    
    for epoch in range(num_epochs):
        # Forward pass
        z = X_bias @ w
        predictions = sigmoid(z)
        
        # Compute loss
        loss = binary_cross_entropy(y, predictions)
        
        # Compute gradient
        errors = predictions - y
        gradient = X_bias.T @ errors / len(y)
        
        # Update weights
        w = w - learning_rate * gradient
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: Loss = {loss:.4f}")
    
    return w

# Train on our email data
weights = train_logistic_regression(emails, labels)

# Make predictions
X_bias = np.column_stack([np.ones(len(emails)), emails])
probs = sigmoid(X_bias @ weights)
preds = (probs >= 0.5).astype(int)

print("\nPredictions vs Actual:")
for i in range(len(emails)):
    print(f"Email {i}: P(spam)={probs[i]:.2f}, Predicted={preds[i]}, Actual={labels[i]}")

Using scikit-learn

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create and train model
model = LogisticRegression()
model.fit(emails, labels)

# Predict
predictions = model.predict(emails)
probabilities = model.predict_proba(emails)[:, 1]  # P(spam)

print("Predictions:", predictions)
print("Probabilities:", probabilities)
print(f"Accuracy: {accuracy_score(labels, predictions):.2%}")

Real Example: Breast Cancer Detection

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Load data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
print("Features:", cancer.feature_names[:5], "...")
print("Classes:", cancer.target_names)  # ['malignant' 'benign']

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train
model = LogisticRegression(max_iter=5000)
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = model.predict(X_test_scaled)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Understanding the Confusion Matrix

                  Predicted
                  Neg   Pos
Actual  Neg  [  TN    FP  ]
        Pos  [  FN    TP  ]
  • True Positive (TP): Predicted spam, was spam
  • True Negative (TN): Predicted not spam, was not spam
  • False Positive (FP): Predicted spam, was not spam (annoying!)
  • False Negative (FN): Predicted not spam, was spam (dangerous!)
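For a binary problem, these four counts can be read straight off the scikit-learn matrix and turned into metrics by hand. A minimal sketch, reusing y_test and y_pred from the breast cancer example:
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

# Precision = TP / (TP + FP): of everything we flagged, how much was right?
# Recall    = TP / (TP + FN): of everything that was really positive, how much did we catch?
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"Precision: {tp / (tp + fp):.2%}")
print(f"Recall:    {tp / (tp + fn):.2%}")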

Key Metrics

from sklearn.metrics import precision_score, recall_score, f1_score

# Precision: Of all spam predictions, how many were correct?
# "When we say spam, how often are we right?"
precision = precision_score(y_test, y_pred)

# Recall: Of all actual spam, how many did we catch?
# "What % of spam did we catch?"
recall = recall_score(y_test, y_pred)

# F1: Harmonic mean of precision and recall
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 Score:  {f1:.2%}")
When should you prioritize which metric?
  • High Precision needed: spam filtering (a false positive sends a legitimate email to the spam folder)
  • High Recall needed: disease detection (a false negative means a sick patient goes undetected)
  • F1 Score: when you need a balance between both
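One practical lever for trading precision against recall is the decision threshold itself: predict uses 0.5 by default, but you can apply any cutoff to predict_proba. A minimal sketch on the breast cancer model from above (the 0.3 cutoff is an arbitrary illustration):
# Lowering the threshold flags more samples as positive: recall tends to rise, precision to fall
probs_pos = model.predict_proba(X_test_scaled)[:, 1]
y_pred_lower = (probs_pos >= 0.3).astype(int)

print(f"Precision at 0.5 vs 0.3: {precision_score(y_test, y_pred):.2%} vs {precision_score(y_test, y_pred_lower):.2%}")
print(f"Recall    at 0.5 vs 0.3: {recall_score(y_test, y_pred):.2%} vs {recall_score(y_test, y_pred_lower):.2%}")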

Multi-Class Classification

What if there are more than 2 classes?
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load iris data (3 classes of flowers)
iris = load_iris()
X, y = iris.data, iris.target
print("Classes:", iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train (scikit-learn handles multi-class automatically!)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Get probabilities for each class
probs = model.predict_proba(X_test[:3])
print("\nProbabilities for first 3 samples:")
for i, p in enumerate(probs):
    print(f"Sample {i}: {dict(zip(iris.target_names, p.round(3)))}")

The Decision Boundary

Logistic regression creates a linear decision boundary:
import matplotlib.pyplot as plt

# Use just 2 features for visualization
X_2d = iris.data[:, :2]  # sepal length and width
y = iris.target

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_2d, y)

# Create a mesh grid for decision boundary
x_min, x_max = X_2d[:, 0].min() - 0.5, X_2d[:, 0].max() + 0.5
y_min, y_max = X_2d[:, 1].min() - 0.5, X_2d[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis', edgecolors='black')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Logistic Regression Decision Boundary')
plt.show()

Key Takeaways

  • Classification = Categories: predict discrete labels, not numbers
  • Sigmoid = Probability: squash outputs to the (0, 1) range
  • Threshold = Decision: P > 0.5 means the positive class
  • Metrics Matter: accuracy isn't always enough

🚀 Mini Projects

  • Project 1: Build a spam detector from scratch
  • Project 2: Medical diagnosis classifier with metrics analysis
  • Project 3: Customer churn prediction system

What’s Next?

Before moving to more complex algorithms, let’s learn K-Nearest Neighbors - an even more intuitive approach to classification!

Continue to Module 4a: K-Nearest Neighbors

Classify by finding similar examples - the simplest ML algorithm