Classification

A Different Kind of Prediction

In regression, we predict numbers: “This house costs $450,000”.
In classification, we predict categories: “This email is SPAM”.

Real-world classification problems:
  • Is this transaction fraudulent? (Yes/No)
  • What digit is in this image? (0-9)
  • Will this customer buy? (Yes/No)
  • What disease does this patient have? (A, B, C, D)
  • Is this review positive or negative? (Positive/Negative)

The Email Spam Problem

Let’s build a spam detector from scratch.

The Data

Imagine each email is represented by features:
  • Number of exclamation marks
  • Contains word “FREE”
  • Contains word “WINNER”
  • Sender in contacts
  • Length of email
import numpy as np

# Email features: [exclamation_count, has_free, has_winner, in_contacts, length_bucket]
# Labels: 0 = not spam, 1 = spam

emails = np.array([
    [5, 1, 1, 0, 1],   # Short, has FREE and WINNER, lots of !!! -> likely spam
    [0, 0, 0, 1, 3],   # Long, from contact, no sketchy words -> not spam
    [3, 1, 0, 0, 1],   # Has FREE, some !!! -> maybe spam
    [0, 0, 0, 1, 2],   # From contact -> not spam
    [10, 1, 1, 0, 1],  # Very spammy
    [1, 0, 0, 1, 3],   # Normal email from contact
    [8, 1, 1, 0, 1],   # Spammy
    [0, 0, 0, 0, 2],   # Normal email
])

labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1=spam, 0=not spam

Why Not Just Use Linear Regression?

Let’s try:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(emails, labels)

# Predict
predictions = model.predict(emails)
print("Predictions:", predictions)
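# On this tiny dataset the label happens to equal has_free, so the fit is
# essentially exact. On noisier, realistic data the outputs look more like:
# [0.89, 0.12, 0.67, 0.15, 1.12, 0.18, 0.95, 0.22]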
Problems:
  1. Predictions can be > 1 or < 0 (what does 1.12 “spam” mean?)
  2. We want probabilities (0 to 1), not arbitrary numbers
  3. We want a clear decision: spam or not spam
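To see the first problem concretely, here is a minimal sketch on a made-up one-feature dataset (the arrays `X_toy` and `y_toy` are illustrative assumptions, not part of the email data). A straight line fitted to 0/1 labels keeps rising or falling outside the training range:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, binary labels
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])

line = LinearRegression().fit(X_toy, y_toy)

# Ask for predictions well outside the training range
extreme_preds = line.predict(np.array([[-5.0], [10.0]]))
print(extreme_preds)  # one value below 0, one above 1
```

Neither value can be read as a probability, which is what motivates the squashing function introduced next.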

The Sigmoid Function: Squashing to Probabilities

We need a function that:
  • Takes any number (from -∞ to +∞)
  • Outputs a value between 0 and 1
  • Acts like a probability
Enter the sigmoid function, nature’s favorite dimmer switch:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Think of it like a confidence meter. The linear model produces a raw score (could be -47 or +312), and sigmoid translates it into “how confident are we?” on a 0-to-1 scale. Very negative scores become “almost certainly not spam” (near 0), and very positive scores become “almost certainly spam” (near 1). Zero is the tipping point: 50/50.
def sigmoid(z):
    """Squash any number to range (0, 1)"""
    return 1 / (1 + np.exp(-z))

# Test it
for z in [-10, -2, 0, 2, 10]:
    print(f"sigmoid({z:3d}) = {sigmoid(z):.4f}")
Output:
sigmoid(-10) = 0.0000  # Very negative -> close to 0
sigmoid( -2) = 0.1192  # Negative -> small
sigmoid(  0) = 0.5000  # Zero -> 0.5 (uncertain)
sigmoid(  2) = 0.8808  # Positive -> close to 1
sigmoid( 10) = 1.0000  # Very positive -> close to 1
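One practical caveat: for very negative inputs such as -1000, `np.exp(-z)` overflows and NumPy emits a runtime warning, even though the result still rounds to 0. A common workaround, sketched here under the assumption that the input is array-like, handles each sign separately so the exponent is never positive. The name `stable_sigmoid` is ours, not a library function:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a positive number."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0: exp(-z) <= 1, so no overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0: rewrite as exp(z) / (1 + exp(z)), where exp(z) < 1
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

stable_values = stable_sigmoid([-1000.0, -2.0, 0.0, 2.0, 1000.0])
print(stable_values)
```

If SciPy is available, `scipy.special.expit` computes the same function.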

Logistic Regression

Combine linear regression with sigmoid:

$$P(\text{spam}) = \sigma(w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n)$$
  1. Compute a weighted sum (like linear regression)
  2. Pass through sigmoid to get a probability
  3. If probability > 0.5, predict “spam”
def logistic_regression_predict_proba(X, w):
    """
    Predict probability of class 1.

    X must already include a leading column of ones so that w[0] acts as the bias w_0.
    """
    z = X @ w  # Linear combination
    return sigmoid(z)  # Squash to probability

def logistic_regression_predict(X, w, threshold=0.5):
    """
    Predict class labels (0 or 1).
    """
    probabilities = logistic_regression_predict_proba(X, w)
    return (probabilities >= threshold).astype(int)
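As a worked example of the three steps, the sketch below scores two emails from the dataset with hand-picked weights. The values in `w_demo` are hypothetical, chosen only for illustration; they are not trained weights:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights, bias first:
# [w0, exclamation_count, has_free, has_winner, in_contacts, length_bucket]
w_demo = np.array([-1.0, 0.3, 2.0, 1.5, -2.0, -0.2])

# Two emails from the dataset, each with a leading 1 for the bias term
X_demo = np.array([
    [1, 5, 1, 1, 0, 1],   # short, FREE and WINNER, lots of !!!
    [1, 0, 0, 0, 1, 3],   # long email from a contact
])

z_demo = X_demo @ w_demo                  # step 1: weighted sums (about 3.8 and -3.6)
p_demo = sigmoid(z_demo)                  # step 2: probabilities of spam
label_demo = (p_demo >= 0.5).astype(int)  # step 3: threshold at 0.5

print(p_demo)
print(label_demo)  # [1 0]
```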

Training Logistic Regression

The Loss Function

For classification, we use Binary Cross-Entropy (log loss):

$$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

Why not use MSE like in regression? Because MSE combined with the sigmoid gives a non-convex loss surface with flat plateaus: when the sigmoid saturates, the gradient nearly vanishes, so even a confidently wrong prediction gets almost no correction and gradient descent becomes painfully slow. Cross-entropy has steep slopes that push the model to fix its confident-but-wrong predictions aggressively.

Intuition: think of it as a “surprise” score:
  • If actual is 1 and we predict 0.9 — small loss (not surprised, good prediction!)
  • If actual is 1 and we predict 0.1 — large loss (very surprised, terrible prediction!)
  • If actual is 1 and we predict 0.001 — enormous loss (the log function explodes as predictions approach 0, heavily penalizing confident wrong answers)
def binary_cross_entropy(y_true, y_pred):
    """
    Compute binary cross-entropy loss.
    """
    # Clip predictions to avoid log(0)
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    loss = -np.mean(
        y_true * np.log(y_pred) + 
        (1 - y_true) * np.log(1 - y_pred)
    )
    return loss
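To put numbers on the “surprise” intuition: when the true label is 1, a single example contributes `-log(prediction)` to the loss. A quick sketch for the three predictions discussed above (the dictionary name `surprise` is just for illustration):

```python
import numpy as np

# Loss contributed by one example whose true label is 1: -log(prediction)
surprise = {p: float(-np.log(p)) for p in [0.9, 0.1, 0.001]}

for p, loss in surprise.items():
    print(f"predicted {p}: loss = {loss:.3f}")
# predicted 0.9: loss = 0.105
# predicted 0.1: loss = 2.303
# predicted 0.001: loss = 6.908
```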

Gradient Descent for Logistic Regression

def train_logistic_regression(X, y, learning_rate=0.1, num_epochs=1000):
    """
    Train logistic regression using gradient descent.
    """
    # Add bias column
    X_bias = np.column_stack([np.ones(len(X)), X])
    
    # Initialize weights
    w = np.zeros(X_bias.shape[1])
    
    for epoch in range(num_epochs):
        # Forward pass
        z = X_bias @ w
        predictions = sigmoid(z)
        
        # Compute loss
        loss = binary_cross_entropy(y, predictions)
        
        # Compute gradient
        errors = predictions - y
        gradient = X_bias.T @ errors / len(y)
        
        # Update weights
        w = w - learning_rate * gradient
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch}: Loss = {loss:.4f}")
    
    return w

# Train on our email data
weights = train_logistic_regression(emails, labels)

# Make predictions
X_bias = np.column_stack([np.ones(len(emails)), emails])
probs = sigmoid(X_bias @ weights)
preds = (probs >= 0.5).astype(int)

print("\nPredictions vs Actual:")
for i in range(len(emails)):
    print(f"Email {i}: P(spam)={probs[i]:.2f}, Predicted={preds[i]}, Actual={labels[i]}")
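The gradient line `X_bias.T @ errors / len(y)` is the derivative of the binary cross-entropy with respect to the weights. One way to convince yourself is a finite-difference check. The following is a sketch on random data; the helper names `bce`, `analytic_gradient`, and `numeric_gradient` are illustrative, not part of the training code above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(w, X, y):
    """Binary cross-entropy of a logistic model with weights w."""
    p = np.clip(sigmoid(X @ w), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, X, y):
    """Same formula as in the training loop."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def numeric_gradient(w, X, y, h=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (bce(w + step, X, y) - bce(w - step, X, y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X_check = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])  # bias + 3 features
y_check = rng.integers(0, 2, size=20).astype(float)
w_check = rng.normal(size=4)

max_diff = np.max(np.abs(
    analytic_gradient(w_check, X_check, y_check)
    - numeric_gradient(w_check, X_check, y_check)
))
print(f"Largest difference between the two gradients: {max_diff:.2e}")
```

A tiny difference indicates that the closed-form gradient matches the slope of the loss.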

Using scikit-learn

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create and train model.
# Despite its name, LogisticRegression is a CLASSIFIER, not a regressor.
# The "regression" in the name refers to the mathematical technique
# (fitting a logistic function), not the type of problem.
model = LogisticRegression()
model.fit(emails, labels)

# Predict hard labels (0 or 1)
predictions = model.predict(emails)

# Predict probabilities -- often more useful than hard labels.
# [:, 1] selects the probability of class 1 (spam).
# Use these for ranking, threshold tuning, or when downstream
# decisions depend on confidence level.
probabilities = model.predict_proba(emails)[:, 1]  # P(spam)

print("Predictions:", predictions)
print("Probabilities:", probabilities)
print(f"Accuracy: {accuracy_score(labels, predictions):.2%}")
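Because `predict_proba` returns probabilities, you are not tied to the default cutoff. The sketch below applies a few thresholds to the same email data (redefined here as `X_mail` and `y_mail` so the block runs on its own). The thresholds 0.3, 0.5, and 0.8 are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same email features and labels as above
X_mail = np.array([
    [5, 1, 1, 0, 1], [0, 0, 0, 1, 3], [3, 1, 0, 0, 1], [0, 0, 0, 1, 2],
    [10, 1, 1, 0, 1], [1, 0, 0, 1, 3], [8, 1, 1, 0, 1], [0, 0, 0, 0, 2],
])
y_mail = np.array([1, 0, 1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X_mail, y_mail)
p_spam = clf.predict_proba(X_mail)[:, 1]  # P(spam) for each email

flag_counts = {}
for threshold in (0.3, 0.5, 0.8):
    flagged = (p_spam >= threshold).astype(int)
    flag_counts[threshold] = int(flagged.sum())
    print(f"threshold={threshold}: flagged {flag_counts[threshold]} of {len(y_mail)} emails")
```

Raising the threshold can only shrink the set of flagged emails, which is the lever behind the precision and recall trade-off.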

Real Example: Breast Cancer Detection

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Load data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
print("Features:", cancer.feature_names[:5], "...")
print("Classes:", cancer.target_names)  # ['malignant' 'benign']

# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train -- max_iter=5000 gives the optimizer enough iterations to converge.
# Logistic regression uses an iterative solver internally, and the default
# 100 iterations isn't always enough for high-dimensional data.
model = LogisticRegression(max_iter=5000)
model.fit(X_train_scaled, y_train)

# Evaluate on data the model has never seen
y_pred = model.predict(X_test_scaled)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
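# In medical contexts, pay special attention to missed cancers: a malignant
# tumor classified as benign is more dangerous than a benign one flagged for
# further testing. Note that this dataset encodes malignant as 0 and benign
# as 1, so the missed cancers sit in the top-right cell of the matrix above
# (actual malignant, predicted benign). They count as False Negatives only
# if you treat malignant as the positive class.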

Understanding the Confusion Matrix

                  Predicted
                  Neg   Pos
Actual  Neg  [  TN    FP  ]
        Pos  [  FN    TP  ]
  • True Positive (TP): Predicted spam, was spam
  • True Negative (TN): Predicted not spam, was not spam
  • False Positive (FP): Predicted spam, was not spam (costly: a real email gets hidden)
  • False Negative (FN): Predicted not spam, was spam (annoying: spam reaches the inbox)
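The four cells can be read straight out of scikit-learn's matrix. Here is a small sketch with hand-made labels (`y_true_demo` and `y_pred_demo` are made up for illustration, with 1 standing for spam):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = spam (positive), 0 = not spam (negative)
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo)

# Rows are actual classes, columns are predicted classes, so ravel()
# walks the matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=2, TP=4
```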

Key Metrics

from sklearn.metrics import precision_score, recall_score, f1_score

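# These calls reuse y_test and y_pred from the breast cancer example.
# scikit-learn treats label 1 as the positive class by default, which is
# "benign" in that dataset; pass pos_label=0 to score the malignant class.

# Precision: Of all positive (e.g. spam) predictions, how many were correct?
# "When we say spam, how often are we right?"
precision = precision_score(y_test, y_pred)

# Recall: Of all actual positives (e.g. spam), how many did we catch?
# "What % of spam did we catch?"
recall = recall_score(y_test, y_pred)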

# F1: Harmonic mean of precision and recall
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 Score:  {f1:.2%}")
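The three scores are simple ratios of confusion-matrix counts. This sketch recomputes them by hand on made-up labels (`y_true_demo` and `y_pred_demo` are illustrative) and compares the results against scikit-learn:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels: 1 = positive class
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

tp = np.sum((y_true_demo == 1) & (y_pred_demo == 1))  # 4
fp = np.sum((y_true_demo == 0) & (y_pred_demo == 1))  # 1
fn = np.sum((y_true_demo == 1) & (y_pred_demo == 0))  # 2

precision_manual = tp / (tp + fp)  # 4 / 5
recall_manual = tp / (tp + fn)     # 4 / 6
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)  # 8 / 11

print(f"Precision: {precision_manual:.3f} (sklearn: {precision_score(y_true_demo, y_pred_demo):.3f})")
print(f"Recall:    {recall_manual:.3f} (sklearn: {recall_score(y_true_demo, y_pred_demo):.3f})")
print(f"F1 Score:  {f1_manual:.3f} (sklearn: {f1_score(y_true_demo, y_pred_demo):.3f})")
```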
When to prioritize which metric?

Think of it as a cost-of-mistakes analysis:
  • High Precision needed: Spam filter — if you mark a real email as spam, your user misses an important message. The cost of a false positive is high.
  • High Recall needed: Disease detection — if you miss a sick patient and send them home, the consequences could be fatal. The cost of a false negative is high.
  • F1 Score: When you need balance between both, or when you’re not sure which type of mistake is worse. F1 is the harmonic mean, which means it punishes you if either precision or recall is low.
A senior engineer’s shortcut: Ask the business stakeholder “What’s worse — a false alarm or a missed catch?” Their answer tells you which metric to optimize.
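Once you know which mistake is worse, the decision threshold is the usual knob to turn. The sketch below revisits the breast cancer data and treats malignant (label 0) as the positive class, so lowering the threshold on P(malignant) trades precision for recall. The thresholds and the pipeline setup are illustrative assumptions, not a recommended configuration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
pipe.fit(X_tr, y_tr)

# Column 0 of predict_proba is P(malignant), because the classes are [0, 1]
p_malignant = pipe.predict_proba(X_te)[:, 0]

recalls = []
for threshold in (0.2, 0.5, 0.8):
    # Predict malignant (0) when P(malignant) clears the threshold, else benign (1)
    y_hat = np.where(p_malignant >= threshold, 0, 1)
    prec = precision_score(y_te, y_hat, pos_label=0, zero_division=0)
    rec = recall_score(y_te, y_hat, pos_label=0)
    recalls.append(rec)
    print(f"threshold={threshold}: precision={prec:.2%}, recall={rec:.2%}")
```

A lower threshold flags more tumors as malignant, so recall for malignant cases can only stay the same or rise, typically at the cost of more false alarms.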

Multi-Class Classification

What if there are more than 2 classes?
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load iris data (3 classes of flowers)
iris = load_iris()
X, y = iris.data, iris.target
print("Classes:", iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed for reproducible results
)

# Train (scikit-learn handles multi-class automatically!)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Get probabilities for each class
probs = model.predict_proba(X_test[:3])
print("\nProbabilities for first 3 samples:")
for i, p in enumerate(probs):
    print(f"Sample {i}: {dict(zip(iris.target_names, p.round(3)))}")
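With more than two classes, each row of `predict_proba` holds one probability per class, and the predicted label is the class with the largest one. The following sketch checks both properties on the full iris data. It fits on all samples purely for illustration, and the variable names are ours:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris_demo = load_iris()
clf_multi = LogisticRegression(max_iter=1000).fit(iris_demo.data, iris_demo.target)

proba = clf_multi.predict_proba(iris_demo.data)  # shape: (n_samples, 3)

# Each row is a probability distribution over the three species
row_sums = proba.sum(axis=1)

# The hard prediction is the class with the highest probability
argmax_labels = clf_multi.classes_[proba.argmax(axis=1)]

print("Rows sum to 1:", np.allclose(row_sums, 1.0))
print("argmax matches predict():", np.array_equal(argmax_labels, clf_multi.predict(iris_demo.data)))
```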

The Decision Boundary

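Logistic regression creates linear decision boundaries. With three classes, the regions are separated by straight line segments:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Use just 2 features for visualization
iris = load_iris()
X_2d = iris.data[:, :2]  # sepal length and width
y = iris.target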

# Train
model = LogisticRegression(max_iter=1000)
model.fit(X_2d, y)

# Create a mesh grid for decision boundary
x_min, x_max = X_2d[:, 0].min() - 0.5, X_2d[:, 0].max() + 0.5
y_min, y_max = X_2d[:, 1].min() - 0.5, X_2d[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap='viridis', edgecolors='black')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.title('Logistic Regression Decision Boundary')
plt.show()
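In the two-class case the boundary is a single straight line, the set of points where the weighted sum is zero and the probability is exactly 0.5. The sketch below assumes we keep only setosa and versicolor and the same two sepal features. It recovers the line from the fitted coefficients and checks that points on it score 0.5:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris_bin = load_iris()
mask = iris_bin.target < 2            # keep setosa (0) and versicolor (1)
X_bin = iris_bin.data[mask][:, :2]    # sepal length and width
y_bin = iris_bin.target[mask]

clf_bin = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)
(w1, w2), b = clf_bin.coef_[0], clf_bin.intercept_[0]

# Predicting class 1 is the same as asking which side of the line a point is on
manual_pred = (X_bin @ clf_bin.coef_[0] + b > 0).astype(int)

# Points on the line w1*x1 + w2*x2 + b = 0 should get probability 0.5
x1_line = np.array([4.5, 5.5, 6.5])
x2_line = -(w1 * x1_line + b) / w2
p_on_line = clf_bin.predict_proba(np.column_stack([x1_line, x2_line]))[:, 1]

print("Manual rule matches predict():", np.array_equal(manual_pred, clf_bin.predict(X_bin)))
print("Probabilities on the boundary:", p_on_line)
```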

Key Takeaways

  • Classification = Categories: Predict discrete labels, not numbers
  • Sigmoid = Probability: Squash outputs to 0-1 range
  • Threshold = Decision: P > 0.5 means positive class
  • Metrics Matter: Accuracy isn’t always enough

🚀 Mini Projects

  • Project 1: Build a spam detector from scratch
  • Project 2: Medical diagnosis classifier with metrics analysis
  • Project 3: Customer churn prediction system

What’s Next?

Before moving to more complex algorithms, let’s learn K-Nearest Neighbors - an even more intuitive approach to classification!

Continue to Module 4a: K-Nearest Neighbors

Classify by finding similar examples - the simplest ML algorithm