Hyperparameter Tuning

Grid Search vs Random Search

Parameters vs Hyperparameters

Parameters: Learned from data during training
  • Weights in linear regression
  • Split points in decision trees
Hyperparameters: Set before training
  • Learning rate
  • Number of trees in Random Forest
  • Maximum depth of trees
You choose hyperparameters. The model learns parameters.
Think of it like baking a cake. The parameters are the exact mixing ratios the recipe ends up with (flour, sugar, eggs). The hyperparameters are the oven temperature and baking time: you set them before baking starts, and they control how the recipe turns out. You can’t learn the oven temperature from the batter; you have to experiment.
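The distinction can be seen directly on a scikit-learn estimator. The following is a minimal sketch, assuming a logistic regression on the breast cancer dataset (the names `demo_clf`, `X_demo`, and `y_demo` are illustrative): hyperparameters are readable from the constructor before any data is seen, while learned parameters only appear after `fit()`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

# Hyperparameters: set by you in the constructor, before any data is seen.
demo_clf = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = demo_clf.get_params()
print("Hyperparameters:", {k: hyperparams[k] for k in ("C", "max_iter")})

# Parameters do not exist yet: nothing has been learned.
fitted_before = hasattr(demo_clf, "coef_")
print("Has weights before fit?", fitted_before)

# Parameters: learned from the data by fit().
demo_clf.fit(X_demo, y_demo)
print("Learned weights shape:", demo_clf.coef_.shape)
print("Learned intercept:", demo_clf.intercept_)
```

Changing `C` changes which weights get learned, but `C` itself is never updated by `fit()`. That is why it has to be tuned from the outside.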

The Tuning Problem

A Random Forest has many hyperparameters:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # How many trees?
    max_depth=10,           # How deep?
    min_samples_split=2,    # Min samples to split?
    min_samples_leaf=1,     # Min samples in leaf?
    max_features='sqrt',    # Features per split?
    bootstrap=True,         # Sample with replacement?
    random_state=42
)
How do you find the best combination?

Grid Search: Try Everything

Grid search is the brute-force approach: define a grid of values and try every single combination. It’s like testing every possible combination of oven temperature and baking time — thorough but expensive. For 3 hyperparameters with 4 values each, that’s 4 x 4 x 4 = 64 combinations, each requiring a full cross-validation run.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Define parameter grid -- start coarse, refine later.
# Common mistake: putting too many values in the grid.
# 3 values per parameter is usually enough for a first pass.
param_grid = {
    'n_estimators': [50, 100, 200],       # More trees = better but slower
    'max_depth': [5, 10, 15, None],       # None = unlimited (risk of overfitting)
    'min_samples_split': [2, 5, 10]       # Higher = more regularization
}

# Total combinations: 3 x 4 x 3 = 36, each evaluated with 5-fold CV
# That's 36 x 5 = 180 model fits!

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,               # 5-fold cross-validation for each combination
    scoring='accuracy', # Metric to optimize -- use 'f1' for imbalanced data
    n_jobs=-1,          # Use all CPU cores -- essential for grid search speed
    verbose=1           # Show progress (set to 2 for more detail)
)

grid_search.fit(X_train, y_train)

# Results
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
print(f"Test score: {grid_search.score(X_test, y_test):.4f}")
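
The combination count in the comments above can be checked before launching a search. This is a small sketch using scikit-learn's `ParameterGrid`; the name `demo_grid` is illustrative and repeats the grid from the example.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the example above, repeated so the snippet is self-contained.
demo_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
}

n_combinations = len(ParameterGrid(demo_grid))  # product of the list lengths
n_folds = 5
n_fits = n_combinations * n_folds               # one fit per combination per fold

print(f"{n_combinations} combinations x {n_folds} folds = {n_fits} fits")
```

With the default `refit=True`, `GridSearchCV` adds one final fit of the best combination on the full training set.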

Visualizing Grid Search Results

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Get results as DataFrame
results = pd.DataFrame(grid_search.cv_results_)
results = results[['param_n_estimators', 'param_max_depth', 'mean_test_score', 'std_test_score']]
print(results.sort_values('mean_test_score', ascending=False).head(10))

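# Heatmap for 2 parameters (scores are averaged over min_samples_split).
# pivot_table silently drops index values that are None/NaN, so give
# max_depth=None an explicit string label before pivoting.
depth_labels = results['param_max_depth'].map(
    lambda d: 'None' if pd.isna(d) else str(int(d))
)
pivot = results.assign(param_max_depth=depth_labels).pivot_table(
    values='mean_test_score',
    index='param_max_depth',
    columns='param_n_estimators'
).reindex([str(d) for d in param_grid['max_depth']])  # keep the grid's order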

plt.figure(figsize=(10, 6))
plt.imshow(pivot, cmap='viridis', aspect='auto')
plt.colorbar(label='Mean CV Score')
plt.xlabel('n_estimators')
plt.ylabel('max_depth')
plt.xticks(range(len(pivot.columns)), pivot.columns)
plt.yticks(range(len(pivot.index)), pivot.index)
plt.title('Grid Search Results')

# Annotate
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        plt.text(j, i, f'{pivot.iloc[i, j]:.3f}', ha='center', va='center', color='white')

plt.tight_layout()
plt.show()

Random Search: Smart Sampling

Grid search has a problem: exponential explosion.
  • 5 hyperparameters
  • 5 values each
  • 5^5 = 3,125 combinations!
Here’s the key insight from Bergstra and Bengio’s 2012 paper: in many ML problems, only a few hyperparameters actually matter. Grid search spends most of its budget re-testing the same handful of values of the important hyperparameters while it exhaustively varies the ones that don’t matter. Random search, by contrast, draws a fresh value for every hyperparameter on every trial, so the important dimensions are sampled more densely.
Think of it like searching for a lost key in a field: grid search walks the field in neat, evenly spaced rows, while random search drops probes at random spots. If the key lies on a line between two rows, every grid pass misses it, whereas each random probe lands at a new position and has a chance of hitting that line.
Random search samples randomly from parameter distributions:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Define parameter distributions
param_distributions = {
    'n_estimators': randint(50, 300),           # Integer in [50, 300): upper bound is exclusive
    'max_depth': randint(3, 20),                # Integer in [3, 20)
    'min_samples_split': randint(2, 15),        # Integer in [2, 15)
    'min_samples_leaf': randint(1, 10),         # Integer in [1, 10)
    'max_features': ['sqrt', 'log2', None]      # Categorical
}

# Random search
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,          # Try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

random_search.fit(X_train, y_train)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
Research shows (Bergstra & Bengio, 2012): Random search often finds good hyperparameters faster than grid search. The intuition is simple: if only 2 of your 5 hyperparameters actually matter (which is common), grid search wastes most of its budget exploring irrelevant dimensions. Random search spreads trials across all dimensions, so you explore more unique values of the important parameters with the same compute budget.
Practical rule: Use grid search when you have 2-3 hyperparameters with known good ranges. Use random search when you have 4+ hyperparameters or wide, uncertain ranges.
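
The point about unique values can be illustrated with a toy sketch. It assumes a two-dimensional search space on [0, 1] where only the first axis matters and a budget of 9 trials; the variable names are illustrative and no real model is trained.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)
budget = 9

# Grid search: 3 values per axis -> 3 x 3 = 9 trials.
grid_axis = np.linspace(0.0, 1.0, 3)
grid_trials = np.array(list(product(grid_axis, grid_axis)))

# Random search: 9 trials, each drawing both coordinates independently.
random_trials = rng.uniform(0.0, 1.0, size=(budget, 2))

# How many different values of the important (first) axis did each strategy test?
grid_distinct = len(np.unique(grid_trials[:, 0]))
random_distinct = len(np.unique(random_trials[:, 0]))

print(f"Grid search tested {grid_distinct} distinct values of the important axis")
print(f"Random search tested {random_distinct} distinct values of the important axis")
```

Both strategies spend 9 trials, but the grid tests the important axis at only 3 positions and repeats each one 3 times. Whether that denser coverage yields a better score on a particular problem still depends on the problem and the random draw.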

Bayesian Optimization: Learn from History

Grid search ignores past results. Random search ignores past results. Bayesian optimization is smarter — it builds a model of “which hyperparameters lead to good scores” and uses that model to decide where to look next. Think of it like a gold prospector who, after finding gold in one spot, digs nearby rather than randomly across the entire mountain.
# pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Integer, Real, Categorical

# Define search space
search_space = {
    'n_estimators': Integer(50, 300),
    'max_depth': Integer(3, 20),
    'min_samples_split': Integer(2, 15),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Categorical(['sqrt', 'log2', None])
}

# Bayesian search
bayes_search = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    search_space,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

bayes_search.fit(X_train, y_train)

print(f"Best parameters: {bayes_search.best_params_}")
print(f"Best CV score: {bayes_search.best_score_:.4f}")

How Bayesian Optimization Works

  1. Try some random points
  2. Build a model of: parameter values → score
  3. Use model to find promising regions
  4. Evaluate and update model
  5. Repeat
Balances exploration (try new areas) and exploitation (focus on promising areas).
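
The loop above can be sketched in a few lines. This is a simplified illustration, not the algorithm used internally by scikit-optimize or Optuna: it assumes a Gaussian process surrogate from scikit-learn, an upper-confidence-bound rule for picking the next point, and a hypothetical one-dimensional `toy_score` function standing in for "CV score as a function of one hyperparameter".

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_score(x):
    # Hypothetical stand-in for an expensive cross-validation score.
    return -(x - 0.3) ** 2 + 0.05 * np.sin(15 * x)


rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

# 1. Try some random points
n_init, n_rounds = 4, 10
X_seen = rng.uniform(0.0, 1.0, size=(n_init, 1))
y_seen = toy_score(X_seen).ravel()
best_after_init = y_seen.max()

for _ in range(n_rounds):
    # 2. Build a model of: parameter value -> score
    surrogate = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True
    )
    surrogate.fit(X_seen, y_seen)

    # 3. Use the model to find promising regions.
    #    High mean = exploitation, high std = exploration.
    mean, std = surrogate.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std
    x_next = candidates[np.argmax(ucb)].reshape(1, 1)

    # 4. Evaluate the chosen point and add it to the history
    X_seen = np.vstack([X_seen, x_next])
    y_seen = np.append(y_seen, toy_score(x_next).ravel())
    # 5. Repeat

best_x = X_seen[np.argmax(y_seen), 0]
print(f"Best x found: {best_x:.3f}, score: {y_seen.max():.4f}")
```

The factor `2.0` in front of `std` is an arbitrary choice for this sketch. A larger value favors exploration and a smaller one favors exploitation.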

Optuna: Modern Hyperparameter Tuning

# pip install optuna
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 15),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }
    
    # Create and evaluate model
    model = RandomForestClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    
    return scores.mean()

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_value:.4f}")

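# Visualization (requires plotly). Each call returns a plotly Figure;
# call .show() to display it outside a notebook.
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()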

Practical Tips

1. Start Coarse, Then Refine

# Step 1: Coarse search
param_grid_coarse = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20, None]
}

grid_coarse = GridSearchCV(model, param_grid_coarse, cv=3)
grid_coarse.fit(X_train, y_train)
# Suppose grid_coarse.best_params_ comes back as n_estimators=100, max_depth=10

# Step 2: Fine search around best
param_grid_fine = {
    'n_estimators': [80, 100, 120],
    'max_depth': [8, 10, 12]
}

grid_fine = GridSearchCV(model, param_grid_fine, cv=5)
grid_fine.fit(X_train, y_train)

2. Prioritize the Most Impactful Hyperparameters

Not all hyperparameters are created equal. Tune the ones that move the needle most, and leave the rest at sensible defaults.
from sklearn.ensemble import GradientBoostingClassifier

# For Gradient Boosting, these three interact heavily and matter most:
# - learning_rate: controls step size (low = more trees needed but better generalization)
# - n_estimators: number of boosting rounds (tied to learning_rate)
# - max_depth: complexity of each tree (usually 3-7 for boosting)
# Rule of thumb: lower learning_rate + more n_estimators = better results, more compute.
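# Named gb_param_grid so it does not overwrite the Random Forest param_grid used elsewhere.
gb_param_grid = {
    'n_estimators': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7]
}

gb_search = GridSearchCV(GradientBoostingClassifier(random_state=42), gb_param_grid, cv=5, n_jobs=-1)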

3. Different Metrics for Different Problems

from sklearn.model_selection import GridSearchCV

# Classification
scoring_classification = ['accuracy', 'f1', 'roc_auc', 'precision', 'recall']

# Use refit to choose final model
grid = GridSearchCV(
    model, 
    param_grid,
    cv=5,
    scoring=scoring_classification,
    refit='f1'  # Final model optimizes for F1
)

4. Nested Cross-Validation

Nested cross-validation gives a less biased evaluation of the tuning process. This is subtle but important: regular cross-validation with hyperparameter tuning gives you an optimistically biased estimate of performance, because you picked the best hyperparameters on the same folds you’re reporting results for. Nested CV fixes this by using separate inner folds for tuning and outer folds for evaluation.
from sklearn.model_selection import cross_val_score, GridSearchCV

# Inner loop: tune hyperparameters (picks the best settings)
inner_cv = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5
)

# Outer loop: evaluate the ENTIRE tuning process on held-out data
# This gives you an honest estimate of "if I gave this tuning pipeline
# new data, how well would it perform?"
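# Pass the full dataset (cancer was loaded in the grid search example);
# the outer loop creates its own train/test splits.
outer_scores = cross_val_score(inner_cv, cancer.data, cancer.target, cv=5)
print(f"Nested CV Score: {outer_scores.mean():.4f} (+/- {outer_scores.std():.4f})")
# This score is often somewhat lower than the non-nested best_score_ --
# the difference is the "optimism" that regular CV introduces.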
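
To see the two estimates side by side, here is a small sketch. It assumes a decision tree and a deliberately tiny grid so it runs quickly; the names `small_grid`, `inner_folds`, and `outer_folds` are illustrative. The size and even the sign of the gap vary with the dataset and the random splits.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_breast_cancer(return_X_y=True)
small_grid = {'max_depth': [2, 4, 6, None], 'min_samples_leaf': [1, 5]}

inner_folds = KFold(n_splits=3, shuffle=True, random_state=0)
outer_folds = KFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(random_state=0), small_grid, cv=inner_folds)

# Non-nested: score of the winning configuration on the same folds that selected it.
search.fit(X_all, y_all)
non_nested_score = search.best_score_

# Nested: the whole search is re-run inside each outer training split,
# and the refit winner is scored on the matching outer test split.
nested_scores = cross_val_score(search, X_all, y_all, cv=outer_folds)

print(f"Non-nested CV score: {non_nested_score:.4f}")
print(f"Nested CV score:     {nested_scores.mean():.4f} (+/- {nested_scores.std():.4f})")
```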

Common Hyperparameters by Model

Common ML mistake: tuning before feature engineering. As a rough rule of thumb, hyperparameter tuning yields a small improvement (on the order of 1-3%), while good feature engineering can yield considerably more (5-20%); the exact numbers depend heavily on the dataset. Get your features right first, then tune. A perfectly tuned model on bad features will usually lose to a default model on great features.

Random Forest

{
    'n_estimators': [100, 200, 500],     # Diminishing returns past ~200 trees
    'max_depth': [5, 10, 15, None],      # Main overfitting control
    'min_samples_split': [2, 5, 10],     # Secondary regularization
    'max_features': ['sqrt', 'log2']     # Controls tree diversity
}
# Tuning priority: max_depth > n_estimators > min_samples_split > max_features

Gradient Boosting / XGBoost

{
    'n_estimators': [100, 200, 500],     # More trees + lower LR = better (but slower)
    'learning_rate': [0.01, 0.1, 0.3],  # MOST IMPORTANT -- interacts with n_estimators
    'max_depth': [3, 5, 7],             # Keep shallow (3-7) -- boosting adds complexity via more rounds
    'subsample': [0.8, 1.0],            # Row sampling -- adds regularization
    'colsample_bytree': [0.8, 1.0]     # XGBoost only -- sklearn's GradientBoosting uses max_features instead
}
# Key insight: learning_rate and n_estimators are coupled.
# Lower LR needs more estimators. Start with LR=0.1, n=100.
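
The coupling between the two can be inspected without refitting for every tree count. This is a sketch assuming scikit-learn's `GradientBoostingClassifier` and a single hold-out split; the names `X_fit`, `X_val`, and `best_rounds` are illustrative. `staged_predict` yields predictions after each boosting round, so one fit per learning rate is enough.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_gb, y_gb = load_breast_cancer(return_X_y=True)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_gb, y_gb, test_size=0.25, random_state=0
)

best_rounds = {}
for lr in (0.01, 0.1):
    gb = GradientBoostingClassifier(
        n_estimators=300, learning_rate=lr, max_depth=3, random_state=0
    )
    gb.fit(X_fit, y_fit)

    # Validation accuracy after 1, 2, ..., 300 boosting rounds.
    val_acc = [np.mean(pred == y_val) for pred in gb.staged_predict(X_val)]
    best_rounds[lr] = (int(np.argmax(val_acc)) + 1, float(max(val_acc)))
    print(f"learning_rate={lr}: best accuracy {best_rounds[lr][1]:.4f} "
          f"at {best_rounds[lr][0]} rounds")
```

If the lower learning rate peaks at a noticeably later round, that is the coupling in action. A single split is noisy, so treat the exact round numbers as indicative only.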

SVM

{
    'C': [0.1, 1, 10, 100],             # Regularization strength (inverse)
    'gamma': ['scale', 'auto', 0.1, 1], # RBF kernel width -- most common overfitting cause
    'kernel': ['rbf', 'poly']            # RBF is the default starting point
}
# WARNING: SVC training time grows at least quadratically with the number
# of samples, and tuning repeats that cost for every candidate and fold.
# For >50K samples, consider LinearSVC with just C, or switch to tree-based models.

Neural Networks

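# Parameter names as used by scikit-learn's MLPClassifier / MLPRegressor
{
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],  # Units per hidden layer
    'alpha': [0.0001, 0.001, 0.01],                              # L2 regularization strength
    'learning_rate_init': [0.001, 0.01]                          # Initial step size (sgd/adam solvers)
}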

🚀 Mini Projects

Project 1: Search Strategy Comparison

Compare different hyperparameter search strategies on the same problem.

Project 2: Learning Curve Analyzer

Use learning curves to determine if more data or different hyperparameters would help.
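
As a possible starting point, the following sketch uses scikit-learn's `learning_curve` on the breast cancer dataset. The estimator settings and the names `X_lc` and `y_lc` are illustrative choices, not part of the project specification.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X_lc, y_lc = load_breast_cancer(return_X_y=True)

# Train on 20%, 40%, ..., 100% of each training fold and score on the held-out fold.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_lc, y_lc,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    scoring='accuracy',
    shuffle=True,
    random_state=0,
)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_train={n:4d}  train={tr:.3f}  validation={va:.3f}")
```

A validation score that is still rising at the largest size suggests more data may help. A large, persistent gap between training and validation scores points toward stronger regularization, which is a hyperparameter question.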

Project 3: Custom Hyperparameter Optimizer

Build a simple Bayesian-style optimizer from scratch.

Project 4: Auto-ML Mini Framework

Create an automated model tuning system that handles multiple models.

Key Takeaways

Grid Search

Exhaustive but slow. Good for small spaces.

Random Search

Often better than grid. Use for larger spaces.

Bayesian Optimization

Smart search. Best for expensive evaluations.

Nested CV

Unbiased estimate of tuning performance.

What’s Next?

You’ve learned individual algorithms. Now let’s see how to tackle real-world ML projects end-to-end!

Continue to Module 10: End-to-End ML Project

Apply everything in a complete machine learning project