Hyperparameter Tuning

Grid Search vs Random Search

Parameters vs Hyperparameters

Parameters: Learned from data during training
  • Weights in linear regression
  • Split points in decision trees
Hyperparameters: Set before training
  • Learning rate
  • Number of trees in Random Forest
  • Maximum depth of trees
You choose hyperparameters. The model learns parameters.
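To make the distinction concrete, here is a minimal sketch (logistic regression on the breast cancer dataset is used purely for illustration): the constructor arguments are hyperparameters you choose, while coef_ and intercept_ are parameters the model learns during fit().

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by you, before training
model = LogisticRegression(C=1.0, max_iter=5000)

# Parameters: learned by the model, during training
model.fit(X, y)
print(model.coef_.shape)   # learned weights, one per feature
print(model.intercept_)    # learned bias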

The Tuning Problem

A Random Forest has many hyperparameters:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # How many trees?
    max_depth=10,           # How deep?
    min_samples_split=2,    # Min samples to split?
    min_samples_leaf=1,     # Min samples in leaf?
    max_features='sqrt',    # Features per split?
    bootstrap=True,         # Sample with replacement?
    random_state=42
)
How do you find the best combination?

Grid Search: Try Everything

Define a grid of values and try every combination:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10]
}

# Total combinations: 3 × 4 × 3 = 36 (× 5 CV folds = 180 model fits)

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,               # 5-fold cross-validation
    scoring='accuracy', # Metric to optimize
    n_jobs=-1,          # Use all CPU cores
    verbose=1           # Show progress
)

grid_search.fit(X_train, y_train)

# Results
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
print(f"Test score: {grid_search.score(X_test, y_test):.4f}")

Visualizing Grid Search Results

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Get results as DataFrame
results = pd.DataFrame(grid_search.cv_results_)
results = results[['param_n_estimators', 'param_max_depth', 'mean_test_score', 'std_test_score']].copy()
# Show max_depth=None as the string 'None' so pivot_table keeps those rows
results['param_max_depth'] = results['param_max_depth'].astype(str)
print(results.sort_values('mean_test_score', ascending=False).head(10))

# Heatmap for 2 parameters
pivot = results.pivot_table(
    values='mean_test_score',
    index='param_max_depth',
    columns='param_n_estimators'
)

plt.figure(figsize=(10, 6))
plt.imshow(pivot, cmap='viridis', aspect='auto')
plt.colorbar(label='Mean CV Score')
plt.xlabel('n_estimators')
plt.ylabel('max_depth')
plt.xticks(range(len(pivot.columns)), pivot.columns)
plt.yticks(range(len(pivot.index)), pivot.index)
plt.title('Grid Search Results')

# Annotate
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        plt.text(j, i, f'{pivot.iloc[i, j]:.3f}', ha='center', va='center', color='white')

plt.tight_layout()
plt.show()

Random Search: Smart Sampling

Grid search has a problem: exponential explosion.
  • 5 hyperparameters
  • 5 values each
  • 5^5 = 3,125 combinations (with 5-fold CV, that's 15,625 model fits!)
Random Search samples randomly from parameter distributions:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define parameter distributions (randint(low, high) draws integers in [low, high))
param_distributions = {
    'n_estimators': randint(50, 300),           # Integers 50-299
    'max_depth': randint(3, 20),                # Integers 3-19
    'min_samples_split': randint(2, 15),        # Integers 2-14
    'min_samples_leaf': randint(1, 10),         # Integers 1-9
    'max_features': ['sqrt', 'log2', None]      # Categorical
}

# Random search
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,          # Try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

random_search.fit(X_train, y_train)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
Research shows that random search often finds hyperparameters as good as grid search's with far fewer evaluations, especially when only a few hyperparameters strongly influence performance (Bergstra & Bengio, 2012).

Bayesian Optimization: Learn from History

Instead of random sampling, use past results to guide the search:
# pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Integer, Categorical

# Define search space
search_space = {
    'n_estimators': Integer(50, 300),
    'max_depth': Integer(3, 20),
    'min_samples_split': Integer(2, 15),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Categorical(['sqrt', 'log2', None])
}

# Bayesian search
bayes_search = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    search_space,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

bayes_search.fit(X_train, y_train)

print(f"Best parameters: {bayes_search.best_params_}")
print(f"Best CV score: {bayes_search.best_score_:.4f}")

How Bayesian Optimization Works

  1. Try some random points
  2. Build a model of: parameter values → score
  3. Use model to find promising regions
  4. Evaluate and update model
  5. Repeat
This balances exploration (trying new areas) against exploitation (focusing on promising areas). The toy sketch below walks through one version of the loop.
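To make the loop concrete, here is a toy sketch, not scikit-optimize's actual implementation: it tunes a single hyperparameter (max_depth) against a made-up objective, using a Gaussian-process surrogate and a simple upper-confidence-bound rule to pick the next point.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical objective: pretend the CV score peaks around max_depth = 8
def objective(depth):
    return -(depth - 8) ** 2 / 50 + np.random.normal(0, 0.01)

rng = np.random.default_rng(42)
candidates = np.arange(2, 21).reshape(-1, 1)

# 1. Try some random points
tried = list(rng.choice(np.arange(2, 21), size=3, replace=False))
scores = [objective(d) for d in tried]

for _ in range(10):
    # 2. Build a surrogate model of: parameter value -> score
    surrogate = GaussianProcessRegressor(alpha=1e-3)   # alpha tolerates noisy scores
    surrogate.fit(np.array(tried).reshape(-1, 1), scores)

    # 3. Use the surrogate to find a promising region
    #    (upper confidence bound = predicted mean + exploration bonus)
    mean, std = surrogate.predict(candidates, return_std=True)
    next_depth = int(candidates[np.argmax(mean + std)][0])

    # 4. Evaluate the real objective and record the result
    tried.append(next_depth)
    scores.append(objective(next_depth))
    # 5. Repeat

print(f"Best max_depth found: {tried[int(np.argmax(scores))]}")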

Optuna: Modern Hyperparameter Tuning

# pip install optuna
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 15),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }
    
    # Create and evaluate model
    model = RandomForestClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    
    return scores.mean()

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_value:.4f}")

# Visualization (these return plotly figures; call .show() outside a notebook)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()

Practical Tips

1. Start Coarse, Then Refine

# Step 1: Coarse search
param_grid_coarse = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20, None]
}

grid_coarse = GridSearchCV(model, param_grid_coarse, cv=3)
grid_coarse.fit(X_train, y_train)
# Best: n_estimators=100, max_depth=10

# Step 2: Fine search around best
param_grid_fine = {
    'n_estimators': [80, 100, 120],
    'max_depth': [8, 10, 12]
}

grid_fine = GridSearchCV(model, param_grid_fine, cv=5)
grid_fine.fit(X_train, y_train)

2. Use Early Stopping for Speed

from sklearn.ensemble import GradientBoostingClassifier

# Early stopping halts training once the validation score stops improving,
# so n_estimators can be set high and dropped from the search entirely
model = GradientBoostingClassifier(
    n_estimators=500,
    n_iter_no_change=10,       # Stop after 10 rounds without improvement
    validation_fraction=0.1,   # Hold out 10% of training data for validation
    random_state=42
)

# Then tune only what matters most
param_grid = {
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7]
}

3. Different Metrics for Different Problems

from sklearn.model_selection import GridSearchCV

# Classification
scoring_classification = ['accuracy', 'f1', 'roc_auc', 'precision', 'recall']

# Use refit to choose final model
grid = GridSearchCV(
    model, 
    param_grid,
    cv=5,
    scoring=scoring_classification,
    refit='f1'  # Final model optimizes for F1
)

4. Nested Cross-Validation

For unbiased evaluation of the tuning process:
from sklearn.model_selection import cross_val_score, GridSearchCV

# Use the full dataset (loaded earlier as `cancer`)
X, y = cancer.data, cancer.target

# Inner loop: tune hyperparameters (define a grid suited to Random Forest)
param_grid = {'n_estimators': [100, 200], 'max_depth': [5, 10, None]}
inner_cv = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5
)

# Outer loop: evaluate the whole tuning process
outer_scores = cross_val_score(inner_cv, X, y, cv=5)
print(f"Nested CV Score: {outer_scores.mean():.4f} (+/- {outer_scores.std():.4f})")

Common Hyperparameters by Model

Random Forest

{
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'max_features': ['sqrt', 'log2']
}

Gradient Boosting / XGBoost

{
    'n_estimators': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]  # XGBoost
}

SVM

{
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.1, 1],
    'kernel': ['rbf', 'poly']
}

Neural Networks

{
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate_init': [0.001, 0.01]
}

🚀 Mini Projects

Project 1: Search Strategy Comparison

Compare different hyperparameter search strategies on the same problem.
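One possible starting point, assuming the grid_search, random_search, and bayes_search objects are configured as in the sections above:

import time

searchers = {'Grid': grid_search, 'Random': random_search, 'Bayesian': bayes_search}

for name, search in searchers.items():
    start = time.time()
    search.fit(X_train, y_train)
    elapsed = time.time() - start
    print(f"{name:8s} best CV: {search.best_score_:.4f}  "
          f"test: {search.score(X_test, y_test):.4f}  time: {elapsed:.1f}s")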

Project 2: Learning Curve Analyzer

Use learning curves to determine if more data or different hyperparameters would help.
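A possible starting point is scikit-learn's learning_curve (the model and training sizes below are placeholders):

from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import numpy as np

# Compare training vs cross-validation accuracy as the training set grows
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(max_depth=5, random_state=42),
    X_train, y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)

plt.plot(sizes, train_scores.mean(axis=1), 'o-', label='Training score')
plt.plot(sizes, val_scores.mean(axis=1), 'o-', label='CV score')
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Large gap between the curves -> overfitting: constrain the model or get more data
# Both curves low and close together -> underfitting: increase model capacity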

Project 3: Custom Hyperparameter Optimizer

Build a simple Bayesian-style optimizer from scratch.
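The toy Gaussian-process loop sketched under "How Bayesian Optimization Works" above is one possible starting point.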

Project 4: Auto-ML Mini Framework

Create an automated model tuning system that handles multiple models.
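A minimal skeleton to build on (the candidate models and grids below are placeholders, not a prescribed design):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Tune several model families and keep the best one
candidates = {
    'random_forest': (RandomForestClassifier(random_state=42),
                      {'n_estimators': [100, 200], 'max_depth': [5, 10, None]}),
    'svm': (SVC(), {'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']}),
    'logreg': (LogisticRegression(max_iter=5000), {'C': [0.1, 1, 10]}),
}

best_name, best_search = None, None
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

print(f"Best model: {best_name} (CV score {best_search.best_score_:.4f})")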

Key Takeaways

Grid Search

Exhaustive but slow. Good for small spaces.

Random Search

Often better than grid. Use for larger spaces.

Bayesian Optimization

Smart search. Best for expensive evaluations.

Nested CV

Unbiased estimate of tuning performance.

What’s Next?

You’ve learned individual algorithms. Now let’s see how to tackle real-world ML projects end-to-end!

Continue to Module 10: End-to-End ML Project

Apply everything in a complete machine learning project