Hyperparameter Tuning
Parameters vs Hyperparameters
Parameters: learned from the data during training
Weights in linear regression
Split points in decision trees
Hyperparameters: set before training
Learning rate
Number of trees in Random Forest
Maximum depth of trees
You choose hyperparameters. The model learns parameters.
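To make the distinction concrete, here is a minimal sketch using scikit-learn's LogisticRegression (the estimator and the C value are arbitrary choices for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter: chosen by you, before training
model = LogisticRegression(C=1.0, max_iter=5000)

# Parameters: learned from the data during training
model.fit(X, y)
print(model.coef_)        # learned weights
print(model.intercept_)   # learned intercept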
The Tuning Problem
A Random Forest has many hyperparameters:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # How many trees?
    max_depth=10,           # How deep?
    min_samples_split=2,    # Min samples to split?
    min_samples_leaf=1,     # Min samples in leaf?
    max_features='sqrt',    # Features per split?
    bootstrap=True,         # Sample with replacement?
    random_state=42
)
How do you find the best combination?
Grid Search: Try Everything
Define a grid of values and try every combination:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# Load data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10]
}
# Total combinations: 3 × 4 × 3 = 36

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring='accuracy',   # Metric to optimize
    n_jobs=-1,            # Use all CPU cores
    verbose=1             # Show progress
)
grid_search.fit(X_train, y_train)

# Results
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
print(f"Test score: {grid_search.score(X_test, y_test):.4f}")
Visualizing Grid Search Results
import pandas as pd
import matplotlib.pyplot as plt

# Get results as a DataFrame
results = pd.DataFrame(grid_search.cv_results_)
results = results[['param_n_estimators', 'param_max_depth', 'mean_test_score', 'std_test_score']]
print(results.sort_values('mean_test_score', ascending=False).head(10))
# Heatmap for 2 parameters
# Cast max_depth to string so the max_depth=None rows aren't silently dropped by pivot_table
depth_labels = results['param_max_depth'].astype(str)
pivot = results.pivot_table(
    values='mean_test_score',
    index=depth_labels,
    columns='param_n_estimators'
)

plt.figure(figsize=(10, 6))
plt.imshow(pivot, cmap='viridis', aspect='auto')
plt.colorbar(label='Mean CV Score')
plt.xlabel('n_estimators')
plt.ylabel('max_depth')
plt.xticks(range(len(pivot.columns)), pivot.columns)
plt.yticks(range(len(pivot.index)), pivot.index)
plt.title('Grid Search Results')

# Annotate each cell with its score
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        plt.text(j, i, f'{pivot.iloc[i, j]:.3f}', ha='center', va='center', color='white')

plt.tight_layout()
plt.show()
Random Search: Smart Sampling
Grid search has a problem: exponential explosion. With 5 hyperparameters and 5 values each, that is 5^5 = 3,125 combinations!
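Before committing to a grid, it helps to count how many fits it implies; here is a quick check using sklearn's ParameterGrid (the five-parameter grid below is invented purely to show the blow-up):

from sklearn.model_selection import ParameterGrid

big_grid = {f'param_{i}': [1, 2, 3, 4, 5] for i in range(5)}   # 5 hyperparameters, 5 values each
print(len(ParameterGrid(big_grid)))   # 3125 combinations, each multiplied by the number of CV folds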
Random Search samples randomly from parameter distributions:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define parameter distributions
param_distributions = {
    'n_estimators': randint(50, 300),        # Integers from 50 to 299
    'max_depth': randint(3, 20),             # Integers from 3 to 19
    'min_samples_split': randint(2, 15),     # Integers from 2 to 14
    'min_samples_leaf': randint(1, 10),      # Integers from 1 to 9
    'max_features': ['sqrt', 'log2', None]   # Categorical
}

# Random search
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,            # Try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)
random_search.fit(X_train, y_train)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
Research shows: random search often finds good hyperparameters faster than grid search, especially when some hyperparameters matter more than others (Bergstra & Bengio, 2012).
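The intuition: with a budget of nine evaluations over two hyperparameters, a 3 × 3 grid tests only three distinct values of the hyperparameter that really matters, while nine random draws test nine. A toy illustration of this coverage argument using sklearn's ParameterGrid and ParameterSampler (the parameter names and ranges are made up for the example):

from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

grid = {'learning_rate': [0.01, 0.1, 1.0], 'max_depth': [3, 5, 7]}       # 3 × 3 = 9 fits
dist = {'learning_rate': loguniform(1e-3, 1), 'max_depth': [3, 5, 7]}    # continuous range

grid_points = list(ParameterGrid(grid))
random_points = list(ParameterSampler(dist, n_iter=9, random_state=42))

# Distinct values of the "important" hyperparameter covered by each strategy
print(len({p['learning_rate'] for p in grid_points}))     # 3
print(len({p['learning_rate'] for p in random_points}))   # 9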
Bayesian Optimization: Learn from History
Instead of random sampling, use past results to guide the search:
# pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Integer, Categorical

# Define search space
search_space = {
    'n_estimators': Integer(50, 300),
    'max_depth': Integer(3, 20),
    'min_samples_split': Integer(2, 15),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Categorical(['sqrt', 'log2', None])
}

# Bayesian search
bayes_search = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    search_space,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)
bayes_search.fit(X_train, y_train)

print(f"Best parameters: {bayes_search.best_params_}")
print(f"Best CV score: {bayes_search.best_score_:.4f}")
How Bayesian Optimization Works
1. Try a few random points.
2. Fit a surrogate model mapping parameter values → score.
3. Use the surrogate to find promising regions.
4. Evaluate the most promising point and update the surrogate.
5. Repeat.
This balances exploration (trying new areas) against exploitation (focusing on promising areas); a sketch of the loop follows.
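Here is a minimal sketch of that loop, tuning a single hyperparameter (max_depth) with a Gaussian process surrogate and an upper-confidence-bound acquisition. The candidate grid, the acquisition rule, and the budget are illustrative choices, not the exact algorithm any particular library implements:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidates = np.arange(2, 21).reshape(-1, 1)   # possible max_depth values

def cv_score(depth):
    model = RandomForestClassifier(max_depth=int(depth), n_estimators=50, random_state=42)
    return cross_val_score(model, X, y, cv=3).mean()

# 1. Start with a few random evaluations
rng = np.random.default_rng(0)
tried = [int(d) for d in rng.choice(candidates.ravel(), size=3, replace=False)]
scores = [cv_score(d) for d in tried]

for _ in range(5):
    # 2. Fit a surrogate model: max_depth -> CV score
    gp = GaussianProcessRegressor().fit(np.array(tried).reshape(-1, 1), scores)
    # 3. Score every candidate with an upper-confidence-bound acquisition
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 1.0 * std                               # trade off exploitation vs exploration
    ucb[np.isin(candidates.ravel(), tried)] = -np.inf    # don't re-evaluate known points
    # 4. Evaluate the most promising candidate and update the history
    next_depth = int(candidates.ravel()[np.argmax(ucb)])
    tried.append(next_depth)
    scores.append(cv_score(next_depth))

best = int(np.argmax(scores))
print(f"Best max_depth: {tried[best]} (CV accuracy {scores[best]:.4f})")

In practice, libraries such as scikit-optimize and Optuna use more refined surrogates and acquisition functions (for example expected improvement or TPE), but the loop has this same shape.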
Optuna: Modern Hyperparameter Tuning
# pip install optuna
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 15),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }
    # Create and evaluate model
    model = RandomForestClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    return scores.mean()

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_value:.4f}")

# Visualization (these return Plotly figures; call .show() to display them)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
Practical Tips
1. Start Coarse, Then Refine
# Step 1: Coarse search over a wide range
model = RandomForestClassifier(random_state=42)
param_grid_coarse = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20, None]
}
grid_coarse = GridSearchCV(model, param_grid_coarse, cv=3)
grid_coarse.fit(X_train, y_train)
# Suppose the best found is n_estimators=100, max_depth=10

# Step 2: Fine search around the best values
param_grid_fine = {
    'n_estimators': [80, 100, 120],
    'max_depth': [8, 10, 12]
}
grid_fine = GridSearchCV(model, param_grid_fine, cv=5)
grid_fine.fit(X_train, y_train)
2. Use Early Stopping for Speed
from sklearn.ensemble import GradientBoostingClassifier
# Cap n_estimators high and stop early once the validation score stops improving
model = GradientBoostingClassifier(
    n_estimators=500,          # Upper bound; fewer trees are used if training stops early
    validation_fraction=0.1,   # Hold-out fraction used to monitor the score
    n_iter_no_change=10,       # Stop after 10 rounds without improvement
    random_state=42
)
# Then tune only the hyperparameters that matter most
param_grid = {
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7]
}
3. Different Metrics for Different Problems
from sklearn.model_selection import GridSearchCV

# Classification metrics to track
scoring_classification = ['accuracy', 'f1', 'roc_auc', 'precision', 'recall']

# With multiple metrics, use refit to choose the final model
grid = GridSearchCV(
    model,
    param_grid,
    cv=5,
    scoring=scoring_classification,
    refit='f1'   # Final model optimizes for F1
)
4. Nested Cross-Validation
For unbiased evaluation of the tuning process:
from sklearn.model_selection import cross_val_score, GridSearchCV

# Use the full dataset; nested CV handles all the splitting
X, y = cancer.data, cancer.target
param_grid = {'n_estimators': [100, 200], 'max_depth': [5, 10, None]}   # small example grid for the inner search

# Inner loop: tune hyperparameters
inner_cv = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5
)

# Outer loop: evaluate the whole tuning process
outer_scores = cross_val_score(inner_cv, X, y, cv=5)
print(f"Nested CV score: {outer_scores.mean():.4f} (+/- {outer_scores.std():.4f})")
Common Hyperparameters by Model
Random Forest
{
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'max_features': ['sqrt', 'log2']
}
Gradient Boosting / XGBoost
{
    'n_estimators': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]   # XGBoost only
}
SVM
{
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.1, 1],
    'kernel': ['rbf', 'poly']
}
Neural Networks
{
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate_init': [0.001, 0.01]
}
🚀 Mini Projects
Project 1: Search Strategy Comparison (compare Grid, Random, and Bayesian search)
Project 2: Learning Curve Analyzer (diagnose underfitting vs. overfitting with tuning)
Project 3: Custom Hyperparameter Optimizer (build your own optimization algorithm)
Project 4: Auto-ML Mini Framework (create an automated model tuning system)
Project 1: Search Strategy Comparison
Compare different hyperparameter search strategies on the same problem.
Project 2: Learning Curve Analyzer
Use learning curves to determine if more data or different hyperparameters would help.
Project 3: Custom Hyperparameter Optimizer
Build a simple Bayesian-style optimizer from scratch.
Project 4: Auto-ML Mini Framework
Create an automated model tuning system that handles multiple models.
Key Takeaways
Grid Search: exhaustive but slow. Good for small search spaces.
Random Search: often better than grid search. Use it for larger spaces.
Bayesian Optimization: smart, history-guided search. Best when each evaluation is expensive.
Nested CV: gives an unbiased estimate of the tuning process's performance.
What’s Next?
You’ve learned individual algorithms. Now let’s see how to tackle real-world ML projects end-to-end!
Continue to Module 10: End-to-End ML Project, where you'll apply everything in a complete machine learning project.