# Hyperparameter Tuning

> Find the best settings for your model automatically

<Frame>
  <img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/ml-mastery/hyperparameter-tuning-concept.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=b33fce0960b1efcc1b41dfe84429693e" alt="Grid Search vs Random Search" width="1080" height="1080" data-path="images/courses/ml-mastery/hyperparameter-tuning-concept.svg" />
</Frame>

## Parameters vs Hyperparameters

**Parameters**: Learned from data during training

* Weights in linear regression
* Split points in decision trees

**Hyperparameters**: Set before training

* Learning rate
* Number of trees in Random Forest
* Maximum depth of trees

**You choose hyperparameters. The model learns parameters.**

Think of it like baking a cake. The *parameters* are the exact mixing ratios the recipe produces (flour, sugar, eggs). The *hyperparameters* are the oven temperature and baking time -- you set them before baking starts, and they control how the recipe turns out. You can't learn the oven temperature from the batter; you have to experiment.
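To make the distinction concrete, here is a minimal sketch in plain Python. The toy data and the names `learning_rate`, `n_steps`, and `w` are illustrative assumptions, not part of any library. The learning rate and step count are fixed before training starts; the weight `w` is what training produces.

```python
# Toy data generated by y = 2 * x, so the ideal learned weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hyperparameters: chosen by you before training starts
learning_rate = 0.01
n_steps = 500

# Parameter: learned from the data by gradient descent on mean squared error
w = 0.0
for _ in range(n_steps):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(f"learned weight w = {w:.4f}")
```

With `learning_rate = 0.2` the same loop diverges instead of converging, which is why this setting has to be found by experiment rather than learned from the data.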

<Frame>
  <img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/ml-mastery/hyperparameter-tuning-real-world.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=5ebd71da5b3966b03918b922dce459f7" alt="YouTube Recommendation Tuning" width="1080" height="1080" data-path="images/courses/ml-mastery/hyperparameter-tuning-real-world.svg" />
</Frame>

***

## The Tuning Problem

A Random Forest has many hyperparameters:

```python theme={null}
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # How many trees?
    max_depth=10,           # How deep?
    min_samples_split=2,    # Min samples to split?
    min_samples_leaf=1,     # Min samples in leaf?
    max_features='sqrt',    # Features per split?
    bootstrap=True,         # Sample with replacement?
    random_state=42
)
```

How do you find the best combination?

***

## Grid Search: Try Everything

Grid search is the brute-force approach: define a grid of values and try every single combination. It's like testing every possible combination of oven temperature and baking time -- thorough but expensive. For 3 hyperparameters with 4 values each, that's 4 x 4 x 4 = 64 combinations, each requiring a full cross-validation run.

```python theme={null}
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Define parameter grid -- start coarse, refine later.
# Common mistake: putting too many values in the grid.
# 3 values per parameter is usually enough for a first pass.
param_grid = {
    'n_estimators': [50, 100, 200],       # More trees = better but slower
    'max_depth': [5, 10, 15, None],       # None = unlimited (risk of overfitting)
    'min_samples_split': [2, 5, 10]       # Higher = more regularization
}

# Total combinations: 3 x 4 x 3 = 36, each evaluated with 5-fold CV
# That's 36 x 5 = 180 model fits!

# Grid search
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,               # 5-fold cross-validation for each combination
    scoring='accuracy', # Metric to optimize -- use 'f1' for imbalanced data
    n_jobs=-1,          # Use all CPU cores -- essential for grid search speed
    verbose=1           # Show progress (set to 2 for more detail)
)

grid_search.fit(X_train, y_train)

# Results
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
print(f"Test score: {grid_search.score(X_test, y_test):.4f}")
```
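The fit count quoted in the comments above can be checked without training anything. This is a small sketch using only the standard library; it copies the grid from the example and enumerates the combinations the same way an exhaustive search would.

```python
from itertools import product
from math import prod

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
}
cv_folds = 5

# Every combination an exhaustive search will try, as a list of dicts
combinations = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# The count is simply the product of the list lengths
n_combinations = prod(len(values) for values in param_grid.values())

# One fit per fold per combination (the final refit on all training data adds one more)
n_fits = n_combinations * cv_folds
print(f"{n_combinations} combinations x {cv_folds} folds = {n_fits} fits")
```

Adding a fourth hyperparameter with three values would triple both numbers, so it is worth running this arithmetic before launching a search.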

### Visualizing Grid Search Results

```python theme={null}
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

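# Get results as DataFrame
results = pd.DataFrame(grid_search.cv_results_)
results = results[['param_n_estimators', 'param_max_depth', 'mean_test_score', 'std_test_score']].copy()
print(results.sort_values('mean_test_score', ascending=False).head(10))

# max_depth=None counts as a missing value and pivot_table would silently
# drop that row, so convert the depths to string labels first
results['param_max_depth'] = results['param_max_depth'].map(
    lambda d: 'None' if pd.isna(d) else str(int(d))
)

# Heatmap for 2 parameters (scores are averaged over min_samples_split)
pivot = results.pivot_table(
    values='mean_test_score',
    index='param_max_depth',
    columns='param_n_estimators'
)
pivot = pivot.reindex(['5', '10', '15', 'None'])  # Numeric order instead of alphabetical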

plt.figure(figsize=(10, 6))
plt.imshow(pivot, cmap='viridis', aspect='auto')
plt.colorbar(label='Mean CV Score')
plt.xlabel('n_estimators')
plt.ylabel('max_depth')
plt.xticks(range(len(pivot.columns)), pivot.columns)
plt.yticks(range(len(pivot.index)), pivot.index)
plt.title('Grid Search Results')

# Annotate
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        plt.text(j, i, f'{pivot.iloc[i, j]:.3f}', ha='center', va='center', color='white')

plt.tight_layout()
plt.show()
```

***

## Random Search: Smart Sampling

Grid search has a problem: **exponential explosion**.

* 5 hyperparameters
* 5 values each
* 5^5 = 3,125 combinations!

Here's the key insight from Bergstra and Bengio's 2012 paper: for most problems, only a few hyperparameters actually matter, and which ones matter varies from dataset to dataset. Grid search wastes most of its budget re-testing the same handful of values of the important hyperparameters while it varies the ones that don't matter. Random search, by contrast, draws a fresh value of every hyperparameter on each trial, so it samples each important dimension more thoroughly. Think of it like searching for a lost key in a field: grid search mows the lawn in neat rows, while random search drops random probes -- if the key is somewhere along a specific line, random search is more likely to hit that line.

**Random Search** samples randomly from parameter distributions:

```python theme={null}
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define parameter distributions
# Note: randint(low, high) samples integers from low up to high - 1
param_distributions = {
    'n_estimators': randint(50, 300),           # Integer from 50 to 299
    'max_depth': randint(3, 20),                # Integer from 3 to 19
    'min_samples_split': randint(2, 15),        # Integer from 2 to 14
    'min_samples_leaf': randint(1, 10),         # Integer from 1 to 9
    'max_features': ['sqrt', 'log2', None]      # Categorical
}

# Random search
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,          # Try 50 random combinations
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

random_search.fit(X_train, y_train)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV score: {random_search.best_score_:.4f}")
```

<Note>
  **Research shows** (Bergstra & Bengio, 2012): Random search often finds good hyperparameters faster than grid search. The intuition is simple -- if only 2 of your 5 hyperparameters actually matter (which is common), grid search wastes most of its budget exploring irrelevant dimensions. Random search spreads trials across all dimensions, so you explore more unique values of the important parameters with the same compute budget.

  **Practical rule**: Use grid search when you have 2-3 hyperparameters with known good ranges. Use random search when you have 4+ hyperparameters or wide, uncertain ranges.
</Note>
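The intuition in the note can be checked with a toy simulation. This is a sketch under stated assumptions: `score` is a hypothetical objective in which only `x` matters, and both strategies get the same budget of 25 trials on the unit square.

```python
import random

def score(x, y):
    # Hypothetical objective: only x matters, y is irrelevant
    return -(x - 0.37) ** 2

budget = 25

# Grid search: 5 values per axis -> 25 trials, but only 5 distinct x values
axis = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0
grid_trials = [(x, y) for x in axis for y in axis]

# Random search: 25 trials, each with a fresh x value
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

for name, trials in [("grid", grid_trials), ("random", random_trials)]:
    distinct_x = len({x for x, _ in trials})
    best_x, _ = max(trials, key=lambda t: score(*t))
    print(f"{name:>6}: {len(trials)} trials, {distinct_x} distinct x values, "
          f"best x = {best_x:.3f}")
```

Both strategies spend 25 evaluations, but the grid only ever tests five values of the hyperparameter that matters, while random search tests a different one on every trial.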

***

## Bayesian Optimization: Learn from History

Grid search ignores past results. Random search ignores past results. Bayesian optimization is smarter -- it builds a model of "which hyperparameters lead to good scores" and uses that model to decide where to look next. Think of it like a gold prospector who, after finding gold in one spot, digs nearby rather than randomly across the entire mountain.

```python theme={null}
# pip install scikit-optimize
from skopt import BayesSearchCV
from skopt.space import Integer, Real, Categorical

# Define search space
search_space = {
    'n_estimators': Integer(50, 300),
    'max_depth': Integer(3, 20),
    'min_samples_split': Integer(2, 15),
    'min_samples_leaf': Integer(1, 10),
    'max_features': Categorical(['sqrt', 'log2', None])
}

# Bayesian search
bayes_search = BayesSearchCV(
    RandomForestClassifier(random_state=42),
    search_space,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

bayes_search.fit(X_train, y_train)

print(f"Best parameters: {bayes_search.best_params_}")
print(f"Best CV score: {bayes_search.best_score_:.4f}")
```

### How Bayesian Optimization Works

1. Try some random points
2. Build a model of: parameter values → score
3. Use model to find promising regions
4. Evaluate and update model
5. Repeat

**Balances exploration (try new areas) and exploitation (focus on promising areas).**
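The five steps above can be sketched on a one-dimensional toy problem. This is an illustration under assumptions, not what any particular library does internally: `toy_score` is a hypothetical stand-in for an expensive cross-validation score, the surrogate is scikit-learn's `GaussianProcessRegressor`, and the "promising region" rule is expected improvement evaluated on a fixed grid of candidates.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def toy_score(x):
    # Hypothetical stand-in for an expensive CV score; the peak is at x = 0.3
    return -(x - 0.3) ** 2

rng = np.random.default_rng(0)
candidates = np.linspace(0, 1, 201).reshape(-1, 1)

# 1. Try some random points
X_seen = rng.uniform(0, 1, size=(3, 1))
y_seen = toy_score(X_seen).ravel()

for _ in range(10):
    # 2. Build a model of: parameter value -> score
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4, normalize_y=True)
    gp.fit(X_seen, y_seen)

    # 3. Use the model to find promising regions (expected improvement, maximization form)
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero at already-seen points
    gain = mu - y_seen.max()
    ei = gain * norm.cdf(gain / sigma) + sigma * norm.pdf(gain / sigma)
    x_next = candidates[np.argmax(ei)]

    # 4. Evaluate the chosen point and add it to the data the model is fit on
    X_seen = np.vstack([X_seen, x_next])
    y_seen = np.append(y_seen, toy_score(x_next[0]))
    # 5. Repeat

best_x = X_seen[np.argmax(y_seen), 0]
print(f"best x after {len(y_seen)} evaluations: {best_x:.3f}")
```

The expected-improvement formula is where the balance shows up: the first term rewards candidates whose predicted score beats the current best (exploitation), and the second rewards candidates where the model is uncertain (exploration).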

***

## Optuna: Modern Hyperparameter Tuning

```python theme={null}
# pip install optuna
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 15),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }
    
    # Create and evaluate model
    model = RandomForestClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    
    return scores.mean()

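# Run optimization (seed the sampler so repeated runs give the same trials)
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42)
)
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best trial: {study.best_trial.params}")
print(f"Best value: {study.best_value:.4f}")

# Visualization (requires plotly). These functions return figures,
# so call .show() to display them, especially outside a notebook.
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()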
```

***

## Practical Tips

### 1. Start Coarse, Then Refine

```python theme={null}
# Step 1: Coarse search
param_grid_coarse = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20, None]
}

grid_coarse = GridSearchCV(model, param_grid_coarse, cv=3)
grid_coarse.fit(X_train, y_train)
print(grid_coarse.best_params_)
# Suppose this reports n_estimators=100, max_depth=10

# Step 2: Fine search around best
param_grid_fine = {
    'n_estimators': [80, 100, 120],
    'max_depth': [8, 10, 12]
}

grid_fine = GridSearchCV(model, param_grid_fine, cv=5)
grid_fine.fit(X_train, y_train)
```
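The fine grid does not have to be typed by hand. Here is a minimal sketch of a helper that brackets each coarse winner; `refine_around` and its `fraction` argument are hypothetical names introduced for illustration, and it assumes each best value is a positive integer well above its lower limit.

```python
def refine_around(best, fraction=0.2):
    """Return three integer values centred on `best`, spaced by fraction * best."""
    step = max(1, round(best * fraction))
    return [best - step, best, best + step]

# Hypothetical coarse-search winners, matching the example above
coarse_best = {'n_estimators': 100, 'max_depth': 10}

param_grid_fine = {name: refine_around(value) for name, value in coarse_best.items()}
print(param_grid_fine)
# {'n_estimators': [80, 100, 120], 'max_depth': [8, 10, 12]}
```

In practice you would pass `grid_coarse.best_params_` instead of the hard-coded dictionary. The helper does not handle `max_depth=None`; if that wins the coarse search, keep `None` in the fine grid alongside a few large finite depths.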

### 2. Prioritize the Most Impactful Hyperparameters

Not all hyperparameters are created equal. Tune the ones that move the needle most, and leave the rest at sensible defaults.

```python theme={null}
from sklearn.ensemble import GradientBoostingClassifier

# For Gradient Boosting, these three interact heavily and matter most:
# - learning_rate: controls step size (low = more trees needed but better generalization)
# - n_estimators: number of boosting rounds (tied to learning_rate)
# - max_depth: complexity of each tree (usually 3-7 for boosting)
# Rule of thumb: lower learning_rate + more n_estimators often generalizes better, at the cost of more compute.
param_grid = {
    'n_estimators': [100, 200, 500],
    'learning_rate': [0.01, 0.1, 0.3],
    'max_depth': [3, 5, 7]
}
```

### 3. Different Metrics for Different Problems

```python theme={null}
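from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Classification metrics to record for every combination
scoring_classification = ['accuracy', 'f1', 'roc_auc', 'precision', 'recall']

# The grid must match the estimator: these are Random Forest settings
rf_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [5, 10, None]
}

# With several metrics, refit names the one used to choose the final model
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    rf_param_grid,
    cv=5,
    scoring=scoring_classification,
    refit='f1'  # best_params_ and best_estimator_ are chosen by F1
)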
```

### 4. Nested Cross-Validation

Nested cross-validation gives an unbiased evaluation of the tuning process itself. This is subtle but important: regular cross-validation with hyperparameter tuning gives you an *optimistically biased* estimate of performance, because you picked the best hyperparameters *on the same folds* you're reporting results for. Nested CV fixes this by using separate inner folds for tuning and outer folds for evaluation.

```python theme={null}
from sklearn.model_selection import cross_val_score, GridSearchCV

# Use the full dataset: the outer loop creates its own held-out folds
X, y = cancer.data, cancer.target

# Random Forest grid (the grid must match the estimator being tuned)
rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, None]
}

# Inner loop: tune hyperparameters (picks the best settings)
inner_cv = GridSearchCV(
    RandomForestClassifier(random_state=42),
    rf_param_grid,
    cv=5
)

# Outer loop: evaluate the ENTIRE tuning process on held-out data
# This gives you an honest estimate of "if I gave this tuning pipeline
# new data, how well would it perform?"
outer_scores = cross_val_score(inner_cv, X, y, cv=5)
print(f"Nested CV Score: {outer_scores.mean():.4f} (+/- {outer_scores.std():.4f})")
# This score is often somewhat lower than the non-nested best_score_ --
# the difference is the "optimism" that regular CV introduces.
```
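Where the optimism comes from can be seen without training a single model. In this sketch every configuration has the same true accuracy, and each cross-validation estimate is modelled as that accuracy plus Gaussian noise; the noise level and the other constants are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)
true_accuracy = 0.90   # every configuration is equally good in truth
noise_std = 0.02       # assumed noise of a single CV estimate
n_configs = 20
n_repeats = 1000

best_scores = []
for _ in range(n_repeats):
    # One noisy CV estimate per configuration
    estimates = [rng.gauss(true_accuracy, noise_std) for _ in range(n_configs)]
    # Tuning reports the best estimate, as best_score_ does
    best_scores.append(max(estimates))

optimism = mean(best_scores) - true_accuracy
print(f"true accuracy:               {true_accuracy:.3f}")
print(f"average reported best score: {mean(best_scores):.3f}")
print(f"optimism from selection:     {optimism:.3f}")
```

Picking the maximum of many noisy estimates inflates the reported score even though no configuration is actually better. The outer folds of nested CV play no part in that selection, so they are not affected by it.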

***

## Common Hyperparameters by Model

<Note>
  **Common ML mistake -- tuning before feature engineering**: As a rough rule of thumb, hyperparameter tuning yields a 1-3% improvement, while good feature engineering can yield 5-20%. Get your features right first, then tune. A perfectly tuned model on bad features will usually lose to a default model on great features.
</Note>

### Random Forest

```python theme={null}
{
    'n_estimators': [100, 200, 500],     # More trees rarely hurt accuracy; diminishing returns past ~200
    'max_depth': [5, 10, 15, None],      # Main overfitting control
    'min_samples_split': [2, 5, 10],     # Secondary regularization
    'max_features': ['sqrt', 'log2']     # Controls tree diversity
}
# Tuning priority: max_depth > n_estimators > min_samples_split > max_features
```

### Gradient Boosting / XGBoost

```python theme={null}
{
    'n_estimators': [100, 200, 500],     # More trees + lower LR = better (but slower)
    'learning_rate': [0.01, 0.1, 0.3],  # MOST IMPORTANT -- interacts with n_estimators
    'max_depth': [3, 5, 7],             # Keep shallow (3-7) -- boosting adds complexity through more rounds
    'subsample': [0.8, 1.0],            # Row sampling -- adds regularization
    'colsample_bytree': [0.8, 1.0]     # XGBoost only (column sampling); sklearn's GradientBoostingClassifier uses max_features
}
# Key insight: learning_rate and n_estimators are coupled.
# Lower LR needs more estimators. Start with LR=0.1, n=100.
```

### SVM

```python theme={null}
{
    'C': [0.1, 1, 10, 100],             # Inverse regularization strength: larger C = less regularization
    'gamma': ['scale', 'auto', 0.1, 1], # Inverse RBF kernel width -- large gamma is a common cause of overfitting
    'kernel': ['rbf', 'poly']            # RBF is the default starting point
}
# WARNING: SVC training time grows at least quadratically with the number of samples.
# For >50K samples, consider LinearSVC with just C, or switch to tree-based models.
```

### Neural Networks

```python theme={null}
{
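    # Parameter names are those of sklearn's MLPClassifier
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],  # Neurons per hidden layer; (100, 50) = two layers
    'alpha': [0.0001, 0.001, 0.01],                              # L2 regularization strength
    'learning_rate_init': [0.001, 0.01]                          # Initial step size for the optimizer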
}
```

***

## 🚀 Mini Projects

<CardGroup cols={2}>
  <Card title="Project 1: Search Strategy Comparison" icon="magnifying-glass">
    Compare Grid, Random, and Bayesian search
  </Card>

  <Card title="Project 2: Learning Curve Analyzer" icon="chart-line">
    Diagnose underfitting vs overfitting with tuning
  </Card>

  <Card title="Project 3: Custom Hyperparameter Optimizer" icon="code">
    Build your own optimization algorithm
  </Card>

  <Card title="Project 4: Auto-ML Mini Framework" icon="robot">
    Create an automated model tuning system
  </Card>
</CardGroup>

### Project 1: Search Strategy Comparison

Compare different hyperparameter search strategies on the same problem.

<details>
  <summary>View Complete Solution</summary>

  ```python theme={null}
  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import (GridSearchCV, RandomizedSearchCV, 
                                       cross_val_score, train_test_split)
  from scipy.stats import randint, uniform
  import time

  # Load data
  cancer = load_breast_cancer()
  X, y = cancer.data, cancer.target
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  # Define parameter space
  param_grid = {
      'n_estimators': [50, 100, 200, 300],
      'max_depth': [5, 10, 15, 20, None],
      'min_samples_split': [2, 5, 10],
      'min_samples_leaf': [1, 2, 4]
  }

  param_random = {
      'n_estimators': randint(50, 300),
      'max_depth': [5, 10, 15, 20, None],
      'min_samples_split': randint(2, 20),
      'min_samples_leaf': randint(1, 10)
  }

  # Track results
  results = {}

  # Step 1: Grid Search
  print("Running Grid Search...")
  start = time.time()
  grid_search = GridSearchCV(
      RandomForestClassifier(random_state=42),
      param_grid,
      cv=5,
      scoring='accuracy',
      n_jobs=-1
  )
  grid_search.fit(X_train, y_train)
  grid_time = time.time() - start

  results['Grid Search'] = {
      'best_score': grid_search.best_score_,
      'test_score': grid_search.score(X_test, y_test),
      'time': grid_time,
      'n_iterations': len(grid_search.cv_results_['mean_test_score']),
      'best_params': grid_search.best_params_
  }
  print(f"  Time: {grid_time:.2f}s, Score: {grid_search.best_score_:.4f}")

  # Step 2: Random Search (same number of iterations as the grid)
  n_iter = len(grid_search.cv_results_['params'])  # 4 x 5 x 3 x 3 = 180 grid combinations
  print(f"\nRunning Random Search ({n_iter} iterations)...")
  start = time.time()
  random_search = RandomizedSearchCV(
      RandomForestClassifier(random_state=42),
      param_random,
      n_iter=n_iter,
      cv=5,
      scoring='accuracy',
      n_jobs=-1,
      random_state=42
  )
  random_search.fit(X_train, y_train)
  random_time = time.time() - start

  results['Random Search'] = {
      'best_score': random_search.best_score_,
      'test_score': random_search.score(X_test, y_test),
      'time': random_time,
      'n_iterations': n_iter,
      'best_params': random_search.best_params_
  }
  print(f"  Time: {random_time:.2f}s, Score: {random_search.best_score_:.4f}")

  # Step 3: Random Search with fewer iterations
  n_iter_small = 20
  print(f"\nRunning Random Search ({n_iter_small} iterations)...")
  start = time.time()
  random_search_small = RandomizedSearchCV(
      RandomForestClassifier(random_state=42),
      param_random,
      n_iter=n_iter_small,
      cv=5,
      scoring='accuracy',
      n_jobs=-1,
      random_state=42
  )
  random_search_small.fit(X_train, y_train)
  random_small_time = time.time() - start

  results['Random Search (20)'] = {
      'best_score': random_search_small.best_score_,
      'test_score': random_search_small.score(X_test, y_test),
      'time': random_small_time,
      'n_iterations': n_iter_small,
      'best_params': random_search_small.best_params_
  }
  print(f"  Time: {random_small_time:.2f}s, Score: {random_search_small.best_score_:.4f}")

  # Step 4: Comparison visualization
  fig, axes = plt.subplots(1, 3, figsize=(14, 4))

  # Score comparison
  ax1 = axes[0]
  names = list(results.keys())
  scores = [results[n]['test_score'] for n in names]
  ax1.bar(names, scores, color=['steelblue', 'coral', 'green'])
  ax1.set_ylabel('Test Score')
  ax1.set_title('Test Accuracy Comparison')
  ax1.set_ylim(0.9, 1.0)
  for i, v in enumerate(scores):
      ax1.text(i, v + 0.002, f'{v:.4f}', ha='center')

  # Time comparison
  ax2 = axes[1]
  times = [results[n]['time'] for n in names]
  ax2.bar(names, times, color=['steelblue', 'coral', 'green'])
  ax2.set_ylabel('Time (seconds)')
  ax2.set_title('Search Time Comparison')
  for i, v in enumerate(times):
      ax2.text(i, v + 0.5, f'{v:.1f}s', ha='center')

  # Efficiency (score per second)
  ax3 = axes[2]
  efficiency = [results[n]['test_score'] / results[n]['time'] for n in names]
  ax3.bar(names, efficiency, color=['steelblue', 'coral', 'green'])
  ax3.set_ylabel('Score / Second')
  ax3.set_title('Search Efficiency')

  plt.tight_layout()
  plt.savefig('search_comparison.png', dpi=150)

  # Print summary
  print("\n" + "="*60)
  print("📊 SEARCH STRATEGY COMPARISON")
  print("="*60)
  for name, res in results.items():
      print(f"\n{name}:")
      print(f"  Iterations:  {res['n_iterations']}")
      print(f"  Time:        {res['time']:.2f}s")
      print(f"  CV Score:    {res['best_score']:.4f}")
      print(f"  Test Score:  {res['test_score']:.4f}")
      print(f"  Best params: {res['best_params']}")
  ```

  **What you learned:**

  * Random search is often more efficient than grid search
  * Fewer random iterations can match exhaustive grid search
  * The parameter space size dramatically affects search time
</details>

### Project 2: Learning Curve Analyzer

Use learning curves to determine if more data or different hyperparameters would help.

<details>
  <summary>View Complete Solution</summary>

  ```python theme={null}
  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import learning_curve, validation_curve
  from sklearn.preprocessing import StandardScaler

  # Load data
  cancer = load_breast_cancer()
  X, y = cancer.data, cancer.target

  # Step 1: Generate learning curves for different model complexities
  def plot_learning_curves(estimator, X, y, title, ax):
      """Plot learning curve for a model"""
      train_sizes, train_scores, val_scores = learning_curve(
          estimator, X, y,
          train_sizes=np.linspace(0.1, 1.0, 10),
          cv=5,
          scoring='accuracy',
          n_jobs=-1
      )
      
      train_mean = train_scores.mean(axis=1)
      train_std = train_scores.std(axis=1)
      val_mean = val_scores.mean(axis=1)
      val_std = val_scores.std(axis=1)
      
      ax.plot(train_sizes, train_mean, 'b-', label='Training')
      ax.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1)
      ax.plot(train_sizes, val_mean, 'r-', label='Validation')
      ax.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1)
      ax.set_xlabel('Training Size')
      ax.set_ylabel('Score')
      ax.set_title(title)
      ax.legend()
      ax.set_ylim(0.8, 1.02)
      
      # Diagnose
      final_train = train_mean[-1]
      final_val = val_mean[-1]
      gap = final_train - final_val
      
      if final_train < 0.95 and gap < 0.05:
          diagnosis = "UNDERFITTING"
      elif gap > 0.1:
          diagnosis = "OVERFITTING"
      else:
          diagnosis = "GOOD FIT"
      
      return diagnosis

  # Step 2: Compare different model complexities
  fig, axes = plt.subplots(2, 2, figsize=(12, 10))

  models = [
      (RandomForestClassifier(n_estimators=10, max_depth=2, random_state=42), 
       "Simple Model (depth=2, trees=10)"),
      (RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42), 
       "Medium Model (depth=5, trees=100)"),
      (RandomForestClassifier(n_estimators=100, max_depth=20, random_state=42), 
       "Complex Model (depth=20, trees=100)"),
      (RandomForestClassifier(n_estimators=200, max_depth=None, random_state=42), 
       "Very Complex (no depth limit)")
  ]

  print("Learning Curve Analysis")
  print("="*50)
  for i, (model, title) in enumerate(models):
      ax = axes[i // 2, i % 2]
      diagnosis = plot_learning_curves(model, X, y, title, ax)
      print(f"{title}: {diagnosis}")

  plt.tight_layout()
  plt.savefig('learning_curves.png', dpi=150)

  # Step 3: Validation curves for key hyperparameters
  fig, axes = plt.subplots(1, 3, figsize=(15, 4))

  # n_estimators
  param_range = [10, 25, 50, 100, 200, 300]
  train_scores, val_scores = validation_curve(
      RandomForestClassifier(max_depth=10, random_state=42),
      X, y, param_name='n_estimators',
      param_range=param_range, cv=5, n_jobs=-1
  )
  axes[0].plot(param_range, train_scores.mean(axis=1), 'b-', label='Train')
  axes[0].plot(param_range, val_scores.mean(axis=1), 'r-', label='Val')
  axes[0].set_xlabel('n_estimators')
  axes[0].set_ylabel('Score')
  axes[0].set_title('Effect of n_estimators')
  axes[0].legend()

  # max_depth
  param_range = [2, 5, 10, 15, 20, 25, None]
  param_range_plot = [2, 5, 10, 15, 20, 25, 30]  # For plotting
  train_scores, val_scores = validation_curve(
      RandomForestClassifier(n_estimators=100, random_state=42),
      X, y, param_name='max_depth',
      param_range=param_range, cv=5, n_jobs=-1
  )
  axes[1].plot(param_range_plot, train_scores.mean(axis=1), 'b-', label='Train')
  axes[1].plot(param_range_plot, val_scores.mean(axis=1), 'r-', label='Val')
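  axes[1].set_xticks(param_range_plot)
  axes[1].set_xticklabels([str(d) for d in param_range])  # Label the last point 'None' rather than 30
  axes[1].set_xlabel('max_depth')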
  axes[1].set_ylabel('Score')
  axes[1].set_title('Effect of max_depth')
  axes[1].legend()

  # min_samples_leaf
  param_range = [1, 2, 4, 8, 16, 32]
  train_scores, val_scores = validation_curve(
      RandomForestClassifier(n_estimators=100, random_state=42),
      X, y, param_name='min_samples_leaf',
      param_range=param_range, cv=5, n_jobs=-1
  )
  axes[2].plot(param_range, train_scores.mean(axis=1), 'b-', label='Train')
  axes[2].plot(param_range, val_scores.mean(axis=1), 'r-', label='Val')
  axes[2].set_xlabel('min_samples_leaf')
  axes[2].set_ylabel('Score')
  axes[2].set_title('Effect of min_samples_leaf')
  axes[2].legend()

  plt.tight_layout()
  plt.savefig('validation_curves.png', dpi=150)

  print("\n📋 Recommendations:")
  print("  - Increase model complexity if both training and validation scores are low")
  print("  - Add regularization if gap between train/val is large")
  print("  - Get more data if the validation score is still rising with training size")
  ```

  **What you learned:**

  * Learning curves diagnose underfitting vs overfitting
  * Validation curves show the effect of individual hyperparameters
  * A large train-validation gap signals overfitting; low scores on both signal underfitting
</details>

### Project 3: Custom Hyperparameter Optimizer

Build a simple Bayesian-style optimizer from scratch.

<details>
  <summary>View Complete Solution</summary>

  ```python theme={null}
  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score
  from scipy.stats import norm
  from scipy.optimize import minimize

  # Load data
  cancer = load_breast_cancer()
  X, y = cancer.data, cancer.target

  # Step 1: Define the objective function
  def objective(params):
      """Objective function to maximize (returns negative for minimization)"""
      n_estimators, max_depth = params
      n_estimators = int(n_estimators)
      max_depth = int(max_depth)
      
      model = RandomForestClassifier(
          n_estimators=n_estimators,
          max_depth=max_depth,
          random_state=42
      )
      
      scores = cross_val_score(model, X, y, cv=3, scoring='accuracy')
      return -scores.mean()  # Negative because we minimize

  # Step 2: Define a simple Gaussian Process surrogate
  class SimpleGPSurrogate:
      """Simple Gaussian Process for Bayesian Optimization"""
      
      def __init__(self, length_scale=1.0):
          self.X_train = None
          self.y_train = None
          self.length_scale = length_scale
      
      def _kernel(self, X1, X2):
          """RBF kernel"""
          dist = np.sum((X1[:, np.newaxis, :] - X2[np.newaxis, :, :]) ** 2, axis=2)
          return np.exp(-dist / (2 * self.length_scale ** 2))
      
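      def fit(self, X, y):
          """Store the data; targets are standardized so the zero-mean,
          unit-variance GP prior is a reasonable match for them"""
          self.X_train = np.array(X, dtype=float)
          y = np.array(y, dtype=float)
          self.y_mean = y.mean()
          self.y_std = y.std() if y.std() > 0 else 1.0
          self.y_train = (y - self.y_mean) / self.y_std
      
      def predict(self, X):
          """Predict mean and standard deviation (in the original target units)"""
          X = np.array(X, dtype=float)
          if len(X.shape) == 1:
              X = X.reshape(1, -1)
          
          K = self._kernel(self.X_train, self.X_train) + 1e-6 * np.eye(len(self.X_train))
          K_star = self._kernel(X, self.X_train)
          K_inv = np.linalg.inv(K)
          
          mu = K_star @ K_inv @ self.y_train
          var = 1 - np.diag(K_star @ K_inv @ K_star.T)
          var = np.maximum(var, 1e-6)  # Ensure positive
          
          # Undo the standardization applied in fit()
          return mu * self.y_std + self.y_mean, np.sqrt(var) * self.y_std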

  # Step 3: Expected Improvement acquisition function
  def expected_improvement(X, gp, y_best):
      """Calculate expected improvement"""
      mu, sigma = gp.predict(X)
      
      improvement = y_best - mu  # Note: we're minimizing
      Z = improvement / sigma
      ei = improvement * norm.cdf(Z) + sigma * norm.pdf(Z)
      
      return ei

  # Step 4: Bayesian Optimization
  def bayesian_optimization(objective, bounds, n_init=5, n_iter=20):
      """Simple Bayesian Optimization"""
      
      # Initialize with random samples (seeded so runs are repeatable)
      np.random.seed(42)
      X_samples = []
      y_samples = []
      
      print("Initializing with random samples...")
      for _ in range(n_init):
          x = [np.random.uniform(b[0], b[1]) for b in bounds]
          y = objective(x)
          X_samples.append(x)
          y_samples.append(y)
          print(f"  n_est={int(x[0])}, depth={int(x[1])}, score={-y:.4f}")
      
      # Create GP surrogate
      gp = SimpleGPSurrogate(length_scale=10.0)
      
      print("\nOptimizing...")
      for i in range(n_iter):
          # Fit GP
          gp.fit(X_samples, y_samples)
          y_best = min(y_samples)
          
          # Find next point to sample (maximize EI)
          best_ei = -np.inf
          best_x = None
          
          # Random search over acquisition function
          for _ in range(100):
              x_candidate = [np.random.uniform(b[0], b[1]) for b in bounds]
              ei = expected_improvement(np.array([x_candidate]), gp, y_best)[0]
              if ei > best_ei:
                  best_ei = ei
                  best_x = x_candidate
          
          # Evaluate objective at best point
          y_new = objective(best_x)
          X_samples.append(best_x)
          y_samples.append(y_new)
          
          print(f"Iter {i+1}: n_est={int(best_x[0])}, depth={int(best_x[1])}, "
                f"score={-y_new:.4f}, best={-min(y_samples):.4f}")
      
      # Return best result
      best_idx = np.argmin(y_samples)
      return X_samples[best_idx], -y_samples[best_idx], X_samples, y_samples

  # Step 5: Run optimization
  bounds = [(10, 200), (2, 20)]  # n_estimators, max_depth
  best_params, best_score, all_X, all_y = bayesian_optimization(
      objective, bounds, n_init=5, n_iter=15
  )

  print(f"\n🏆 Best Parameters:")
  print(f"  n_estimators: {int(best_params[0])}")
  print(f"  max_depth: {int(best_params[1])}")
  print(f"  CV Score: {best_score:.4f}")

  # Step 6: Visualize optimization progress
  fig, axes = plt.subplots(1, 2, figsize=(12, 4))

  # Convergence plot
  ax1 = axes[0]
  best_so_far = [min(all_y[:i+1]) for i in range(len(all_y))]
  ax1.plot(-np.array(best_so_far), 'b-', marker='o')
  ax1.set_xlabel('Iteration')
  ax1.set_ylabel('Best Score')
  ax1.set_title('Optimization Convergence')

  # Parameter space exploration
  ax2 = axes[1]
  all_X = np.array(all_X)
  all_y_plot = -np.array(all_y)
  scatter = ax2.scatter(all_X[:, 0], all_X[:, 1], c=all_y_plot, cmap='viridis', s=100)
  ax2.scatter([best_params[0]], [best_params[1]], c='red', s=200, marker='*', 
              label='Best', edgecolors='black', linewidths=2)
  ax2.set_xlabel('n_estimators')
  ax2.set_ylabel('max_depth')
  ax2.set_title('Parameter Space Exploration')
  ax2.legend()
  plt.colorbar(scatter, ax=ax2, label='Score')

  plt.tight_layout()
  plt.savefig('bayesian_optimization.png', dpi=150)
  ```

  **What you learned:**

  * Bayesian optimization uses a surrogate model (GP) to guide search
  * Expected Improvement balances exploration and exploitation
  * Smarter search can find good hyperparameters with fewer evaluations
</details>

### Project 4: Auto-ML Mini Framework

Create an automated model tuning system that handles multiple models.

<details>
  <summary>View Complete Solution</summary>

  ```python theme={null}
  import numpy as np
  import pandas as pd
  from sklearn.datasets import load_breast_cancer
  from sklearn.model_selection import train_test_split, RandomizedSearchCV
  from sklearn.preprocessing import StandardScaler
  from sklearn.pipeline import Pipeline
  from sklearn.linear_model import LogisticRegression
  from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
  from sklearn.svm import SVC
  from sklearn.neighbors import KNeighborsClassifier
  from scipy.stats import randint, uniform, loguniform
  import time
  import warnings
  warnings.filterwarnings('ignore')

  # Step 1: Define AutoML class
  class MiniAutoML:
      """A mini AutoML framework for classification"""
      
      def __init__(self, time_budget=60, random_state=42):
          self.time_budget = time_budget
          self.random_state = random_state
          self.results = []
          self.best_model = None
          self.best_score = 0
          
          # Define models and their hyperparameter spaces
          self.model_configs = {
              'LogisticRegression': {
                  'model': LogisticRegression(random_state=random_state, max_iter=1000),
                  'params': {
                      'clf__C': loguniform(1e-3, 100),
                      'clf__penalty': ['l1', 'l2'],
                      'clf__solver': ['saga']
                  },
                  'needs_scaling': True
              },
              'RandomForest': {
                  'model': RandomForestClassifier(random_state=random_state),
                  'params': {
                      'clf__n_estimators': randint(50, 300),
                      'clf__max_depth': [5, 10, 15, 20, None],
                      'clf__min_samples_split': randint(2, 20),
                      'clf__min_samples_leaf': randint(1, 10)
                  },
                  'needs_scaling': False
              },
              'GradientBoosting': {
                  'model': GradientBoostingClassifier(random_state=random_state),
                  'params': {
                      'clf__n_estimators': randint(50, 200),
                      'clf__max_depth': [3, 5, 7],
                      'clf__learning_rate': loguniform(0.01, 0.3),
                      'clf__subsample': uniform(0.6, 0.4)
                  },
                  'needs_scaling': False
              },
              'SVM': {
                  'model': SVC(random_state=random_state),
                  'params': {
                      'clf__C': loguniform(0.1, 100),
                      'clf__gamma': ['scale', 'auto'],
                      'clf__kernel': ['rbf', 'poly']
                  },
                  'needs_scaling': True
              },
              'KNN': {
                  'model': KNeighborsClassifier(),
                  'params': {
                      'clf__n_neighbors': randint(3, 20),
                      'clf__weights': ['uniform', 'distance'],
                      'clf__metric': ['euclidean', 'manhattan']
                  },
                  'needs_scaling': True
              }
          }
      
      def _create_pipeline(self, model, needs_scaling):
          """Create a pipeline with optional scaling"""
          if needs_scaling:
              return Pipeline([
                  ('scaler', StandardScaler()),
                  ('clf', model)
              ])
          return Pipeline([('clf', model)])
      
      def fit(self, X, y, cv=5):
          """Run AutoML to find best model"""
          start_time = time.time()
          time_per_model = self.time_budget / len(self.model_configs)
          
          print("🤖 MiniAutoML Starting...")
          print(f"   Time budget: {self.time_budget}s")
          print(f"   Models to try: {len(self.model_configs)}")
          print("="*60)
          
          for name, config in self.model_configs.items():
              model_start = time.time()
              elapsed = time.time() - start_time
              
              if elapsed > self.time_budget:
                  print(f"\n⏰ Time budget exceeded. Stopping.")
                  break
              
              print(f"\n📦 Tuning {name}...")
              
              pipeline = self._create_pipeline(
                  config['model'], 
                  config['needs_scaling']
              )
              
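              # Estimate iterations from the per-model budget (rough heuristic:
              # about 2s per candidate). Always run at least one candidate,
              # because RandomizedSearchCV rejects n_iter=0.
              n_iter = max(1, min(20, int(time_per_model / 2)))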
              
              try:
                  search = RandomizedSearchCV(
                      pipeline,
                      config['params'],
                      n_iter=n_iter,
                      cv=cv,
                      scoring='accuracy',
                      n_jobs=-1,
                      random_state=self.random_state
                  )
                  
                  search.fit(X, y)
                  
                  result = {
                      'model': name,
                      'best_score': search.best_score_,
                      'best_params': search.best_params_,
                      'time': time.time() - model_start,
                      'estimator': search.best_estimator_
                  }
                  self.results.append(result)
                  
                  print(f"   Score: {result['best_score']:.4f}")
                  print(f"   Time:  {result['time']:.1f}s")
                  
                  if result['best_score'] > self.best_score:
                      self.best_score = result['best_score']
                      self.best_model = search.best_estimator_
                      
              except Exception as e:
                  print(f"   ❌ Error: {e}")
          
          print("\n" + "="*60)
          print("🏆 AutoML Complete!")
          print(f"   Total time: {time.time() - start_time:.1f}s")
          
          return self
      
      def get_leaderboard(self):
          """Return ranked results"""
          df = pd.DataFrame(self.results)
          return df.sort_values('best_score', ascending=False).reset_index(drop=True)
      
      def predict(self, X):
          """Predict using best model"""
          return self.best_model.predict(X)

  # Step 2: Run AutoML
  cancer = load_breast_cancer()
  X, y = cancer.data, cancer.target
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  automl = MiniAutoML(time_budget=120, random_state=42)
  automl.fit(X_train, y_train)

  # Step 3: View results
  print("\n📊 LEADERBOARD")
  print("-"*50)
  leaderboard = automl.get_leaderboard()
  for i, row in leaderboard.iterrows():
      print(f"{i+1}. {row['model']:20s} - Score: {row['best_score']:.4f}")

  # Step 4: Evaluate on test set
  test_score = automl.best_model.score(X_test, y_test)
  print(f"\n🎯 Test Set Performance:")
  print(f"   Best model: {leaderboard.iloc[0]['model']}")
  print(f"   CV Score:   {leaderboard.iloc[0]['best_score']:.4f}")
  print(f"   Test Score: {test_score:.4f}")

  # Step 5: Show best parameters
  print(f"\n⚙️ Best Parameters:")
  best_params = leaderboard.iloc[0]['best_params']
  for param, value in best_params.items():
      print(f"   {param}: {value}")
  ```

  **What you learned:**

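  * AutoML automates model selection and hyperparameter tuning
  * Time budgets help manage computational resources; here the budget is checked only between models, so it acts as a soft cap
  * Pipelines refit the scaler inside each cross-validation fold, so preprocessing stays consistent across models and never sees validation data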
</details>
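
`MiniAutoML` checks its time budget only between models, so one long search can overshoot it. A stricter variant, sketched below under assumptions, draws candidates one at a time with scikit-learn's `ParameterSampler` and stops once a deadline passes. The helper name `budgeted_random_search` and its arguments are hypothetical, and the small Random Forest space is only for illustration.

```python
import time
from scipy.stats import randint
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterSampler, cross_val_score

def budgeted_random_search(model, param_distributions, X, y,
                           time_budget=10.0, max_iter=50, cv=3,
                           random_state=42):
    """Random search that stops drawing candidates once the budget is spent."""
    deadline = time.monotonic() + time_budget
    sampler = ParameterSampler(param_distributions, n_iter=max_iter,
                               random_state=random_state)

    best_score, best_params, n_evaluated = -float("inf"), None, 0
    for params in sampler:
        # Always evaluate at least one candidate, then respect the deadline
        if n_evaluated > 0 and time.monotonic() > deadline:
            break
        candidate = clone(model).set_params(**params)
        score = cross_val_score(candidate, X, y, cv=cv,
                                scoring="accuracy").mean()
        n_evaluated += 1
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score, n_evaluated

X, y = load_breast_cancer(return_X_y=True)
best_params, best_score, n_evaluated = budgeted_random_search(
    RandomForestClassifier(random_state=42),
    {"n_estimators": randint(10, 50), "max_depth": [3, 5, None]},
    X, y, time_budget=5.0, max_iter=10,
)
print(f"Evaluated {n_evaluated} candidates")
print(f"Best CV accuracy: {best_score:.4f} with {best_params}")
```

The deadline is still checked between candidates, so a single slow fit can run past the budget. The overshoot is bounded by one evaluation rather than one full search.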

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Grid Search" icon="grid">
    Exhaustive but slow. Good for small spaces.
  </Card>

  <Card title="Random Search" icon="shuffle">
    Often matches or beats grid search on the same budget. Use for larger spaces.
  </Card>

  <Card title="Bayesian Optimization" icon="brain">
    Uses past results to choose the next trial. Best when each evaluation is expensive.
  </Card>

  <Card title="Nested CV" icon="layer-group">
    Nearly unbiased estimate of how the tuned model will generalize.
  </Card>
</CardGroup>

***

## What's Next?

You've learned individual algorithms and how to tune them. Now let's see how to tackle real-world ML projects end-to-end!

<Card title="Continue to Module 10: End-to-End ML Project" icon="arrow-right" href="/courses/ml-mastery/10-end-to-end-project">
  Apply everything in a complete machine learning project
</Card>
