
Time Series Forecasting

Time Series Components and Forecasting

Time Changes Everything

Most ML algorithms assume observations are independent — like pulling marbles from a bag, where each draw tells you nothing about the next. But what if tomorrow’s stock price depends on today’s? What if next month’s sales follow a seasonal pattern? Think of it this way: regular ML is like analyzing a bag of photos — the order does not matter. Time series is like analyzing a movie — shuffle the frames and you lose all meaning. Welcome to Time Series — where order matters.
Amazon Demand Forecasting

The Restaurant Sales Problem

You manage a restaurant chain. You need to predict:
  • How many customers next week?
  • How much inventory to order?
  • How many staff to schedule?
You have 2 years of daily sales data. Let’s find the patterns!

Components of Time Series

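A time series can often be decomposed into components. The example below builds a synthetic series from a known trend, yearly seasonality, and noise, then recovers them with seasonal_decompose: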
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Generate sample time series data
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=730, freq='D')

# Create components
trend = np.linspace(100, 200, 730)  # Upward trend
seasonal = 30 * np.sin(2 * np.pi * np.arange(730) / 365)  # Yearly seasonality
noise = np.random.normal(0, 10, 730)  # Random noise

sales = trend + seasonal + noise
ts = pd.Series(sales, index=dates)

# Decompose
decomposition = seasonal_decompose(ts, model='additive', period=365)

# Plot
fig, axes = plt.subplots(4, 1, figsize=(12, 10))

axes[0].plot(ts)
axes[0].set_title('Original Time Series')

axes[1].plot(decomposition.trend)
axes[1].set_title('Trend Component')

axes[2].plot(decomposition.seasonal)
axes[2].set_title('Seasonal Component')

axes[3].plot(decomposition.resid)
axes[3].set_title('Residual (Noise)')

plt.tight_layout()
plt.show()

The Four Components

Think of a time series like listening to music. The trend is the overall volume getting louder or quieter across the song. Seasonality is the repeating beat pattern. Cyclical patterns are the chorus coming back, but not at perfectly fixed intervals. And noise is the static in the recording.
| Component | Description | Example |
|---|---|---|
| Trend | Long-term direction | Growing customer base |
| Seasonality | Regular, fixed-period patterns | Summer ice cream sales, Monday dips in retail |
| Cyclical | Irregular long patterns (no fixed period) | Economic cycles, housing market booms |
| Noise | Random variation you cannot predict | Weather effects, one-off events |
Practical tip: Seasonality has a fixed, known period (e.g., 7 days, 12 months). Cyclical patterns do not. This distinction matters because ARIMA-family models handle seasonality with explicit seasonal terms, but cyclical patterns require longer history and more flexible models.

Stationarity: The Foundation

Many classical time series models (ARMA directly, ARIMA after differencing) assume stationarity: the statistical properties (mean, variance, autocorrelation) do not change over time.
Stationarity is like a river that flows at a constant average speed and depth. Some days it is a bit faster, some days slower, but the long-run behavior stays the same. A non-stationary series is like a river during flood season — the average keeps rising, so yesterday’s “normal” level is not useful for predicting tomorrow’s.

Testing for Stationarity

from statsmodels.tsa.stattools import adfuller

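def test_stationarity(series, name="Series", alpha=0.05):
    """Test if a series is stationary using the Augmented Dickey-Fuller (ADF) test.

    The ADF null hypothesis is that the series has a unit root (is non-stationary),
    so a p-value below alpha is evidence that the series IS stationary.
    """
    result = adfuller(series.dropna())
    is_stationary = result[1] < alpha
    
    print(f"\n{name} - Stationarity Test:")
    print(f"  ADF Statistic: {result[0]:.4f}")
    print(f"  p-value: {result[1]:.4f}")
    print(f"  Stationary: {'Yes' if is_stationary else 'No'}")
    
    return is_stationary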

# Test our series
test_stationarity(ts, "Raw Sales")

# Make it stationary through differencing
ts_diff = ts.diff().dropna()
test_stationarity(ts_diff, "Differenced Sales")

Making Data Stationary

# Differencing: subtract previous value to remove trend
# If today's value is 105 and yesterday's was 100, the differenced value is 5
ts_diff = ts.diff()

# Log transform: stabilize variance when fluctuations grow with the level
# (e.g., stock prices: a $10 swing means more at $50 than at $500)
ts_log = np.log(ts)

# Both combined: handles both growing variance AND trend
ts_log_diff = np.log(ts).diff()

# Plot transformations
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].plot(ts)
axes[0, 0].set_title('Original')

axes[0, 1].plot(ts_diff)
axes[0, 1].set_title('Differenced')

axes[1, 0].plot(ts_log)
axes[1, 0].set_title('Log Transformed')

axes[1, 1].plot(ts_log_diff)
axes[1, 1].set_title('Log + Differenced')

plt.tight_layout()
plt.show()

Simple Forecasting Methods

1. Moving Average

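def moving_average_forecast(series, window=7):
    """One-step-ahead forecast: the mean of the previous `window` values.

    shift(1) makes the forecast for day t use only days t-window .. t-1,
    so the value being forecast is never part of its own average.
    """
    return series.rolling(window=window).mean().shift(1)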

# Calculate 7-day and 30-day moving averages
ma_7 = moving_average_forecast(ts, 7)
ma_30 = moving_average_forecast(ts, 30)

plt.figure(figsize=(12, 6))
plt.plot(ts.values[-100:], label='Actual', alpha=0.7)
plt.plot(ma_7.values[-100:], label='7-day MA', linewidth=2)
plt.plot(ma_30.values[-100:], label='30-day MA', linewidth=2)
plt.legend()
plt.title('Moving Average Smoothing')
plt.show()

2. Exponential Smoothing

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

# Simple Exponential Smoothing
ses = SimpleExpSmoothing(ts).fit(smoothing_level=0.3)
ses_forecast = ses.forecast(30)

# Holt-Winters (handles trend + seasonality)
hw = ExponentialSmoothing(
    ts, 
    trend='add', 
    seasonal='add', 
    seasonal_periods=365
).fit()
hw_forecast = hw.forecast(30)

# Plot
plt.figure(figsize=(12, 6))
plt.plot(ts.values[-100:], label='Historical')
plt.plot(range(100, 130), ses_forecast, label='Simple ES', linestyle='--')
plt.plot(range(100, 130), hw_forecast, label='Holt-Winters', linestyle='--')
plt.axvline(99, color='gray', linestyle=':')
plt.legend()
plt.title('Exponential Smoothing Forecasts')
plt.show()

ARIMA: The Classic Approach

ARIMA = AutoRegressive Integrated Moving Average. Think of ARIMA as three tools combined into one Swiss Army knife:
  • AR(p) — AutoRegressive: “The future looks like the recent past.” Uses p past values as predictors. If p=3, the model says: “To predict tomorrow, look at the last 3 days.”
  • I(d) — Integrated: “Difference the data d times to make it stationary.” If d=1, you work with day-over-day changes rather than raw values.
  • MA(q) — Moving Average: “Learn from recent forecast mistakes.” Uses q past forecast errors. If q=2, the model adjusts based on how wrong it was yesterday and the day before.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Plot ACF and PACF to determine p and q
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ts_diff.dropna(), ax=axes[0], lags=40)
plot_pacf(ts_diff.dropna(), ax=axes[1], lags=40)
plt.tight_layout()
plt.show()

# Fit ARIMA model
# Train on first 700 days, test on last 30
train = ts[:700]
test = ts[700:]

model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit()

print(fitted.summary())

# Forecast
forecast = fitted.forecast(len(test))

# Plot
plt.figure(figsize=(12, 6))
plt.plot(train.values[-100:], label='Training')
plt.plot(range(100, 100+len(test)), test.values, label='Actual')
plt.plot(range(100, 100+len(forecast)), forecast.values, label='Forecast', linestyle='--')
plt.axvline(99, color='gray', linestyle=':')
plt.legend()
plt.title('ARIMA Forecast')
plt.show()
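To make the AR and I pieces of the (1, 1, 1) order concrete, this sketch does three things. It simulates an AR(1) process with a hypothetical coefficient of 0.7, recovers that coefficient by regressing each value on the previous one, and shows that d=1 differencing is reversible with a cumulative sum. Plain least squares on a lag is used here for transparency. statsmodels' ARIMA estimates its parameters by maximum likelihood instead, though for a pure AR(1) the two give similar answers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi_true = 2000, 0.7

# AR(1): each value is phi times the previous value plus fresh noise
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

# AR as regression: predict y_t from y_{t-1} with least squares
# (no intercept, since this simulated process has mean zero)
y_prev, y_curr = y[:-1], y[1:]
phi_hat = np.dot(y_prev, y_curr) / np.dot(y_prev, y_prev)
print(f"true phi = {phi_true}, estimated phi = {phi_hat:.3f}")

# I(1): build a non-stationary level series, difference it (what d=1 models),
# then undo the differencing with a cumulative sum anchored at the first value
level = 100 + np.cumsum(y)
diffs = np.diff(level)
restored = np.concatenate([[level[0]], level[0] + np.cumsum(diffs)])
print("differencing is reversible:", np.allclose(restored, level))
```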

Auto ARIMA

# Using pmdarima for automatic order selection
# pip install pmdarima
from pmdarima import auto_arima

auto_model = auto_arima(
    train,
    start_p=0, start_q=0,
    max_p=5, max_q=5,
    d=None,  # Let it determine d
    seasonal=True,
    m=7,  # Weekly seasonality
    trace=True,
    error_action='ignore',
    suppress_warnings=True
)

print(auto_model.summary())

SARIMA: Adding Seasonality

SARIMA = Seasonal ARIMA: it adds seasonal terms (P, D, Q) at a seasonal period m on top of the non-seasonal (p, d, q):
from statsmodels.tsa.statespace.sarimax import SARIMAX

# SARIMA with weekly seasonality
model = SARIMAX(
    train,
    order=(1, 1, 1),           # (p, d, q)
    seasonal_order=(1, 1, 1, 7) # (P, D, Q, m) m=7 for weekly
)

fitted = model.fit(disp=False)
forecast = fitted.forecast(len(test))

# Calculate RMSE
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(test, forecast))
print(f"SARIMA RMSE: {rmse:.2f}")
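The D=1, m=7 part of seasonal_order means SARIMA models y_t - y_{t-7} rather than y_t. This minimal sketch shows why that helps, using a hypothetical series made of a linear trend plus a fixed day-of-week pattern. The repeating weekly effect cancels exactly, leaving only the trend's week-over-week increase.

```python
import numpy as np
import pandas as pd

days = pd.date_range('2023-01-02', periods=8 * 7, freq='D')   # 8 weeks, starting on a Monday
weekly_pattern = np.array([0, -5, -3, 0, 4, 12, 10])          # hypothetical Mon..Sun effect
y = pd.Series(0.5 * np.arange(len(days)) + np.tile(weekly_pattern, 8), index=days)

# Seasonal difference with period m=7 (what D=1 means in seasonal_order=(P, 1, Q, 7))
seasonal_diff = y.diff(7).dropna()

# The weekly effect cancels; only the trend's weekly change (7 * 0.5) remains
print(seasonal_diff.unique())
```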

Prophet: Modern Forecasting

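Prophet, open-sourced by Facebook (now Meta), fits trend, yearly/weekly/daily seasonality, and holiday effects in one model. Trend and seasonality are handled automatically; holidays must be supplied (see Adding Holidays):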
# pip install prophet
from prophet import Prophet

# Prepare data in Prophet format
df = pd.DataFrame({
    'ds': dates,
    'y': sales
})

# Train/test split
train_df = df[:700]
test_df = df[700:]

# Fit Prophet
prophet = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False
)
prophet.fit(train_df)

# Make future dataframe
future = prophet.make_future_dataframe(periods=30)
forecast = prophet.predict(future)

# Plot
prophet.plot(forecast)
plt.title('Prophet Forecast')
plt.show()

# Plot components
prophet.plot_components(forecast)
plt.show()

Adding Holidays

# Define holidays
holidays = pd.DataFrame({
    'holiday': 'christmas',
    'ds': pd.to_datetime(['2020-12-25', '2021-12-25', '2022-12-25']),
    'lower_window': -2,  # 2 days before
    'upper_window': 1    # 1 day after
})

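# A Prophet object can only be fit once, so create a new model that includes the holidays
prophet = Prophet(holidays=holidays)
prophet.fit(train_df)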
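Here is a plain pandas sketch that shows which days lower_window and upper_window actually cover. It expands each holiday row into individual dates. It only illustrates the window semantics and is not Prophet's internal code.

```python
import pandas as pd

holidays = pd.DataFrame({
    'holiday': 'christmas',
    'ds': pd.to_datetime(['2020-12-25', '2021-12-25', '2022-12-25']),
    'lower_window': -2,
    'upper_window': 1,
})

# Expand each holiday row into every individual day its window covers
covered_df = pd.DataFrame(
    [
        (row.holiday, row.ds + pd.Timedelta(days=offset))
        for row in holidays.itertuples(index=False)
        for offset in range(row.lower_window, row.upper_window + 1)
    ],
    columns=['holiday', 'ds'],
)
print(covered_df[covered_df['ds'].dt.year == 2020])  # Dec 23, 24, 25 and 26
```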

ML Approach: Feature Engineering for Time Series

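Transform the time series into a supervised learning problem: each row gets features computed from earlier values (lags, rolling statistics) plus calendar features, and the target is the value at that row.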
def create_features(df, target_col, n_lags=7):
    """Create features from time series for ML models."""
    df = df.copy()
    
    # Lag features
    for i in range(1, n_lags + 1):
        df[f'lag_{i}'] = df[target_col].shift(i)
    
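    # Rolling statistics, shifted by one step so row t only summarizes values up to t-1.
    # Without the shift, each window would include the target itself (look-ahead leakage).
    past = df[target_col].shift(1)
    df['rolling_mean_7'] = past.rolling(7).mean()
    df['rolling_std_7'] = past.rolling(7).std()
    df['rolling_mean_30'] = past.rolling(30).mean()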
    
    # Date features
    if isinstance(df.index, pd.DatetimeIndex):
        df['day_of_week'] = df.index.dayofweek
        df['month'] = df.index.month
        df['day_of_month'] = df.index.day
        df['is_weekend'] = (df.index.dayofweek >= 5).astype(int)
    
    return df.dropna()

# Prepare dataset
ts_df = pd.DataFrame({'sales': sales}, index=dates)
ts_ml = create_features(ts_df, 'sales', n_lags=7)

# Features and target
feature_cols = [col for col in ts_ml.columns if col != 'sales']
X = ts_ml[feature_cols]
y = ts_ml['sales']

# Time series split (don't shuffle!)
split_idx = int(len(X) * 0.8)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]

# Train models
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

models = {
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    
    mae = mean_absolute_error(y_test, predictions)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")

Cross-Validation for Time Series

Never use random cross-validation for time series! Future data would leak into training.
from sklearn.model_selection import TimeSeriesSplit

# Time series cross-validation
tscv = TimeSeriesSplit(n_splits=5)

# Visualize the splits
plt.figure(figsize=(12, 4))
for i, (train_idx, test_idx) in enumerate(tscv.split(X)):
    plt.plot(train_idx, [i] * len(train_idx), 'b-', linewidth=3, label='Train' if i==0 else '')
    plt.plot(test_idx, [i] * len(test_idx), 'r-', linewidth=3, label='Test' if i==0 else '')
plt.xlabel('Sample index')
plt.ylabel('CV iteration')
plt.legend()
plt.title('Time Series Cross-Validation')
plt.show()

# Use in model evaluation
from sklearn.model_selection import cross_val_score

rf = RandomForestRegressor(n_estimators=100, random_state=42)
scores = cross_val_score(rf, X, y, cv=tscv, scoring='neg_mean_squared_error')
rmse_scores = np.sqrt(-scores)
print(f"CV RMSE: {rmse_scores.mean():.2f} ± {rmse_scores.std():.2f}")

Multi-Step Forecasting

Strategy 1: Recursive (Autoregressive)

Predict one step, feed that prediction back in as input to predict the next step. This is like a weather forecaster who uses today’s forecast to make tomorrow’s — errors compound over time, so accuracy degrades at longer horizons.
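def recursive_forecast(model, X_last, n_steps, n_lags):
    """Recursively forecast multiple steps.

    Assumes the first n_lags features are lag_1 .. lag_n_lags, in that order
    (as produced by create_features). Other features (rolling stats, dates) are
    held fixed here for simplicity; a full implementation would recompute them.
    """
    forecasts = []
    current_features = np.asarray(X_last, dtype=float).copy()
    
    for _ in range(n_steps):
        pred = model.predict(current_features.reshape(1, -1))[0]
        forecasts.append(pred)
        
        # Shift lag features: lag_2 <- lag_1, lag_3 <- lag_2, ...
        for i in range(n_lags - 1, 0, -1):
            current_features[i] = current_features[i-1]
        current_features[0] = pred  # the new lag_1 is the prediction just made
    
    return forecasts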

Strategy 2: Direct

Train separate models for each horizon. This avoids the error-compounding problem of recursive forecasting — each model is independently optimized for its specific target distance. The tradeoff: you need to train and maintain multiple models.
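def direct_forecast(df, target_col, horizons=(1, 7, 14)):
    """Train separate models for each forecast horizon.

    Expects `df` to already contain lag/rolling/date features (e.g. the output of
    create_features). A tuple default avoids a mutable default argument.
    """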
    models = {}
    
    for h in horizons:
        # Create target h steps ahead
        df_h = df.copy()
        df_h['target'] = df_h[target_col].shift(-h)
        df_h = df_h.dropna()
        
        feature_cols = [col for col in df_h.columns if col not in [target_col, 'target']]
        X = df_h[feature_cols]
        y = df_h['target']
        
        # Train model
        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model.fit(X[:-h], y[:-h])  # Leave last h for testing
        
        models[h] = model
    
    return models

Evaluation Metrics for Time Series

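def evaluate_forecast(actual, predicted, name="Model"):
    """Calculate common time series metrics."""
    # Compare by position, not by index label (forecast and test indexes may differ)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae = mean_absolute_error(actual, predicted)
    mse = mean_squared_error(actual, predicted)
    rmse = np.sqrt(mse)
    # MAPE is undefined when actual == 0 and explodes when actual is near zero
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100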
    
    print(f"\n{name} Metrics:")
    print(f"  MAE:  {mae:.2f}")
    print(f"  RMSE: {rmse:.2f}")
    print(f"  MAPE: {mape:.1f}%")
    
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}

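# Compare models. `forecast` was reassigned by the SARIMA and Prophet examples,
# so regenerate the ARIMA(1, 1, 1) forecast explicitly before scoring it
arima_forecast = ARIMA(train, order=(1, 1, 1)).fit().forecast(len(test))
evaluate_forecast(test, arima_forecast, "ARIMA")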

Common Pitfalls

1. Data Leakage (The Silent Killer)

In time series, leakage is especially dangerous because it is easy to accidentally use future information. Even computing the mean of the entire dataset “peeks” at the future.
# WRONG: Using future data for scaling
from sklearn.preprocessing import StandardScaler

# Don't do this! The scaler learns mean/std from ALL data, including the future
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Uses all data including test!

# CORRECT: Fit only on training (past) data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Learn stats from past only
X_test_scaled = scaler.transform(X_test)  # Apply past stats to future -- no leakage

2. Ignoring Multiple Seasonalities

Real data often has layered seasonal patterns. An e-commerce site might have daily peaks (lunch hour), weekly patterns (weekend dips), and yearly cycles (Black Friday). Missing any layer means your model has a blind spot. Always check for:
  • Hourly (within a day: lunch rush, evening spike)
  • Daily/Weekly (7 days: weekend vs weekday)
  • Monthly (~30 days: paycheck cycles)
  • Yearly (365 days: holidays, weather-driven demand)
Practical tip: Plot autocorrelation at lags 1, 7, 30, and 365 to spot these patterns. Prophet (and TBATS) can model multiple seasonalities natively; SARIMA handles only one seasonal period, and plain ARIMA has no seasonal terms at all.
3. Ignoring Regime Changes

A model trained only during a bull market will tend to assume stocks always go up. Always validate on multiple time windows that include different regimes (growth, decline, stability).
# Test on multiple time periods, not just the most recent
# Use expanding or sliding window backtests:
# Window 1: Train on Jan-Jun, test on Jul
# Window 2: Train on Jan-Sep, test on Oct
# Window 3: Train on Jan-Dec, test on next Jan
# This reveals whether your model is robust across different market conditions
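The expanding and sliding windows described above map onto scikit-learn's TimeSeriesSplit. By default the training set grows each fold, while max_train_size caps it to a fixed-length sliding window. This is a small sketch on 120 hypothetical time-ordered samples. The test_size parameter assumes scikit-learn 0.24 or later.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(120).reshape(-1, 1)  # hypothetical: 120 time-ordered samples

expanding = TimeSeriesSplit(n_splits=4, test_size=20)                   # train window grows
sliding = TimeSeriesSplit(n_splits=4, test_size=20, max_train_size=40)  # train window capped

for name, splitter in [('expanding', expanding), ('sliding', sliding)]:
    print(name)
    for train_idx, test_idx in splitter.split(X):
        print(f"  train {train_idx[0]:>3}-{train_idx[-1]:>3}  test {test_idx[0]:>3}-{test_idx[-1]:>3}")
```

The sliding variant forgets old regimes, which helps when the past stops being representative. The expanding variant keeps every regime in the training set.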

Key Takeaways

Order Matters

Time series data is sequential - never shuffle!

Check Stationarity

Classical models such as ARIMA assume stationary data (after differencing)

Multiple Components

Decompose into trend, seasonality, and noise

Proper Validation

Use TimeSeriesSplit, not random CV

What’s Next?

Let’s explore the theory behind model performance with bias-variance tradeoff!

Continue to Bias-Variance Tradeoff

Understanding the fundamental tradeoff in machine learning

Interview Deep-Dive

This is one of the most common failure patterns in time series ML, and the root causes are almost always about the gap between offline evaluation and real-world conditions.
  • Look-ahead bias in feature engineering. The single most common cause. During backtesting, your rolling features (rolling_mean_7, rolling_std_30) may accidentally include the current timestamp. In production, that data point is not yet available. Even a subtle one-row offset in your lag features can cause this. I would audit every feature by asking: “At prediction time T, does this feature use any data from time T or later?”
  • The data has shifted since training (non-stationarity). Your model was trained on 2022-2023 data but the world changed. A pandemic, a product launch, a competitor entering the market: any regime change invalidates the patterns your model learned. Check whether the statistical properties (mean, variance, autocorrelation structure) of recent data match the training period.
  • Backtest used expanding window but production uses fixed window. If your backtest retrained the model at each step with all available history, but production uses a model trained once and frozen, the production model will degrade as it gets further from its training date.
  • Seasonality you did not model. Your model handles weekly seasonality but not monthly or yearly patterns. Backtesting over a short period might not reveal missing yearly seasonality. Run autocorrelation analysis at lags 1, 7, 30, 90, and 365 on your residuals to spot unmodeled patterns.
  • External factors not captured. The model does not know about holidays, promotions, or weather events that affect the target variable in production but may have been “averaged out” during backtesting across multiple years.
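The audit question from the first bullet ("does this feature use any data from time T or later?") can be checked mechanically. Corrupt every value from T onward and see whether the features at T move. The helper uses_future_data is a hypothetical name for this sketch. It assumes each feature function maps a Series to a DataFrame aligned with the same index.

```python
import numpy as np
import pandas as pd

def uses_future_data(make_features, series, t):
    """Return True if the features at position t change when values at t or later change.

    Hypothetical helper: `make_features` maps a Series to a DataFrame of features
    aligned with the same index.
    """
    baseline = make_features(series).iloc[t].to_numpy()
    perturbed = series.copy()
    perturbed.iloc[t:] += 1000.0  # corrupt time t and everything after it
    after = make_features(perturbed).iloc[t].to_numpy()
    return not np.allclose(baseline, after, equal_nan=True)

def leaky_features(s):
    # The rolling window ends at t, so it contains the value being predicted
    return pd.DataFrame({'rolling_mean_7': s.rolling(7).mean()})

def safe_features(s):
    # shift(1) first, so every window ends at t-1
    past = s.shift(1)
    return pd.DataFrame({'lag_1': past, 'rolling_mean_7': past.rolling(7).mean()})

series = pd.Series(np.random.default_rng(0).normal(100, 10, 60))
print("leaky features use the future:", uses_future_data(leaky_features, series, t=30))
print("safe features use the future: ", uses_future_data(safe_features, series, t=30))
```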
Follow-up: How would you set up backtesting to better predict production performance?
I would use walk-forward validation with a gap. By default, TimeSeriesSplit tests immediately after each training window. Instead, I would leave a gap between the training and test periods (say 7 days, via TimeSeriesSplit's gap parameter). This simulates the real-world delay between training a model and deploying it. I would also run backtests across multiple time windows that include different market conditions (growth, decline, flat periods) and report the worst-case performance, not just the average. If the model fails badly in any regime, it will eventually encounter that regime in production.
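Here is a sketch of walk-forward validation with a 7-day gap and worst-fold reporting. It uses TimeSeriesSplit's gap and test_size parameters, which assume scikit-learn 0.24+. The data (a trend that flattens into a plateau) and the lag features are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical series: steady growth that flattens into a plateau (a simple regime change)
rng = np.random.default_rng(7)
t = np.arange(400)
s = pd.Series(np.where(t < 200, 0.5 * t, 100.0) + rng.normal(0, 2, len(t)))

# Lag features only, so every feature is known at prediction time
data = pd.DataFrame({'lag_1': s.shift(1), 'lag_7': s.shift(7), 'y': s}).dropna()
X, y = data[['lag_1', 'lag_7']], data['y']

# gap=7 drops a week between the end of training and the start of each test fold,
# mimicking the delay between training a model and putting it into production
tscv = TimeSeriesSplit(n_splits=5, test_size=30, gap=7)

fold_mae = []
for train_idx, test_idx in tscv.split(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    fold_mae.append(mean_absolute_error(y.iloc[test_idx], model.predict(X.iloc[test_idx])))

# Report the worst fold next to the average: one bad regime is enough to hurt in production
print(f"mean MAE: {np.mean(fold_mae):.2f}, worst fold MAE: {np.max(fold_mae):.2f}")
```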
This is a question about understanding the strengths and failure modes of fundamentally different modeling philosophies.
  • ARIMA excels when the data is univariate and well-behaved. If you have a single time series with clear trend and seasonality, ARIMA (or SARIMA) is hard to beat. It has a strong theoretical foundation, provides confidence intervals natively, and the model is interpretable — you can explain “the forecast is based on the last 3 values and recent forecast errors with weekly seasonal adjustment.”
  • ML approaches win when you have rich exogenous features. If you have 20+ features beyond just the historical values — weather data, promotional calendars, economic indicators — gradient boosting can leverage all of them simultaneously. ARIMA cannot easily incorporate many external regressors (SARIMAX can, but it becomes unwieldy).
  • ARIMA handles concept drift poorly unless it is refit. A fitted ARIMA model's parameters stay fixed until you re-estimate them on new data. If the data generating process changes, a model that is not refit keeps forecasting from stale patterns. ML models, especially with sliding-window retraining, adapt faster because they can learn from recent feature-target relationships.
  • ML requires careful temporal feature engineering. You need to manually create lag features, rolling statistics, and date components. ARIMA handles temporal dependence natively through its autoregressive and moving average terms.
  • At scale, ML is more practical. If you need to forecast 10,000 SKUs, training individual ARIMA models for each is expensive and brittle. A single gradient boosting model trained across all SKUs with SKU-level features can generalize and is operationally simpler.
My default in practice: start with ARIMA or Prophet as a baseline. If the problem has rich features and many series, switch to gradient boosting. For production systems, I often ensemble both: ARIMA captures the core temporal structure while the ML model captures feature interactions.
Follow-up: How would you ensemble a statistical model like ARIMA with a gradient boosting model?
The simplest approach is stacking. Use the ARIMA forecast as a feature input to the gradient boosting model alongside the other features. The ARIMA forecasts should be generated out of sample (for example walk-forward), so the GB model never trains on forecasts that were fit to its own targets. The GB model learns when to trust the ARIMA forecast and when to override it. Alternatively, fit ARIMA first and train the ML model only on its residuals, the part of the series ARIMA did not explain. This lets each model do what it does best: ARIMA handles the linear temporal structure, and the ML model captures the nonlinear patterns in what is left over.
Multiple seasonalities are the norm in production, not the exception. An e-commerce platform might have hourly patterns (lunchtime browsing), daily patterns (weekday vs weekend), monthly patterns (payday effects), and yearly patterns (Black Friday, Christmas).
  • Detection: autocorrelation analysis at multiple lags. For hourly data, plot the ACF at lags 1, 24 (daily cycle), 168 (weekly), roughly 720 (monthly), and 8760 (yearly). Significant spikes at these lags confirm the corresponding seasonality. Also use spectral analysis (FFT) to identify dominant frequencies: peaks in the power spectrum correspond to seasonal periods.
  • Prophet handles this natively. Prophet decomposes the series into additive components and can model yearly, weekly, and daily seasonality simultaneously. You can also add custom seasonalities (e.g., quarterly for financial data). This is my go-to for rapid prototyping because it requires zero manual decomposition.
  • SARIMA is limited. Standard SARIMA handles one seasonal period. For multiple seasonalities, you would need to difference at multiple periods or use TBATS (Trigonometric, Box-Cox, ARMA, Trend, Seasonality), which was specifically designed for this.
  • For ML approaches: encode each seasonality as features. Create sin/cos pairs for each seasonal period: sin(2*pi*hour/24), cos(2*pi*hour/24), sin(2*pi*day_of_week/7), cos(2*pi*day_of_week/7), etc. The sin/cos encoding ensures that “23:00 and 00:00 are close” rather than treating them as maximally different integers.
  • Fourier features are the production-grade approach. Generate K Fourier pairs for each seasonal period where K controls the smoothness. Lower K means smoother seasonality curves, higher K captures sharper patterns. You tune K via cross-validation.
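Fourier features are straightforward to generate by hand. The function below (a hypothetical helper, not a library API) returns K sin/cos pairs per seasonal period. With K=1 it reduces to the single sin/cos pair described above. The example builds daily (24) and weekly (168) terms for two weeks of hourly data.

```python
import numpy as np
import pandas as pd

def fourier_features(t, period, K):
    """Return K sin/cos pairs for one seasonal period (hypothetical helper).

    t is an integer time index (e.g. hours since start); pair k uses 2*pi*k*t/period.
    """
    t = np.asarray(t, dtype=float)
    cols = {}
    for k in range(1, K + 1):
        angle = 2 * np.pi * k * t / period
        cols[f'sin_{period}_{k}'] = np.sin(angle)
        cols[f'cos_{period}_{k}'] = np.cos(angle)
    return pd.DataFrame(cols)

# Two weeks of hourly data: daily cycle (period 24, K=3) and weekly cycle (period 168, K=2)
hours = np.arange(24 * 14)
X_season = pd.concat(
    [fourier_features(hours, period=24, K=3), fourier_features(hours, period=168, K=2)],
    axis=1,
)
print(X_season.shape)  # 336 rows, 2*3 + 2*2 = 10 columns

# With K=1, hour 23 lands next to hour 0 on the circle, while hour 12 is opposite
enc = fourier_features(np.arange(24), period=24, K=1).to_numpy()
print(np.linalg.norm(enc[23] - enc[0]) < np.linalg.norm(enc[12] - enc[0]))
```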
Follow-up: How would you handle a seasonal pattern that changes intensity over time, for example a holiday effect that is growing year over year?
This is called a multiplicative seasonal effect: the size of the seasonal swing scales with the level of the series. An additive model assumes a fixed-size swing, so it will underestimate the effect in later (larger) years and overestimate it in earlier (smaller) ones. I would use a multiplicative decomposition (model='multiplicative' in seasonal_decompose, or seasonality_mode='multiplicative' in Prophet). If the growth in seasonality is tied to a specific driver like revenue growth, I would include that driver as an interaction feature: holiday_flag multiplied by revenue_trend. This lets the model learn that “Black Friday in 2025 has a bigger spike than Black Friday in 2020 because the business is 3x larger.”
A 5% MAPE sounds good on paper, but aggregate metrics can hide critical failures. The business team is usually right about forecast quality, even if they cannot articulate why in statistical terms.
  • Check error distribution across segments. A 5% overall MAPE might be hiding 2% MAPE on high-volume products and 40% MAPE on the long tail. If the business decisions depend on the long-tail products (e.g., inventory for niche items), the model is failing where it matters most.
  • Check error at the decision-relevant horizon. If the business needs 14-day forecasts for purchasing decisions, but your 5% MAPE is measured at 1-day horizon, the metric is misleading. Multi-step forecasts degrade significantly, and the relevant accuracy is at the horizon where decisions are made.
  • Check directional accuracy. The business might care more about “did the forecast correctly predict an increase or decrease” than the exact magnitude. A forecast of +5% when the actual was +12% might be fine (right direction). A forecast of +5% when the actual was -3% is a disaster (wrong direction), even though the absolute error is similar. Report directional accuracy alongside MAPE.
  • Check error at critical thresholds. If the business has a reorder threshold at 100 units, a forecast of 102 (actual 98) causes a stockout. A forecast of 150 (actual 145) has a larger absolute error but zero business impact. Translate your model errors into business outcomes: stockouts, overstock, wasted resources.
  • Bias check. MAPE does not reveal systematic over- or under-forecasting. If the model consistently forecasts 5% low, the business is consistently understaffed or understocked. Mean Error (not Mean Absolute Error) reveals this directional bias.
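Directional accuracy and mean error each take a few lines of NumPy. This sketch uses hypothetical helper names and toy numbers. Direction is measured relative to the last value known at forecast time. Mean error is forecast minus actual, so a negative value means systematic under-forecasting. The first two rows mirror the +5% vs +12% and +5% vs -3% cases above.

```python
import numpy as np

def directional_accuracy(actual, forecast, last_actual):
    """Share of periods where the forecast change has the same sign as the actual change."""
    actual_change = np.asarray(actual) - np.asarray(last_actual)
    forecast_change = np.asarray(forecast) - np.asarray(last_actual)
    return np.mean(np.sign(actual_change) == np.sign(forecast_change))

def mean_error(actual, forecast):
    """Signed average error (forecast - actual); negative means forecasts run low."""
    return np.mean(np.asarray(forecast) - np.asarray(actual))

last = np.array([100, 100, 100, 100])       # value known when each forecast was made
actual = np.array([112, 97, 105, 90])
forecast = np.array([105, 105, 103, 95])

print(f"directional accuracy: {directional_accuracy(actual, forecast, last):.2f}")
print(f"mean error: {mean_error(actual, forecast):+.2f}")
```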
Follow-up: How would you redesign the evaluation framework so that ML metrics align with business outcomes?
I would work with the business team to define a cost function that maps prediction errors to dollars. For example: each unit of underforecast costs Y dollars in lost sales, and each unit of overforecast costs Z dollars in waste. Then I would optimize the model directly for this asymmetric cost function rather than for MAPE, which treats over- and under-forecasts of the same size as equally bad. This aligns the model’s objective with the business objective. I would also create a dashboard that shows both the statistical metrics and the business metrics side by side, so both teams speak the same language.