Time Series Forecasting
Time Changes Everything
Most ML algorithms assume observations are independent, like pulling marbles from a bag, where each draw tells you nothing about the next. But what if tomorrow’s stock price depends on today’s? What if next month’s sales follow a seasonal pattern? Think of it this way: regular ML is like analyzing a bag of photos, where the order does not matter. Time series is like analyzing a movie: shuffle the frames and you lose all meaning. Welcome to Time Series, where order matters.
The Restaurant Sales Problem
You manage a restaurant chain. You need to predict:
- How many customers next week?
- How much inventory to order?
- How many staff to schedule?
Components of Time Series
Every time series can be decomposed into components:
The Four Components
Think of a time series like listening to music. The trend is the overall volume getting louder or quieter across the song. Seasonality is the repeating beat pattern. Cyclical patterns are the chorus coming back, but not at perfectly fixed intervals. And noise is the static in the recording.
| Component | Description | Example |
|---|---|---|
| Trend | Long-term direction | Growing customer base |
| Seasonality | Regular, fixed-period patterns | Summer ice cream sales, Monday dips in retail |
| Cyclical | Irregular long patterns (no fixed period) | Economic cycles, housing market booms |
| Noise | Random variation you cannot predict | Weather effects, one-off events |
Stationarity: The Foundation
Stationarity is like a river that flows at a constant average speed and depth. Some days it is a bit faster, some days slower, but the long-run behavior stays the same. A non-stationary series is like a river during flood season: the average keeps rising, so yesterday’s “normal” level is not useful for predicting tomorrow’s.
Testing for Stationarity
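A formal check usually relies on a unit-root test such as the Augmented Dickey-Fuller test (available as `adfuller` in statsmodels). As a dependency-free illustration of the idea only, the sketch below compares the mean and variance of the first and second halves of a series. The function name `split_half_summary` and the toy data are assumptions made for this example, and the comparison is a rough heuristic, not a statistical test.

```python
from statistics import mean, pvariance

def split_half_summary(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

# Toy data (made up): one series oscillates around 100, the other drifts upward.
flat = [100 + (-1) ** t * 2 for t in range(40)]
trending = [100 + 3 * t for t in range(40)]

for name, s in [("flat", flat), ("trending", trending)]:
    (m1, v1), (m2, v2) = split_half_summary(s)
    print(f"{name}: mean {m1:.1f} -> {m2:.1f}, variance {v1:.1f} -> {v2:.1f}")
```

If the two halves report clearly different means or variances, as the trending series does here, treat the series as non-stationary until a proper test says otherwise.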
Making Data Stationary
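Differencing is the most common way to remove a trend or a seasonal pattern. The following is a minimal sketch in plain Python; the helper `difference` and the toy series are assumptions for illustration.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy data (made up): linear trend plus a repeating 4-step seasonal pattern.
season = [0, 5, 10, 5]
raw = [2 * t + season[t % 4] for t in range(12)]

# Lag-1 differencing removes the linear trend; the seasonal swings remain.
first_diff = difference(raw, lag=1)

# Lag-4 differencing removes the period-4 pattern; the trend collapses to a constant.
seasonal_diff = difference(raw, lag=4)

print(first_diff)
print(seasonal_diff)
```

Pick the lag to match what you want to remove: lag 1 for a trend, the seasonal period for a repeating pattern.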
Simple Forecasting Methods
1. Moving Average
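A moving-average forecast predicts the next value as the mean of the most recent observations. A minimal sketch, assuming a hypothetical helper `moving_average_forecast` and made-up customer counts:

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window

daily_customers = [120, 135, 128, 142, 150, 138, 145]  # made-up numbers
forecast = moving_average_forecast(daily_customers, window=3)
print(forecast)
```

A larger window gives a smoother but slower-reacting forecast; a smaller window follows recent changes more closely.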
2. Exponential Smoothing
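Simple exponential smoothing keeps a running "level" that weights recent observations more heavily than older ones. The sketch below is one plain-Python way to write it; the function name, the choice to initialise the level with the first observation, and the data are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level = alpha * value + (1 - alpha) * previous level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]          # initialise the level with the first observation
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

sales = [10, 12, 11, 15, 14]   # made-up numbers
smoothed = exponential_smoothing(sales, alpha=0.5)
next_forecast = smoothed[-1]   # the one-step-ahead forecast is the last level
print(smoothed, next_forecast)
```

An `alpha` close to 1 tracks the latest value almost exactly, while an `alpha` close to 0 changes the level slowly.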
ARIMA: The Classic Approach
ARIMA = AutoRegressive Integrated Moving Average
Think of ARIMA as three tools combined into one Swiss Army knife:
- AR(p), AutoRegressive: “The future looks like the recent past.” Uses p past values as predictors. If p=3, the model says: “To predict tomorrow, look at the last 3 days.”
- I(d), Integrated: “Difference the data d times to make it stationary.” If d=1, you work with day-over-day changes rather than raw values.
- MA(q), Moving Average: “Learn from recent forecast mistakes.” Uses q past forecast errors. If q=2, the model adjusts based on how wrong it was yesterday and the day before.
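To make the AR and I parts concrete, here is a hand-rolled sketch of ARIMA(1,1,0): difference once (d=1), fit an AR(1) on the differences by least squares (p=1), use no MA term (q=0), then undo the differencing. The function names are hypothetical and the toy data is constructed so the fit is exact; in practice you would use a library implementation rather than this simplification.

```python
def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi

def arima_110_forecast(series, steps):
    """Forecast `steps` values with a hand-rolled ARIMA(1,1,0)."""
    diffs = [b - a for a, b in zip(series, series[1:])]   # I: d = 1
    c, phi = fit_ar1(diffs)                               # AR: p = 1
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff                   # predict the next change
        last_value = last_value + last_diff               # undo the differencing
        forecasts.append(last_value)
    return forecasts

# Toy data (made up): the day-over-day changes follow change = 1 + 0.5 * previous change.
history = [10, 14, 17, 19.5, 21.75, 23.875]
print(arima_110_forecast(history, steps=2))
```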
Auto ARIMA
SARIMA: Adding Seasonality
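SARIMA = Seasonal ARIMA. On top of the non-seasonal orders (p, d, q), it adds seasonal autoregressive, differencing, and moving-average orders (P, D, Q) applied at a seasonal period m.
Prophet: Modern Forecasting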
Facebook’s Prophet fits trend, seasonality, and holiday effects with little manual configuration.
Adding Holidays
ML Approach: Feature Engineering for Time Series
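Transform the time series into a supervised learning problem by building features from past values, such as lags, rolling statistics, and date components.
Cross-Validation for Time Series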
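Random K-fold splits would let the model train on observations that come after the ones it is tested on. Time-ordered splits avoid that. The sketch below hand-rolls an expanding-window splitter in the spirit of scikit-learn's `TimeSeriesSplit`; the function name and its parameters are assumptions for illustration, not that library's API.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs where training data always precedes test data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))                       # everything before the test block
        test = list(range(test_start, test_start + test_size))   # the next `test_size` points
        yield train, test

for train_idx, test_idx in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains on a longer history than the previous one and always tests on the block that immediately follows it.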
Multi-Step Forecasting
Strategy 1: Recursive (Autoregressive)
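A minimal sketch of the recursive loop, assuming a hypothetical one-step model `one_step_model` (a fixed linear extrapolation standing in for any fitted model):

```python
def one_step_model(window):
    """Hypothetical stand-in for a fitted one-step model: extrapolate from the last two values."""
    return 2 * window[-1] - window[-2]

def recursive_forecast(history, steps, model, window_size=2):
    """Predict `steps` values, feeding each prediction back in as input."""
    window = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]   # slide the window over the prediction
    return predictions

print(recursive_forecast([10, 12, 15], steps=3, model=one_step_model))
```

After the first step, the model's inputs are its own earlier outputs rather than observed values.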
Predict one step, feed that prediction back in as input to predict the next step. This is like a weather forecaster who uses today’s forecast to make tomorrow’s: errors compound over time, so accuracy degrades at longer horizons.
Strategy 2: Direct
Train separate models for each horizon. This avoids the error-compounding problem of recursive forecasting, because each model is independently optimized for its specific target distance. The tradeoff: you need to train and maintain multiple models.
Evaluation Metrics for Time Series
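Common choices include MAE, RMSE, MAPE, and MASE (error scaled against a naive forecast). The sketch below writes them out in plain Python; the function names and the numbers are assumptions for illustration.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined when an actual is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mase(actual, predicted, training, season=1):
    """MAE scaled by the in-sample MAE of a (seasonal) naive forecast on the training data."""
    naive_errors = [abs(training[t] - training[t - season]) for t in range(season, len(training))]
    return mae(actual, predicted) / (sum(naive_errors) / len(naive_errors))

training = [100, 110, 105, 115]   # made-up history
actual = [120, 130]               # made-up held-out values
predicted = [114, 135]            # made-up forecasts

print(mae(actual, predicted), rmse(actual, predicted))
print(mape(actual, predicted), mase(actual, predicted, training))
```

A MASE below 1 means the forecast beats the naive "repeat the last value" baseline on average.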
Common Pitfalls
1. Data Leakage (The Silent Killer)
In time series, leakage is especially dangerous because it is easy to accidentally use future information. Even computing the mean of the entire dataset “peeks” at the future.
2. Ignoring Multiple Seasonalities
Real data often has layered seasonal patterns. An e-commerce site might have daily peaks (lunch hour), weekly patterns (weekend dips), and yearly cycles (Black Friday). Missing any layer means your model has a blind spot. Always check for:
- Hourly (within a day: lunch rush, evening spike)
- Daily/Weekly (7 days: weekend vs weekday)
- Monthly (~30 days: paycheck cycles)
- Yearly (365 days: holidays, weather-driven demand)
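One way to run these checks is to compute the autocorrelation at the lag that matches each candidate period. A minimal sketch in plain Python, assuming a hypothetical helper `autocorrelation` and a made-up daily series with a pure 7-day cycle:

```python
from math import sin, pi

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Toy daily series (made up) with a pure 7-day cycle.
weekly = [10 + 3 * sin(2 * pi * t / 7) for t in range(70)]

for lag in (1, 3, 7, 14):
    print(lag, round(autocorrelation(weekly, lag), 2))
```

A value near 1 at a given lag suggests a pattern that repeats with that period; here the lag-7 value stands out against lags 1 and 3.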
3. Overfitting to Recent Trends
A model trained during a bull market will assume stocks always go up. Always validate on multiple time windows that include different regimes (growth, decline, stability).
Key Takeaways
Order Matters
Time series data is sequential - never shuffle!
Check Stationarity
Many classical methods, such as ARIMA, assume stationary data
Multiple Components
Decompose into trend, seasonality, and noise
Proper Validation
Use TimeSeriesSplit, not random CV
What’s Next?
Let’s explore the theory behind model performance with the bias-variance tradeoff!
Continue to Bias-Variance Tradeoff
Understanding the fundamental tradeoff in machine learning
Interview Deep-Dive
You built a time series model that performs well on backtesting but fails in production. What are the most common causes and how would you debug this?
This is one of the most common failure patterns in time series ML, and the root causes are almost always about the gap between offline evaluation and real-world conditions.
- Look-ahead bias in feature engineering. The single most common cause. During backtesting, your rolling features (rolling_mean_7, rolling_std_30) may accidentally include the current timestamp. In production, that data point is not yet available. Even a subtle one-row offset in your lag features can cause this. I would audit every feature by asking: “At prediction time T, does this feature use any data from time T or later?”
- Non-stationary data since training. Your model was trained on 2022-2023 data but the world changed. A pandemic, a product launch, a competitor entering the market — any regime change invalidates the patterns your model learned. Check whether the statistical properties (mean, variance, autocorrelation structure) of recent data match the training period.
- Backtest retraining does not match production. If your backtest retrained the model at each step on an expanding window of all available history, but production uses a model trained once and frozen, the production model will degrade as it gets further from its training date.
- Seasonality you did not model. Your model handles weekly seasonality but not monthly or yearly patterns. Backtesting over a short period might not reveal missing yearly seasonality. Run autocorrelation analysis at lags 1, 7, 30, 90, and 365 on your residuals to spot unmodeled patterns.
- External factors not captured. The model does not know about holidays, promotions, or weather events that affect the target variable in production but may have been “averaged out” during backtesting across multiple years.
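The look-ahead audit from the first bullet can be sketched with two versions of a rolling mean: one that only uses values strictly before time T, and a buggy one whose window includes T itself. Both function names and the data are assumptions for illustration.

```python
def rolling_mean_features(values, window):
    """For each time T, average the `window` values strictly before T (never T itself)."""
    features = []
    for t in range(len(values)):
        past = values[max(0, t - window):t]          # slice ends at t, so values[t] is excluded
        features.append(sum(past) / len(past) if len(past) == window else None)
    return features

def leaky_rolling_mean(values, window):
    """Buggy variant for comparison: the window includes the current value values[t]."""
    return [
        sum(values[t - window + 1:t + 1]) / window if t >= window - 1 else None
        for t in range(len(values))
    ]

sales = [10, 20, 30, 40, 50]   # made-up numbers
safe = rolling_mean_features(sales, window=2)
leaky = leaky_rolling_mean(sales, window=2)
print(safe)
print(leaky)
```

The leaky version looks better in a backtest because each feature already contains part of the target. With pandas, the safe version is usually written by shifting before rolling, for example `series.shift(1).rolling(window).mean()`.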
When would you use ARIMA versus an ML approach like gradient boosting for time series forecasting? What are the trade-offs?
This is a question about understanding the strengths and failure modes of fundamentally different modeling philosophies.
- ARIMA excels when the data is univariate and well-behaved. If you have a single time series with clear trend and seasonality, ARIMA (or SARIMA) is hard to beat. It has a strong theoretical foundation, provides confidence intervals natively, and the model is interpretable — you can explain “the forecast is based on the last 3 values and recent forecast errors with weekly seasonal adjustment.”
- ML approaches win when you have rich exogenous features. If you have 20+ features beyond just the historical values — weather data, promotional calendars, economic indicators — gradient boosting can leverage all of them simultaneously. ARIMA cannot easily incorporate many external regressors (SARIMAX can, but it becomes unwieldy).
- ARIMA handles concept drift poorly unless it is refit. ARIMA parameters are estimated from historical data, so if the data-generating process changes, a model that is not refit will keep forecasting based on stale patterns. ML models, especially with sliding-window retraining, tend to adapt faster because they can learn from recent feature-target relationships.
- ML requires careful temporal feature engineering. You need to manually create lag features, rolling statistics, and date components. ARIMA handles temporal dependence natively through its autoregressive and moving average terms.
- At scale, ML is more practical. If you need to forecast 10,000 SKUs, training individual ARIMA models for each is expensive and brittle. A single gradient boosting model trained across all SKUs with SKU-level features can generalize and is operationally simpler.
Explain how you would detect and handle multiple seasonalities in a production forecasting system.
Multiple seasonalities are the norm in production, not the exception. An e-commerce platform might have hourly patterns (lunchtime browsing), daily patterns (weekday vs weekend), monthly patterns (payday effects), and yearly patterns (Black Friday, Christmas).
- Detection: autocorrelation analysis at multiple lags. For hourly data, plot the ACF and look at lags 1 (short-term dependence), 24 (daily cycle), 168 (weekly), 720 (roughly monthly), and 8760 (yearly). Significant spikes at these lags confirm the corresponding seasonality. Also use spectral analysis (FFT) to identify dominant frequencies: peaks in the power spectrum correspond to seasonal periods.
- Prophet handles this natively. Prophet decomposes the series into additive components and can model yearly, weekly, and daily seasonality simultaneously. You can also add custom seasonalities (e.g., quarterly for financial data). This is my go-to for rapid prototyping because it requires zero manual decomposition.
- SARIMA is limited. Standard SARIMA handles one seasonal period. For multiple seasonalities, you would need to difference at multiple periods or use TBATS (Trigonometric, Box-Cox, ARMA, Trend, Seasonality), which was specifically designed for this.
- For ML approaches: encode each seasonality as features. Create sin/cos pairs for each seasonal period: sin(2π·hour/24), cos(2π·hour/24), sin(2π·day_of_week/7), cos(2π·day_of_week/7), etc. The sin/cos encoding ensures that “23:00 and 00:00 are close” rather than treating them as maximally different integers.
- Fourier features are the production-grade approach. Generate K Fourier pairs for each seasonal period where K controls the smoothness. Lower K means smoother seasonality curves, higher K captures sharper patterns. You tune K via cross-validation.
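The sin/cos and Fourier encodings from the last two bullets can be sketched as follows. The helper names `fourier_features` and `cyclic_distance` are assumptions for illustration; `K` is the number of sin/cos pairs per seasonal period.

```python
from math import sin, cos, pi

def fourier_features(t, period, K):
    """Return K sin/cos pairs for position `t` within a cycle of length `period`."""
    features = []
    for k in range(1, K + 1):
        angle = 2 * pi * k * t / period
        features.extend([sin(angle), cos(angle)])
    return features

def cyclic_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

hour_23 = fourier_features(23, period=24, K=1)
hour_00 = fourier_features(0, period=24, K=1)
hour_12 = fourier_features(12, period=24, K=1)

# 23:00 and 00:00 end up close together; 00:00 and 12:00 end up far apart.
print(cyclic_distance(hour_23, hour_00), cyclic_distance(hour_00, hour_12))
```

With `K=1` each period contributes a single smooth wave; raising `K` adds higher harmonics so a downstream model can fit sharper seasonal shapes.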
Your time series model's error is 5% MAPE but the business team says the forecasts are terrible. How do you reconcile this?
A 5% MAPE sounds good on paper, but aggregate metrics can hide critical failures. The business team is usually right about forecast quality, even if they cannot articulate why in statistical terms.
- Check error distribution across segments. A 5% overall MAPE might be hiding 2% MAPE on high-volume products and 40% MAPE on the long tail. If the business decisions depend on the long-tail products (e.g., inventory for niche items), the model is failing where it matters most.
- Check error at the decision-relevant horizon. If the business needs 14-day forecasts for purchasing decisions, but your 5% MAPE is measured at 1-day horizon, the metric is misleading. Multi-step forecasts degrade significantly, and the relevant accuracy is at the horizon where decisions are made.
- Check directional accuracy. The business might care more about “did the forecast correctly predict an increase or decrease” than the exact magnitude. A forecast of +5% when the actual was +12% might be fine (right direction). A forecast of +5% when the actual was -3% is a disaster (wrong direction), even though the absolute error is similar. Report directional accuracy alongside MAPE.
- Check error at critical thresholds. If the business has a reorder threshold at 100 units, a forecast of 102 (actual 98) lands on the wrong side of the threshold: no reorder is triggered, and a stockout follows. A forecast of 150 (actual 145) has a larger absolute error but zero business impact. Translate your model errors into business outcomes: stockouts, overstock, wasted resources.
- Bias check. MAPE does not reveal systematic over- or under-forecasting. If the model consistently forecasts 5% low, the business is consistently understaffed or understocked. Mean Error (not Mean Absolute Error) reveals this directional bias.
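The bias and directional checks above can be reported next to MAPE. A minimal sketch, assuming hypothetical helper names, made-up numbers, and the convention that error is forecast minus actual:

```python
def mean_error(actual, predicted):
    """Signed error (forecast minus actual): negative means forecasts run low on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def directional_accuracy(previous, actual, predicted):
    """Share of periods where forecast and actual moved the same way from the previous value."""
    hits = sum((a - prev) * (p - prev) > 0 for prev, a, p in zip(previous, actual, predicted))
    return hits / len(actual)

# Made-up numbers: every forecast is exactly 5% below the actual.
previous = [90, 210, 390]
actual = [100, 200, 400]
predicted = [95, 190, 380]

print(mape(actual, predicted))                            # 5% error overall
print(mean_error(actual, predicted))                      # negative: systematic under-forecasting
print(directional_accuracy(previous, actual, predicted))  # one of three directions is wrong
```

Here the MAPE looks healthy, yet the mean error is consistently negative and one forecast points in the wrong direction, which is the kind of gap the business team tends to notice first.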