Anomaly Detection
Catching the Unusual
Your credit card company calls: “Did you just spend $5,000 at a jewelry store in another country?” That is anomaly detection in action: finding the needle in the haystack. But here is what makes it fundamentally different from classification: in classification, you have labeled examples of both classes to learn from. In anomaly detection, you often have thousands of “normal” examples and almost no “abnormal” ones. You are trying to define what “normal” looks like and flag anything that deviates. It is the difference between “I know what a cat looks like” and “I know what everything-that-is-not-weird looks like.”
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Clustering, Dimensionality Reduction
Tools: scikit-learn, PyOD
Types of Anomalies
| Type | Description | Example |
|---|---|---|
| Point Anomaly | Single data point differs | One transaction of 100 |
| Contextual Anomaly | Anomalous in context | High AC usage in winter |
| Collective Anomaly | Group of related points | Network intrusion pattern |
Approach 1: Statistical Methods
Z-Score (Standard Deviation)
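As a starting point, here is a minimal sketch of Z-score flagging with NumPy. The data is synthetic, and the function name `zscore_outliers` and the default threshold of 3 are assumptions made for illustration, not part of any library.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values whose |z-score| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values identical: nothing can be an outlier
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Synthetic transaction amounts with one injected spike (illustrative only)
rng = np.random.default_rng(0)
amounts = np.append(rng.normal(50, 10, size=200), 500.0)

mask = zscore_outliers(amounts)
print("Flagged amounts:", amounts[mask])
```

Note that the mean and standard deviation are themselves pulled toward extreme values, so several large outliers can mask each other.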
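Simple but effective for univariate data: compute $z = (x - \mu) / \sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation. Points with $|z|$ above a chosen threshold (3 is a common choice) are typically considered anomalies.
IQR (Interquartile Range)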
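A minimal sketch of the IQR rule with NumPy, assuming the conventional multiplier of 1.5 for the fences. The function name `iqr_outliers` and the sample values are hypothetical.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return a boolean mask for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Small illustrative sample with one obviously extreme reading
data = np.array([10, 12, 11, 13, 12, 11, 10, 14, 12, 95])
print("Flagged values:", data[iqr_outliers(data)])
```

Because the quartiles depend on the ranks of the middle of the data, a single extreme value does not move the fences much.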
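More robust to outliers, because quartiles barely move when a few extreme values are present: with $\mathrm{IQR} = Q_3 - Q_1$, points below $Q_1 - 1.5 \cdot \mathrm{IQR}$ or above $Q_3 + 1.5 \cdot \mathrm{IQR}$ are flagged (1.5 is the conventional multiplier).
Approach 2: Isolation Forest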
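A minimal sketch of Isolation Forest using scikit-learn's `IsolationForest`. The two-dimensional data is synthetic, and the `contamination=0.02` value is an assumption chosen to roughly match the share of injected outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_normal = rng.normal(0, 1, size=(500, 2))      # bulk of the data
X_outliers = rng.uniform(6, 8, size=(10, 2))    # injected points far from the bulk
X = np.vstack([X_normal, X_outliers])

forest = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = forest.fit_predict(X)      # -1 = anomaly, 1 = normal
scores = forest.score_samples(X)    # lower score = easier to isolate = more anomalous

print("Flagged points:", int((labels == -1).sum()))
print("Mean score, normal vs injected:", scores[:500].mean(), scores[500:].mean())
```

`contamination` only sets the cut-off on the scores; the ranking of points by `score_samples` does not depend on it.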
Key Insight: Anomalies are easier to isolate than normal points. Think of it this way: if you are playing 20 questions to identify a specific person in a crowd, it takes many questions to isolate a “normal” person (they blend in with others). But the person wearing a clown costume? One or two questions and you have them. Isolation Forest works the same way: it randomly picks features and random split points. Points that get isolated quickly (in few splits) are anomalies. Points that take many splits to isolate are normal.
Approach 3: Local Outlier Factor (LOF)
Key Insight: An anomaly is not just a point that is far from everything. It is a point whose local density is much lower than its neighbors’ density. This distinction matters. In a dataset with a dense city cluster and a sparse rural cluster, a point between the two clusters might be far from most data but is not necessarily anomalous. LOF detects the point within the dense city cluster that is suspiciously isolated: the house with no neighbors in a neighborhood where every other house has ten.
| Aspect | Isolation Forest | LOF |
|---|---|---|
| Speed | Fast | Slower (needs neighbors) |
| Local density | No | Yes |
| High dimensions | Good | May struggle |
| Best for | Global outliers | Local outliers in varied densities |
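The local-density idea can be sketched with scikit-learn's `LocalOutlierFactor`. The "city" and "rural" clusters below are synthetic, and the single test point at `(0.0, 1.0)` is a hypothetical local outlier placed near the dense cluster.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(200, 2))    # tight "city" cluster
sparse = rng.normal(loc=5.0, scale=1.5, size=(200, 2))   # spread-out "rural" cluster
local_outlier = np.array([[0.0, 1.0]])                   # near the city, but isolated relative to it
X = np.vstack([dense, sparse, local_outlier])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                   # -1 = anomaly, 1 = normal
lof_scores = -lof.negative_outlier_factor_    # around 1 = typical density, larger = more anomalous

print("Label of the injected point:", labels[-1])
print("LOF score of the injected point:", lof_scores[-1])
print("Median LOF score:", np.median(lof_scores))
```

The injected point sits closer to its neighbors than many rural points do, yet it scores as anomalous because its neighbors live in a much denser region.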
Approach 4: One-Class SVM
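A minimal sketch of One-Class SVM with scikit-learn, trained only on synthetic "normal" points. The `nu=0.05` setting (roughly the share of training points allowed outside the boundary) and the two test points are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(0, 1, size=(300, 2))        # normal data only, no anomalies

# Scaling matters for the RBF kernel, so it is bundled into the pipeline
model = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
model.fit(X_train)

X_new = np.array([[0.1, -0.2],    # close to the training data
                  [6.0, 6.0]])    # far from anything seen in training
pred = model.predict(X_new)       # 1 = inside the boundary, -1 = outside
print(pred)
```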
One-Class SVM is ideal for the common real-world scenario where you have plenty of normal data but few or no labeled anomalies. It learns a tight boundary around the normal data in feature space, then flags anything outside that boundary as anomalous. Think of it as drawing the smallest possible fence around your normal data: anything outside the fence is suspicious.
Approach 5: Autoencoders
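A minimal sketch of reconstruction-error scoring. To stay within scikit-learn, it uses `MLPRegressor` trained to reproduce its own input through a 2-unit bottleneck as a stand-in for an autoencoder built in a deep learning framework. The data, layer sizes, and the 99th-percentile threshold are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic "normal" data: 8 features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 8))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(1000, 8))
# Synthetic anomalies: 8 independent features that ignore the 2-factor structure
X_anomaly = rng.normal(scale=3.0, size=(20, 8))

scaler = StandardScaler().fit(X_normal)
X_train = scaler.transform(X_normal)

# Train the network to reconstruct its input (target == input), normal data only
autoencoder = MLPRegressor(hidden_layer_sizes=(16, 2, 16), activation="tanh",
                           max_iter=1000, random_state=0)
autoencoder.fit(X_train, X_train)

def reconstruction_error(X):
    """Mean squared reconstruction error per sample, used as the anomaly score."""
    X_scaled = scaler.transform(X)
    return np.mean((X_scaled - autoencoder.predict(X_scaled)) ** 2, axis=1)

err_normal = reconstruction_error(X_normal)
err_anomaly = reconstruction_error(X_anomaly)
threshold = np.percentile(err_normal, 99)   # flag the worst 1% of normal-like errors

print("Mean error, normal vs anomaly:", err_normal.mean(), err_anomaly.mean())
print("Share of anomalies above threshold:", (err_anomaly > threshold).mean())
```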
Autoencoders learn to compress data into a small representation and then reconstruct it. The key insight: if you train an autoencoder only on normal data, it learns to compress and reconstruct normal patterns well. When you feed it an anomaly, the reconstruction will be poor (high error), because the autoencoder has never learned to represent that kind of pattern. The reconstruction error becomes your anomaly score. This is like training a photocopier to perfectly reproduce cats. When you feed it a picture of a dog, the copy will look weird because the machine does not know how to represent “dog.”
Real-World Application: Credit Card Fraud
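A minimal sketch of a fraud-style workflow on synthetic transactions. Every feature (amount, hour of day, distance from home), every distribution, and the `contamination=0.02` value are invented for illustration and do not come from a real dataset. Labels are used only to evaluate the detector, not to train it.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(7)
n_legit, n_fraud = 2000, 20

# Hypothetical legitimate transactions: modest amounts, daytime, close to home
legit = np.column_stack([
    rng.lognormal(mean=3.5, sigma=0.6, size=n_legit),   # amount
    rng.normal(14, 4, size=n_legit) % 24,               # hour of day
    rng.exponential(5, size=n_legit),                   # km from home
])
# Hypothetical fraud: large amounts, night-time, far from home
fraud = np.column_stack([
    rng.lognormal(mean=7.5, sigma=0.4, size=n_fraud),
    rng.normal(3, 1, size=n_fraud) % 24,
    rng.exponential(2000, size=n_fraud),
])
X = np.vstack([legit, fraud])
y_true = np.r_[np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)]

# Log-transform the heavy-tailed features before fitting
X_log = np.column_stack([np.log1p(X[:, 0]), X[:, 1], np.log1p(X[:, 2])])

detector = IsolationForest(contamination=0.02, random_state=7).fit(X_log)
y_pred = (detector.predict(X_log) == -1).astype(int)   # 1 = flagged as suspicious

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Setting `contamination` above the true fraud rate trades precision for recall, which is usually the preferred direction when a missed fraud costs more than a manual review.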
Comparing Methods
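One way to compare detectors is to score the same labeled benchmark with each of them and compare ROC AUC. The sketch below does this on synthetic data; the dataset, the hyperparameters, and the helper name `anomaly_scores` are assumptions, and results on real data can rank the methods differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(0, 1, size=(500, 2)),      # normal points
    rng.uniform(-6, 6, size=(25, 2)),     # scattered anomalies
])
y = np.r_[np.zeros(500), np.ones(25)]     # 1 = anomaly (used for evaluation only)

def anomaly_scores(name, X):
    """Return scores where higher means more anomalous."""
    if name == "IsolationForest":
        return -IsolationForest(random_state=0).fit(X).score_samples(X)
    if name == "LOF":
        return -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_
    if name == "OneClassSVM":
        return -OneClassSVM(gamma="scale", nu=0.05).fit(X).decision_function(X)
    raise ValueError(f"unknown detector: {name}")

results = {
    name: roc_auc_score(y, anomaly_scores(name, X))
    for name in ["IsolationForest", "LOF", "OneClassSVM"]
}
for name, auc in results.items():
    print(f"{name:16s} ROC AUC = {auc:.3f}")
```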
Summary
| Method | Best For | Pros | Cons |
|---|---|---|---|
| Z-Score/IQR | Univariate, known distributions | Simple, interpretable | Assumes distribution |
| Isolation Forest | High-dimensional, fast | Scalable, no assumptions | Contamination param critical |
| LOF | Local density variations | Captures local patterns | Slow on large data |
| One-Class SVM | Training on normal only | Strong boundary | Slow, sensitive to kernel |
| Autoencoder | Complex patterns, deep learning | Learns complex patterns | Needs lots of data |
Best Practice: Start with Isolation Forest (fast, general-purpose), then try specialized methods if needed. Always validate with domain experts—what looks anomalous statistically may be normal business behavior!
Interview Deep-Dive
You are designing an anomaly detection system for a manufacturing plant that produces 1 million units per day. How would you approach this end-to-end?
Manufacturing anomaly detection is one of the most production-critical ML applications because missed defects mean customer returns, safety incidents, and brand damage. The approach needs to be robust, fast, and operationally practical.
- Start with domain understanding. Talk to the quality engineers. What types of defects exist? How are they currently caught? What is the current defect rate? What sensors and measurements are available? The answers determine whether this is a supervised problem (if you have labeled defects) or unsupervised (if defects are undefined or too rare to have examples of).
- For an unsupervised approach (most common in manufacturing). Train on “known good” production data from a stable period. Use Isolation Forest as the primary detector — it is fast, handles high-dimensional sensor data well, and the contamination parameter can be tuned based on the historical defect rate. One-Class SVM is a strong alternative if the boundary between normal and abnormal is well-defined.
- Feature engineering is critical. Raw sensor readings are often noisy. Engineer features like rolling averages, rate of change, deviation from process target, and cross-sensor correlations. A single sensor reading of 105 degrees might be normal, but 105 degrees combined with a falling pressure trend might indicate a problem.
- Set the contamination parameter conservatively. In manufacturing, false negatives (missed defects reaching customers) are far more costly than false positives (stopping the line for inspection). I would set contamination higher than the expected defect rate to prioritize recall, then tune the threshold based on the operational cost of false alarms versus missed defects.
- Real-time scoring with batch retraining. Score each unit in real-time as it comes off the line (inference must be under 100ms to not slow production). Retrain the model weekly on the latest “confirmed good” data to adapt to process drift (new raw materials, tool wear, seasonal temperature changes).
- Feedback loop. When the model flags a unit, the quality team inspects it. Log whether the flag was correct. Use this feedback to track precision over time and retrigger retraining if precision drops below the team’s tolerance.
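The feature engineering and scoring steps above can be sketched as follows. The sensor names, the injected fault, the window size, and the `contamination` value are all hypothetical. A production system would score units as they arrive instead of in one batch.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n = 1000
readings = pd.DataFrame({
    "temperature": 100 + rng.normal(0, 1, n),
    "pressure": 30 + rng.normal(0, 0.5, n),
})
# Inject a hypothetical fault in the last 20 units: temperature climbs while pressure falls
readings.loc[n - 20:, "temperature"] += np.linspace(0, 5, 20)
readings.loc[n - 20:, "pressure"] -= np.linspace(0, 4, 20)

def engineer(df, window=10):
    """Rolling averages and rates of change per sensor; rows without a full window are dropped."""
    return pd.DataFrame({
        "temp_mean": df["temperature"].rolling(window).mean(),
        "temp_rate": df["temperature"].diff(),
        "pressure_mean": df["pressure"].rolling(window).mean(),
        "pressure_rate": df["pressure"].diff(),
    }).dropna()

features = engineer(readings)
train = features.loc[:799]    # "known good" data from the stable period

detector = IsolationForest(contamination=0.01, random_state=0).fit(train)
scores = pd.Series(detector.score_samples(features), index=features.index)
flagged_units = features.index[detector.predict(features) == -1]

print("Mean score, stable period:", scores.loc[:799].mean())
print("Mean score, last 10 units:", scores.loc[990:].mean())
print("Flagged unit ids:", list(flagged_units))
```

Logging `flagged_units` alongside the inspection outcome gives the precision tracking described in the feedback-loop step.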
Compare Isolation Forest and Autoencoder-based anomaly detection. When would you choose one over the other?
This question tests whether you understand the fundamental trade-offs between classical ML and deep learning approaches for anomaly detection.
- Isolation Forest: fast, interpretable, works with moderate dimensions. Isolation Forest requires no training in the deep learning sense — it builds random trees that isolate points. Anomalies are isolated quickly (few splits), normal points take many splits. It works well on tabular data with 5-100 features, trains in seconds, and the anomaly score is interpretable (shorter path length = more anomalous). The limitation: it assumes anomalies are globally different from normal data. If anomalies are contextual (normal in one context, anomalous in another), Isolation Forest may miss them.
- Autoencoders: powerful for complex patterns, requires more data and tuning. Autoencoders learn a compressed representation of normal data. Anomalies have high reconstruction error because the autoencoder never learned to represent them. This works exceptionally well for image-based defect detection, time series with complex temporal patterns, and any data where the “normal” manifold is nonlinear and high-dimensional.
- Choose Isolation Forest when: tabular data, moderate dimensionality (under 100 features), limited training data (hundreds to thousands of samples), explainability is important, fast iteration is needed, computational resources are limited.
- Choose Autoencoders when: image or video data, very high-dimensional data (thousands of features), complex nonlinear patterns in normal data, large training dataset available (tens of thousands of normal samples), you need the latent representation for downstream tasks.
- A practical middle ground: use both. Train an Isolation Forest on engineered features for fast, interpretable scoring. Train an autoencoder on raw data for catching complex patterns the feature engineering missed. Flag an item as anomalous if either detector flags it. This ensemble approach maximizes recall at the cost of slightly more false positives.
How would you handle concept drift in a production anomaly detection system -- where what counts as 'normal' changes over time?
Concept drift is the fundamental challenge of production anomaly detection. Unlike supervised learning where you can retrain on new labels, anomaly detection often lacks timely labels, making drift detection harder.
- Distinguish process drift from concept drift. Process drift means the normal data distribution shifts (e.g., seasonal changes in user behavior). This is expected and the model should adapt. Concept drift means the definition of “anomalous” changes (e.g., a new type of fraud emerges). The model needs to learn about the new anomaly type, which usually requires new labeled examples.
- For process drift: sliding window retraining. Retrain the anomaly detector on the most recent N days of data, treating all recent data as “normal” (with optional confirmation from the ops team). This ensures the model’s baseline adapts to gradual changes. The window size is a trade-off: too short and you lose stability, too long and you adapt too slowly.
- For concept drift: monitor the anomaly rate. If your detector’s flag rate suddenly drops from 2% to 0.1%, it might mean fewer anomalies (good), or it might mean a new type of anomaly emerged that the model does not recognize (bad). Compare the flag rate to a ground truth sample. If the ground truth anomaly rate has not changed but the flag rate dropped, the model is missing something new.
- Adaptive thresholds. Instead of a fixed anomaly score threshold, use a percentile-based threshold: flag the top 2% of anomaly scores regardless of the absolute score values. This automatically adapts to shifts in the score distribution caused by drift. The trade-off is that you always flag exactly 2%, so your alert volume is constant while your precision and recall vary with the true anomaly rate.
- Ensemble across time windows. Train multiple detectors on different time windows (last 7 days, last 30 days, last 90 days). If the 7-day detector flags something but the 90-day detector does not, it is anomalous relative to recent behavior but not historically — this is a drift signal. If both flag it, it is a true anomaly regardless of drift.
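The percentile-based threshold from the list above can be sketched with NumPy. The two "weeks" of anomaly scores are synthetic, higher scores are assumed to mean more anomalous, and `percentile_flags` is a hypothetical helper.

```python
import numpy as np

def percentile_flags(scores, top_fraction=0.02):
    """Flag the highest-scoring fraction of a batch; returns (mask, threshold)."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, 1.0 - top_fraction)
    return scores > threshold, threshold

rng = np.random.default_rng(0)
week1 = rng.normal(0.0, 1.0, size=5000)   # anomaly scores before drift
week2 = rng.normal(2.0, 1.0, size=5000)   # same detector, score distribution has shifted

flags1, t1 = percentile_flags(week1)
flags2, t2 = percentile_flags(week2)

print(f"week 1: threshold={t1:.2f}, flag rate={flags1.mean():.3f}")
print(f"week 2: threshold={t2:.2f}, flag rate={flags2.mean():.3f}")

# A fixed threshold tuned on week 1 would flag far more of week 2
print("week 2 flag rate with week 1 threshold:", (week2 > t1).mean())
```

The threshold moves with the score distribution while the flag rate stays fixed, which is the behavior and the trade-off described above.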