Model Explainability
The Black Box Problem
Your model predicts a loan should be denied. The customer asks: “Why?” You say: “The neural network decided.” That is not acceptable: not legally, not ethically, and not practically. In the EU, GDPR’s provisions on automated decision-making (often summarized as a “right to explanation”) let customers demand meaningful information about why an automated system made a decision about them. In the US, the Equal Credit Opportunity Act requires lenders to provide specific reasons for credit denials. Beyond regulation, if a doctor cannot explain why an AI recommends surgery, a patient has little reason to trust that recommendation. Explainability is not a nice-to-have. It is a requirement for deploying ML in any domain where decisions affect people’s lives, money, or freedom.
Why Explainability Matters
| Domain | Why It’s Required |
|---|---|
| Healthcare | Doctors need to validate AI recommendations |
| Finance | Regulations require explainable credit decisions |
| Legal | Right to explanation in GDPR |
| Hiring | Avoid discrimination and bias |
| Insurance | Justify pricing decisions |
Types of Explainability
Global Explainability
How does the model work overall?
What features matter most in general?
Local Explainability
Why did the model make THIS specific prediction?
What drove this particular decision?
Method 1: Feature Importance
For Tree-Based Models
For Linear Models
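For a linear model, raw coefficients are only comparable when features share a scale. A minimal sketch, assuming an already-fitted model whose coefficients are the hypothetical values below, scales each coefficient by its feature's standard deviation so that importance reflects a typical change in that feature. The feature names and data are illustrative, not from a real dataset.

```python
import statistics

# Toy data: each row is (income_k, debt_ratio). Values are illustrative only.
X = [
    (30.0, 0.50),
    (45.0, 0.40),
    (60.0, 0.35),
    (80.0, 0.20),
    (120.0, 0.10),
]
feature_names = ["income_k", "debt_ratio"]

# Hypothetical coefficients of an already-fitted linear model.
coefficients = {"income_k": 0.01, "debt_ratio": -2.0}


def standardized_importance(X, feature_names, coefficients):
    """Return |coefficient| * std(feature) for each feature."""
    importance = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in X]
        importance[name] = abs(coefficients[name]) * statistics.pstdev(column)
    return importance


importance = standardized_importance(X, feature_names, coefficients)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this toy case the raw coefficient of `debt_ratio` is far larger in magnitude, yet `income_k` ranks higher once scale is taken into account, which is why unscaled coefficients can mislead.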
Method 2: Permutation Importance
Tree-based feature importance (Method 1) only works for tree models and can be biased toward high-cardinality features. Permutation importance is model-agnostic: it works for any model by asking a simple question: “If I scramble this feature’s values, how much does the model’s performance drop?” Bigger drop means the model relied more on that feature.
How permutation importance works:
- Baseline: Get model accuracy on test set
- Shuffle one feature’s values randomly
- Measure accuracy drop
- Bigger drop = More important feature
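The four steps above can be sketched from scratch in plain Python. This is an illustration under stated assumptions, not a library implementation: `toy_model`, the random test set, and the `n_repeats` averaging are all hypothetical choices made for the example.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)  # step 1: baseline on the test set
    drops = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # step 2: shuffle one feature's values
            X_perm = [row[:j] + (value,) + row[j + 1:]
                      for row, value in zip(X, column)]
            total_drop += baseline - accuracy(model, X_perm, y)  # step 3
        drops.append(total_drop / n_repeats)
    return baseline, drops


def toy_model(row):
    """Hypothetical classifier that uses feature 0 and ignores feature 1."""
    return int(row[0] > 0.5)


rng = random.Random(42)
X_test = [(rng.random(), rng.random()) for _ in range(200)]
y_test = [int(x0 > 0.5) for x0, _ in X_test]

baseline, drops = permutation_importance(toy_model, X_test, y_test)
print(f"baseline accuracy: {baseline:.2f}")
for j, drop in enumerate(drops):
    print(f"feature {j}: mean accuracy drop {drop:.3f}")  # step 4: bigger = more important
```

Shuffling the ignored feature leaves accuracy unchanged, while shuffling the feature the model depends on costs it a large share of its accuracy.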
Method 3: SHAP Values
SHAP (SHapley Additive exPlanations) is the current gold standard for model explainability, and for good reason. It is rooted in game theory: specifically Shapley values, a concept from cooperative game theory that fairly distributes a “payout” (the prediction) among “players” (the features). Each feature is credited with its average marginal contribution across all possible feature coalitions, which accounts for interactions with other features. The key advantage is that SHAP values are the only additive feature attribution method satisfying three mathematically desirable properties: local accuracy (the base value plus the feature contributions sum to the prediction), missingness (missing features contribute nothing), and consistency (if a model changes so that a feature’s contribution grows, its SHAP value never decreases).
Global: Summary Plot
Global: Bar Plot
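A SHAP bar plot ranks features by their mean absolute SHAP value across a dataset. A minimal sketch of that aggregation, assuming a linear model with independent features, where the SHAP value of feature j for a row has the closed form w_j * (x_j - mean(x_j)). The weights and rows are hypothetical.

```python
# Hypothetical linear model weights; "age" is deliberately unused (weight 0).
weights = {"income_k": 0.02, "debt_ratio": -3.0, "age": 0.0}

# Toy dataset, illustrative values only.
X = [
    {"income_k": 30.0, "debt_ratio": 0.50, "age": 25.0},
    {"income_k": 55.0, "debt_ratio": 0.30, "age": 41.0},
    {"income_k": 80.0, "debt_ratio": 0.20, "age": 33.0},
    {"income_k": 45.0, "debt_ratio": 0.60, "age": 58.0},
]


def linear_shap(weights, X):
    """Per-row SHAP values for a linear model, assuming independent features."""
    means = {name: sum(row[name] for row in X) / len(X) for name in weights}
    return [
        {name: weights[name] * (row[name] - means[name]) for name in weights}
        for row in X
    ]


def mean_abs_shap(shap_rows):
    """Global importance: average magnitude of each feature's SHAP values."""
    names = shap_rows[0].keys()
    return {
        name: sum(abs(row[name]) for row in shap_rows) / len(shap_rows)
        for name in names
    }


shap_rows = linear_shap(weights, X)
global_importance = mean_abs_shap(shap_rows)

# Text version of the bar plot, largest importance first.
for name, value in sorted(global_importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {'#' * round(value * 40)} {value:.3f}")
```

Taking the absolute value before averaging matters: positive and negative contributions would otherwise cancel and hide an important feature.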
Local: Individual Prediction Explanation
Local: Waterfall Plot
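A waterfall plot starts at the base value (the average prediction) and adds one feature contribution at a time until it reaches the model's output for that row. The sketch below computes exact Shapley values by enumerating every coalition, which is only feasible for a handful of features. `toy_model`, the background rows, and the feature names are hypothetical; "absent" features are filled in from the background rows, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one input x.

    A feature that is absent from a coalition takes its value from each
    background row, and the model output is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            mixed = [x[j] if j in coalition else b[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return value(set()), phi


def toy_model(row):
    """Hypothetical score with an income x years_employed interaction."""
    income, debt_ratio, years_employed = row
    return 0.5 * income - 2.0 * debt_ratio + 0.1 * income * years_employed


feature_names = ["income", "debt_ratio", "years_employed"]
background = [[1.0, 0.5, 2.0], [2.0, 0.3, 5.0], [3.0, 0.2, 1.0], [4.0, 0.1, 8.0]]
x = [3.5, 0.4, 6.0]

base_value, phi = shapley_values(toy_model, x, background)

# Text waterfall: base value, then each contribution, ending at the prediction.
running = base_value
print(f"{'base value':<16} {running:8.3f}")
for name, contribution in zip(feature_names, phi):
    running += contribution
    print(f"{name:<16} {contribution:+8.3f} -> {running:8.3f}")
print(f"{'model output':<16} {toy_model(x):8.3f}")
```

The final running total matches the model output, which is the local accuracy property in action.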
Method 4: LIME (Local Explanations)
LIME takes a different approach from SHAP: instead of using game theory, it explains a prediction by fitting a simple, interpretable model (like linear regression) in the neighborhood of the prediction. Think of it as zooming in on one prediction and saying, “Locally, the model behaves like this simple rule.” This makes LIME fast and intuitive, but less theoretically grounded than SHAP.
LIME vs SHAP
| Aspect | LIME | SHAP |
|---|---|---|
| Method | Local linear approximation | Game theory (Shapley values) |
| Consistency | Can vary between runs (depends on random perturbations) | Satisfies the consistency axiom; deterministic with TreeSHAP (sampling-based estimators such as KernelSHAP can vary) |
| Speed | Fast for single predictions | Slower for many samples (but TreeSHAP is fast) |
| Global | No (local only — one prediction at a time) | Yes (aggregate local explanations into global view) |
| Accuracy | Approximate (good enough for most uses) | Exact for tree models via TreeSHAP |
| When to use | Quick debugging, prototyping, non-tree models | Production systems, regulatory compliance, thorough analysis |
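To make the LIME column concrete, here is a LIME-style sketch in plain Python: perturb the input, weight the samples by proximity, and fit a weighted linear model whose slopes serve as the explanation. It is a simplified illustration, not the `lime` package's algorithm. The Gaussian perturbations, the kernel width, and `toy_model` are assumptions made for the example.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def lime_style_explanation(model, x, n_samples=500, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x; return (intercept, slopes)."""
    rng = random.Random(seed)
    d = len(x)
    # Accumulate the weighted normal equations (Z^T W Z) beta = Z^T W y.
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    b = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        z = [1.0] + delta  # intercept column plus offsets from x
        y = model([xi + di for xi, di in zip(x, delta)])
        dist2 = sum(di * di for di in delta)
        w = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        for r in range(d + 1):
            b[r] += w * z[r] * y
            for c in range(d + 1):
                A[r][c] += w * z[r] * z[c]
    beta = solve_linear_system(A, b)
    return beta[0], beta[1:]


def toy_model(row):
    """Hypothetical nonlinear model; the third feature is ignored."""
    return row[0] ** 2 + 3.0 * row[1]


x = [1.0, 2.0, 5.0]
intercept_a, slopes_a = lime_style_explanation(toy_model, x, seed=0)
intercept_b, slopes_b = lime_style_explanation(toy_model, x, seed=1)
print("seed 0 slopes:", [round(s, 3) for s in slopes_a])
print("seed 1 slopes:", [round(s, 3) for s in slopes_b])
```

Both runs recover slopes close to the model's local gradient, but the numbers differ slightly between seeds. That run-to-run variation is the "Consistency" row of the table.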
Method 5: Partial Dependence Plots
While SHAP and LIME explain individual predictions, Partial Dependence Plots (PDPs) answer a different question: “How does this feature affect predictions on average, across the entire dataset?” They show the marginal effect of a feature: what happens to the average prediction as you sweep one feature from low to high values while every other feature keeps its observed value in each row. This is invaluable for understanding the “shape” of the learned relationship (linear? threshold? U-shaped?).
2D Interaction Plot
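Partial dependence, for one feature or for a pair, can be computed by hand: force the chosen features to a grid value in every row, predict, and average. The sketch below does both the 1D and the 2D case with one helper. `toy_model`, the rows, and the grids are hypothetical, and the numbers it produces are what a plotting library would draw as a curve or a heatmap.

```python
def partial_dependence(model, X, fixed):
    """Average prediction with some features forced to fixed values.

    `fixed` maps a feature index to the value it is forced to; every other
    feature keeps its observed value in each row of X.
    """
    total = 0.0
    for row in X:
        modified = list(row)
        for j, value in fixed.items():
            modified[j] = value
        total += model(modified)
    return total / len(X)


def toy_model(row):
    """Hypothetical approval score: debt ratio only matters above an income threshold."""
    income_k, debt_ratio, years_employed = row
    high_income = 1.0 if income_k >= 50 else 0.0
    return (0.2 + 0.6 * high_income
            - 0.4 * debt_ratio * high_income
            + 0.01 * years_employed)


# Toy rows: (income_k, debt_ratio, years_employed). Illustrative values only.
X = [
    [30.0, 0.1, 1.0],
    [45.0, 0.3, 3.0],
    [60.0, 0.5, 5.0],
    [90.0, 0.7, 7.0],
]

# 1D partial dependence on income: reveals the threshold shape.
income_grid = [20.0, 40.0, 60.0, 80.0]
pdp_income = [partial_dependence(toy_model, X, {0: v}) for v in income_grid]
for v, p in zip(income_grid, pdp_income):
    print(f"income_k={v:5.1f} -> average score {p:.2f}")

# 2D partial dependence on (income, debt_ratio): reveals the interaction.
debt_grid = [0.1, 0.5]
pdp_2d = {
    (inc, debt): partial_dependence(toy_model, X, {0: inc, 1: debt})
    for inc in income_grid
    for debt in debt_grid
}
for (inc, debt), p in pdp_2d.items():
    print(f"income_k={inc:5.1f}, debt_ratio={debt:.1f} -> average score {p:.2f}")
```

In the 2D grid, changing `debt_ratio` moves the score only in the high-income rows. A 1D plot of `debt_ratio` alone would average that interaction away.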
Method 6: ICE Plots
Individual Conditional Expectation (ICE) plots are like PDPs, but they draw one curve per sample instead of a single averaged curve.
Practical: Explaining a Loan Decision
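ICE curves are a useful first look at individual loan applicants: sweep one feature for a single applicant and record how the score responds. A minimal sketch, assuming a hypothetical scoring function `loan_model` and two made-up applicants. Averaging the ICE curves point by point gives the PDP.

```python
def ice_curve(model, row, feature_index, grid):
    """Predictions for one row as a single feature sweeps over the grid."""
    curve = []
    for value in grid:
        modified = list(row)
        modified[feature_index] = value
        curve.append(model(modified))
    return curve


def loan_model(row):
    """Hypothetical score: credit score helps much less when debt ratio is high."""
    credit_score, debt_ratio = row
    sensitivity = 1.0 if debt_ratio <= 0.4 else 0.2
    return 0.001 * (credit_score - 500) * sensitivity


# Two illustrative applicants: (credit_score, debt_ratio).
applicants = {
    "applicant_A": [620, 0.2],
    "applicant_B": [620, 0.6],
}
credit_grid = [550, 650, 750]

ice = {name: ice_curve(loan_model, row, 0, credit_grid)
       for name, row in applicants.items()}
for name, curve in ice.items():
    print(name, [round(v, 3) for v in curve])

# The PDP is the point-by-point average of the ICE curves.
pdp = [sum(curve[i] for curve in ice.values()) / len(ice)
       for i in range(len(credit_grid))]
print("pdp        ", [round(v, 3) for v in pdp])
```

The two applicants have very different slopes, and the averaged PDP curve matches neither of them. That is the case where ICE shows something a PDP hides.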
Building an Explanation Report
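A report can be assembled from the pieces a local explanation already provides: a base value and per-feature contributions. The sketch below is one possible layout, not a regulatory template. The applicant ID, the contribution values, and the 0.5 threshold are hypothetical, and the contributions are assumed to come from a method such as SHAP.

```python
def build_explanation_report(applicant_id, base_value, contributions,
                             threshold=0.5, top_k=3):
    """Return a plain-text report with the decision and its main factors."""
    score = base_value + sum(contributions.values())
    denied = score < threshold
    # For a denial, list the features that pushed the score down the most;
    # for an approval, the features that pushed it up the most.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1],
                    reverse=not denied)
    reasons = [(name, value) for name, value in ranked
               if value != 0 and (value < 0) == denied][:top_k]

    lines = [
        f"Applicant: {applicant_id}",
        f"Decision: {'DENIED' if denied else 'APPROVED'} "
        f"(score {score:.2f}, threshold {threshold:.2f})",
        f"Base value (average score): {base_value:.2f}",
        "Main factors:",
    ]
    for rank, (name, value) in enumerate(reasons, start=1):
        lines.append(f"  {rank}. {name}: {value:+.2f}")
    return "\n".join(lines)


# Hypothetical per-feature contributions for one applicant.
contributions = {
    "credit_score": -0.15,
    "debt_ratio": -0.10,
    "income_k": 0.05,
    "years_employed": 0.02,
}
report = build_explanation_report("A-1001", base_value=0.55,
                                  contributions=contributions)
print(report)
```

Listing only the factors that pushed toward the actual decision keeps the report focused on the question the customer asked: why this outcome.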
Key Takeaways
Multiple Methods
Use feature importance, SHAP, LIME, and PDPs together
Global vs Local
Global shows patterns, local explains decisions
SHAP is Gold Standard
Mathematically grounded, works for any model
Document Explanations
Generate reports for stakeholders
What’s Next?
Now that you can explain your models, let’s learn how to build robust ML pipelines!
Continue to ML Pipelines
Build reproducible, production-ready ML workflows
Interview Deep-Dive
A regulator asks you to prove that your loan approval model does not discriminate based on race or gender. How would you use explainability tools to demonstrate this?
This is a high-stakes scenario where explainability is not optional — it is a legal requirement. The approach must be rigorous enough to withstand regulatory scrutiny.
- Step 1: Check if protected attributes are in the model directly. Obvious, but necessary. If race or gender is a direct input feature, remove it. However, this is insufficient because of proxy variables.
- Step 2: Identify proxy features using SHAP. Compute SHAP values for every feature and check if any non-protected feature is highly correlated with a protected attribute while also having high SHAP importance. For example, zip code can be a strong proxy for race due to residential segregation. If zip code has high SHAP importance and correlates with race at r > 0.5, you have a proxy discrimination risk.
- Step 3: Conditional SHAP analysis by protected group. For approved vs denied applicants, compute average SHAP values separately for each demographic group. If the average SHAP contribution of “income” is -0.3 for one racial group and -0.1 for another, the model is weighting income differently across groups — possibly due to structural correlations in the training data.
- Step 4: Partial dependence analysis for fairness. Plot PDPs for key features separately for each protected group. If the PDP for “credit score” shows the same relationship with approval probability regardless of race, the model treats credit score fairly. If the curves diverge, the model has learned different decision rules for different groups.
- Step 5: Counterfactual explanations. For each denied applicant, compute: “What would need to change for this person to be approved?” If the answer is systematically different for different demographic groups (for example, otherwise similar applicants in Group A need 15K more income than those in Group B to be approved), the model is treating the groups differently and is likely discriminatory.
- Document everything. Regulators want audit trails. Log the SHAP analysis, the proxy variable check, the group-level PDP comparison, and the counterfactual analysis. This documentation should be generated automatically with each model version.
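Steps 2 and 5 can be sketched in a few lines of plain Python. Everything here is hypothetical: the feature names, the toy data, the importance scores (standing in for mean absolute SHAP values), the 0.1 importance cutoff, and the deliberately flawed `toy_model`. Only the r > 0.5 threshold comes from the text above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def flag_proxies(features, protected, importance,
                 r_threshold=0.5, importance_threshold=0.1):
    """Step 2: features that are both important and correlated with the protected attribute."""
    flagged = []
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) > r_threshold and importance[name] > importance_threshold:
            flagged.append((name, round(r, 2)))
    return flagged


def income_needed(model, applicant, step=1.0, max_extra=100.0):
    """Step 5: smallest income increase that turns a denial into an approval."""
    extra = 0.0
    while extra <= max_extra:
        candidate = dict(applicant, income_k=applicant["income_k"] + extra)
        if model(candidate):
            return extra
        extra += step
    return None


def toy_model(applicant):
    """Hypothetical, deliberately flawed model that penalizes zip_risk_decile."""
    return applicant["income_k"] - 2 * applicant["zip_risk_decile"] >= 50


# Toy audit data: protected group membership is used for auditing only.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "zip_risk_decile": [1, 2, 2, 3, 8, 7, 9, 8],
    "income_k": [40, 60, 50, 70, 45, 65, 55, 60],
}
importance = {"zip_risk_decile": 0.3, "income_k": 0.4}  # stand-in for mean |SHAP|

flagged = flag_proxies(features, protected, importance)
print("possible proxies:", flagged)

# Same income, different neighborhoods: how much more income does each need?
extra_a = income_needed(toy_model, {"income_k": 50, "zip_risk_decile": 1})
extra_b = income_needed(toy_model, {"income_k": 50, "zip_risk_decile": 8})
print(f"extra income needed: {extra_a} vs {extra_b}")
```

A gap like this between otherwise identical applicants is the kind of evidence a regulator will ask about, so both outputs belong in the audit log.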
SHAP and LIME give different explanations for the same prediction. Which one do you trust and why?
This happens regularly in practice and understanding why they disagree is more valuable than blindly trusting either one.
- Why they disagree. LIME and SHAP use fundamentally different approaches. LIME perturbs the input randomly, fits a local linear model, and reports the linear model’s coefficients as feature contributions. SHAP attributes the prediction using Shapley values, which average each feature’s marginal contribution over all possible feature coalitions. Because LIME’s explanation depends on the random perturbation neighborhood and the number of perturbation samples, it can vary between runs. TreeSHAP on tree models is deterministic and exact; sampling-based estimators such as KernelSHAP are approximations.
- Trust SHAP for consistency and theoretical guarantees. SHAP satisfies three desirable axioms: local accuracy (contributions sum to the prediction), missingness (features not in the model contribute zero), and consistency (if a feature becomes more important, its SHAP value never decreases). LIME does not guarantee any of these properties, and its explanations can be unstable: the same model on the same input can get different LIME explanations depending on the random seed.
- Trust LIME for speed and simplicity. When you need a quick gut check during development and cannot wait for SHAP computation on a large dataset, LIME is fine. It is also easier to explain to non-technical stakeholders: “We zoomed in on this prediction and asked what a simple model would say locally.”
- When they disagree significantly. If the top-3 features from SHAP and LIME are completely different, the model likely has complex feature interactions that LIME’s local linear approximation cannot capture. In that case, SHAP is almost certainly more correct. If they mostly agree on the top features but disagree on the exact magnitudes, both are usable.
- In production: use SHAP. For any system where explanations matter (compliance, debugging, stakeholder trust), use SHAP. TreeSHAP is fast enough for batch processing, and the deterministic, theoretically grounded explanations are auditable. Use LIME only for ad-hoc exploration during development.
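The "do the top-3 features agree?" check is easy to automate. A minimal sketch, assuming both methods' attributions are already available as name-to-value dictionaries. The numbers below are hypothetical and not produced by either library.

```python
def top_k_features(attributions, k=3):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]),
                    reverse=True)
    return ranked[:k]


def top_k_overlap(a, b, k=3):
    """Fraction of the top-k features shared by two explanations (0.0 to 1.0)."""
    shared = set(top_k_features(a, k)) & set(top_k_features(b, k))
    return len(shared) / k


# Hypothetical attributions for the same prediction.
shap_attr = {"credit_score": -0.30, "debt_ratio": -0.20, "income": 0.10,
             "age": 0.02, "zip_code": 0.01}
lime_attr = {"credit_score": -0.25, "debt_ratio": -0.05, "income": 0.12,
             "age": 0.08, "zip_code": 0.00}

overlap = top_k_overlap(shap_attr, lime_attr, k=3)
print("SHAP top-3:", top_k_features(shap_attr))
print("LIME top-3:", top_k_features(lime_attr))
print(f"overlap: {overlap:.2f}")
```

An overlap near 1.0 suggests the two methods differ only in magnitude. An overlap of 0.0 is the "completely different" case described above, where feature interactions are the first thing to investigate.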
How would you use explainability tools to debug a model that performs well on average but fails catastrophically on a specific customer segment?
This is a production debugging scenario where explainability tools become diagnostic instruments rather than communication tools.
- Step 1: Identify the failing segment precisely. Compute per-segment metrics (accuracy, precision, recall) across all meaningful slices: age groups, geographic regions, customer types, product categories. Find the specific segment where performance degrades. Tools like Google’s What-If Tool or custom slicing scripts can automate this.
- Step 2: Compare SHAP distributions between segments. For the well-performing segment and the failing segment, plot the SHAP summary plots side by side. Look for features where the SHAP distribution is radically different. For example, if “account_age” has high positive SHAP values in the good segment and near-zero values in the failing segment, the model has not learned the account_age pattern for that segment — likely because it was underrepresented in training.
- Step 3: Examine the feature distributions. The failing segment may have feature values outside the training distribution. If training data had customers with tenure 1-60 months, but the failing segment is concentrated at tenure 0-2 months (brand new customers), the model is extrapolating into a region where it saw little or no data. PDPs in such sparsely populated regions of a feature should be treated as unreliable.
- Step 4: Check for missing interactions. Use SHAP interaction values to see if the failing segment requires a feature interaction that the model did not learn. For example, “high income” might predict low churn in general, but “high income + fiber optic + no tech support” might predict high churn. If this combination is rare in training, the model misses it.
- Step 5: Fix the root cause. Options include: collecting more training data for the underrepresented segment, creating segment-specific models, engineering new features that capture the missing pattern, or retraining with stratified sampling to ensure the segment is adequately represented.
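Steps 1 and 3 lend themselves to a small slicing script. The sketch below uses hypothetical prediction records and the 1-60 month training range from the tenure example above. The field names are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by_segment(records, segment_key):
    """Step 1: accuracy computed separately for each value of segment_key."""
    correct, total = defaultdict(int), defaultdict(int)
    for record in records:
        segment = record[segment_key]
        total[segment] += 1
        correct[segment] += int(record["prediction"] == record["label"])
    return {segment: correct[segment] / total[segment] for segment in total}


def out_of_range_fraction(values, train_min, train_max):
    """Step 3: share of values that fall outside the training range."""
    outside = [v for v in values if v < train_min or v > train_max]
    return len(outside) / len(values)


# Hypothetical logged predictions with ground-truth labels.
records = [
    {"segment": "established", "tenure": 24, "prediction": 0, "label": 0},
    {"segment": "established", "tenure": 36, "prediction": 1, "label": 1},
    {"segment": "established", "tenure": 12, "prediction": 0, "label": 0},
    {"segment": "established", "tenure": 48, "prediction": 0, "label": 0},
    {"segment": "new", "tenure": 0, "prediction": 0, "label": 1},
    {"segment": "new", "tenure": 0, "prediction": 0, "label": 1},
    {"segment": "new", "tenure": 1, "prediction": 0, "label": 0},
    {"segment": "new", "tenure": 2, "prediction": 0, "label": 1},
]

per_segment = accuracy_by_segment(records, "segment")
overall = sum(r["prediction"] == r["label"] for r in records) / len(records)
print(f"overall accuracy: {overall:.2f}")
for segment, acc in per_segment.items():
    print(f"  {segment}: {acc:.2f}")

new_tenure = [r["tenure"] for r in records if r["segment"] == "new"]
share_outside = out_of_range_fraction(new_tenure, train_min=1, train_max=60)
print(f"'new' rows outside training tenure range: {share_outside:.0%}")
```

The overall number looks tolerable while one slice is failing badly. The range check then points at extrapolation as a likely cause before any SHAP comparison is run.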