Capstone Project: Complete ML System
Project Overview
You’ll build a Customer Churn Prediction System that predicts which customers will leave a subscription service. This project synthesizes everything you’ve learned:
- Data exploration and cleaning
- Feature engineering
- Model selection and evaluation
- Deployment considerations
Part 1: Problem Definition
Business Context
A telecom company loses 15-20% of customers monthly. Each lost customer costs:
- Revenue loss: $500-2000/year
- Acquisition cost for replacement: $300-500
- Hidden cost: every churned customer who complains publicly damages future acquisition
Success Metrics
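Choosing the right metric is a business decision, not a technical one. Here, missing a churner (false negative) costs $500 or more in lost revenue, while flagging a customer who was never going to leave (false positive) costs about $50 in labor. That asymmetry drives our metric priorities:
| Metric | Target | Why |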
|---|---|---|
| Recall | > 70% | Catch most churners — each missed churner costs 10x more than a wasted call |
| Precision | > 50% | Keep the retention team productive: too many false positives erode their trust in the model |
| AUC-ROC | > 0.80 | Overall discriminative power across all thresholds |
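To make the targets concrete, here is a minimal sketch of choosing a decision threshold that meets both the recall and precision targets at the lowest total error cost. It assumes held-out labels `y_true` and predicted churn probabilities `y_prob` (synthetic here); the per-error costs `FN_COST = 500` and `FP_COST = 50` are illustrative assumptions mirroring the roughly 10x asymmetry in the table, and `expected_cost` and `pick_threshold` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Illustrative per-error costs (assumptions): a missed churner vs. a wasted call
FN_COST = 500.0
FP_COST = 50.0


def expected_cost(y_true, y_prob, threshold):
    """Total cost of errors when customers with prob >= threshold are flagged."""
    y_pred = (y_prob >= threshold).astype(int)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(fn * FN_COST + fp * FP_COST)


def pick_threshold(y_true, y_prob, min_recall=0.70, min_precision=0.50):
    """Cheapest threshold that meets both targets, or None if none does."""
    best = None
    for t in np.linspace(0.05, 0.95, 91):
        y_pred = (y_prob >= t).astype(int)
        if y_pred.sum() == 0:  # precision is undefined with no predicted positives
            continue
        if (recall_score(y_true, y_pred) >= min_recall
                and precision_score(y_true, y_pred) >= min_precision):
            cost = expected_cost(y_true, y_prob, t)
            if best is None or cost < best[1]:
                best = (float(t), cost)
    return best


# Synthetic validation data: churners (about 25%) tend to get higher scores
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.25, size=2000)
y_prob = np.clip(rng.normal(0.35, 0.15, size=2000) + 0.3 * y_true, 0.0, 1.0)

print(f"AUC-ROC: {roc_auc_score(y_true, y_prob):.3f}")
best = pick_threshold(y_true, y_prob)
print("threshold, cost:", best)
```

Because a false negative is priced ten times higher than a false positive, the search tends to settle near the lowest threshold that still keeps precision above 50%; changing the cost ratio moves it.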
Part 2: Data Exploration
Exploratory Analysis
Part 3: Feature Engineering
Prepare for Modeling
Part 4: Model Development
Baseline Models
Hyperparameter Tuning
Part 5: Model Evaluation
Detailed Metrics
Feature Importance
Part 6: Business Impact Analysis
Part 7: Production Considerations
Model Serialization
A common mistake is saving only the model and forgetting the preprocessing artifacts. In production, you need everything required to go from raw customer data to a prediction: the scaler, the feature names, the threshold, and ideally the model version and training date.
Inference Pipeline
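A minimal sketch of bundling those artifacts with `joblib` and loading them back in an inference step. The bundle layout, the helper names `save_bundle` and `predict_churn`, and the feature names `tenure_months` and `monthly_charges` are hypothetical; a real system might keep the same information in a model registry such as MLflow instead.

```python
import datetime as dt
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def save_bundle(path, model, scaler, feature_names, threshold, version):
    """Persist everything needed to go from raw features to a decision."""
    bundle = {
        "model": model,
        "scaler": scaler,
        "feature_names": list(feature_names),
        "threshold": float(threshold),
        "model_version": version,
        "trained_at": dt.datetime.now(dt.timezone.utc).isoformat(),
    }
    joblib.dump(bundle, path)


def predict_churn(path, raw):
    """Load the bundle, align columns to training order, and score."""
    bundle = joblib.load(path)
    X = raw[bundle["feature_names"]]  # reorders; a missing column raises KeyError
    prob = bundle["model"].predict_proba(bundle["scaler"].transform(X))[:, 1]
    return pd.DataFrame(
        {
            "churn_probability": prob,
            "flagged": prob >= bundle["threshold"],
            "model_version": bundle["model_version"],
        },
        index=raw.index,
    )


# Toy training run with hypothetical features
rng = np.random.default_rng(42)
train = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, size=500),
    "monthly_charges": rng.uniform(20, 120, size=500),
})
y = (train["tenure_months"] < 12).astype(int)
scaler = StandardScaler().fit(train)
model = LogisticRegression().fit(scaler.transform(train), y)

path = os.path.join(tempfile.mkdtemp(), "churn_bundle.joblib")
save_bundle(path, model, scaler, train.columns, threshold=0.45, version="v1")

# Columns arrive in a different order at inference time
scored = predict_churn(path, train[["monthly_charges", "tenure_months"]].head(5))
print(scored)
```

Selecting columns by the stored `feature_names` means reordered input still scores correctly, and a missing column fails loudly instead of silently producing wrong predictions. Wrapping the scaler and model in a scikit-learn `Pipeline` is a common alternative that leaves one fewer artifact to keep in sync.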
Monitoring Dashboard
Project Checklist
🏆 Congratulations!
You've Completed the Capstone!
You’ve built a complete, production-ready ML system from scratch. You now have:
- Technical Skills: Data exploration, feature engineering, model training, evaluation, and deployment
- Business Acumen: Translating ML metrics to business impact
- Production Mindset: Monitoring, maintenance, and continuous improvement
📝 Portfolio Documentation Template
How to Present This Project to Employers
Project Summary (for your portfolio/resume)
Title: Customer Churn Prediction System
Business Impact:
- Identifies 70%+ of at-risk customers 2 weeks before churn
- Enables targeted retention campaigns
- Estimated $X00K annual savings in customer lifetime value
Technical Highlights:
- End-to-end ML pipeline from raw data to production API
- Comparison of 5+ algorithms (Logistic Regression, Random Forest, XGBoost, etc.)
- Feature engineering creating 20+ derived features
- Threshold optimization for business-aligned precision-recall tradeoff
- Monitoring and alerting system for model drift
Tech Stack:
- Python, scikit-learn, XGBoost, pandas, numpy
- FastAPI for model serving
- MLflow for experiment tracking
- Docker for containerization
GitHub README Structure
Interview Talking Points
- “Walk me through this project”
  - Start with the business problem (churn costs $X)
  - Explain data exploration findings
  - Discuss feature engineering decisions
  - Compare model approaches
  - Show the business impact calculation
- “What was the biggest challenge?”
  - Class imbalance (70/30 split)
  - Feature engineering from raw transaction data
  - Choosing the right threshold for business needs
- “How would you improve it?”
  - Real-time predictions with streaming
  - A/B testing different interventions
  - Incorporating more data sources
  - Automated retraining pipeline
🔗 Complete ML Mastery Checklist
Skills You’ve Mastered Across This Course:
| Category | Skills | Modules |
|---|---|---|
| Fundamentals | Linear models, loss functions, gradient descent | 1-3 |
| Classification | Logistic regression, metrics, thresholds | 4-4b |
| Algorithms | Trees, ensembles, SVM, Naive Bayes | 5-6 |
| Evaluation | Cross-validation, precision/recall, ROC | 7 |
| Data Skills | Feature engineering, handling messy data | 8 |
| Optimization | Hyperparameter tuning | 9 |
| End-to-End | Complete pipelines | 10, 19 |
| Unsupervised | Clustering, dimensionality reduction | 11, 18 |
| Deep Learning | Neural networks basics | 12 |
| Production | Regularization, deployment, monitoring | 13, 14 |
| Time Series | Forecasting techniques | 15 |
| Theory | Bias-variance, data leakage | 16-17 |
| Real-World | Imbalanced data, explainability | 20-23 |
You’re now ready for:
- ML Engineer roles (junior to mid-level)
- Data Scientist positions
- AI/ML-focused software engineering
- Further study in deep learning, NLP, or computer vision
What’s Next?
You’ve completed the capstone, but there’s more to learn! Let’s tackle real-world challenges.
Continue Learning
Imbalanced Data
Handle datasets where 99% of data is one class
Deep Learning
Move on to neural networks, transformers, and LLMs
Interview Deep-Dive
Walk me through how you would monitor a churn prediction model after deployment. What metrics would you track and what would trigger a retrain?
Model monitoring is where most ML projects fail — the model gets deployed and nobody watches it. Here is the monitoring framework I would set up:
- Prediction distribution monitoring. Track the distribution of predicted churn probabilities daily. If the model suddenly starts predicting 80% of customers as high-risk (when historically it was 15%), something has changed — either the data or the model. I would use Population Stability Index (PSI) to compare the current prediction distribution against a reference period. A PSI above 0.2 triggers an investigation.
- Input feature drift detection. For each of the top 10 features by importance, monitor the mean, variance, and null rate on a daily cadence. A significant shift in any key feature (e.g., average tenure dropping because of a marketing campaign that acquired many new short-tenure customers) directly affects model performance. Alert when KS-test p-value drops below 0.01 for any feature.
- Delayed ground truth monitoring. Churn labels arrive with a delay (you know someone churned 30-60 days after the prediction). Once labels are available, compute rolling precision, recall, and AUC on a weekly window. Plot these over time and alert when any metric drops more than 5% from the baseline.
- Business outcome tracking. Track the retention team’s success rate on model-flagged customers. If the team is intervening on model-identified churners but the retention rate is not improving, either the model is flagging the wrong customers or the interventions are ineffective. This is the ultimate ground truth.
- Retrain triggers. I would retrain when any of these conditions are met: AUC drops below 0.75 (the business-agreed threshold), PSI on predictions exceeds 0.25, a major business event occurred (new pricing, new product, acquisition), or on a fixed quarterly cadence regardless of metrics. The quarterly cadence catches slow drift that no single alert catches.
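The PSI check, the per-feature KS test, and the retrain triggers above can be sketched as follows. This is a minimal illustration under stated assumptions: PSI uses ten quantile bins taken from the reference period (a common but not universal choice), "quarterly" is approximated as 90 days, the feature names are hypothetical, and the limits (PSI 0.25, AUC 0.75, p < 0.01) are the ones quoted in the answer.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index with quantile bins taken from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior cut points; values beyond the reference range land in the end bins
    cuts = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_pct = np.bincount(np.searchsorted(cuts, reference, side="right"),
                          minlength=bins) / len(reference)
    cur_pct = np.bincount(np.searchsorted(cuts, current, side="right"),
                          minlength=bins) / len(current)
    # Floor empty bins so the log term stays finite
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def drifted_features(ref_df, cur_df, features, alpha=0.01):
    """Features whose two-sample KS test p-value falls below alpha."""
    return [
        f for f in features
        if ks_2samp(ref_df[f].dropna(), cur_df[f].dropna()).pvalue < alpha
    ]


def retrain_reasons(current_auc, prediction_psi, major_business_event,
                    days_since_training, auc_floor=0.75, psi_limit=0.25,
                    max_age_days=90):
    """Return the triggered retrain conditions; an empty list means no retrain."""
    reasons = []
    if current_auc < auc_floor:
        reasons.append(f"AUC {current_auc:.3f} below {auc_floor}")
    if prediction_psi > psi_limit:
        reasons.append(f"prediction PSI {prediction_psi:.3f} above {psi_limit}")
    if major_business_event:
        reasons.append("major business event")
    if days_since_training >= max_age_days:
        reasons.append("scheduled quarterly retrain")
    return reasons


rng = np.random.default_rng(0)
ref_scores = rng.beta(2, 10, size=5000)  # reference-period churn scores
cur_scores = rng.beta(5, 6, size=5000)   # scores after a distribution shift
print("PSI:", round(psi(ref_scores, cur_scores), 3))

ref_df = pd.DataFrame({"tenure_months": rng.normal(30, 10, 3000),
                       "monthly_charges": rng.normal(70, 15, 3000)})
cur_df = pd.DataFrame({"tenure_months": rng.normal(20, 10, 3000),  # new short-tenure cohort
                       "monthly_charges": rng.normal(70, 15, 3000)})
print("Drifted:", drifted_features(ref_df, cur_df, ["tenure_months", "monthly_charges"]))
print("Retrain because:", retrain_reasons(0.78, psi(ref_scores, cur_scores), False, 40))
```

Returning the list of reasons, rather than a bare boolean, gives whoever reviews the alert an immediate explanation of why a retrain was requested.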
In the churn project, you chose Gradient Boosting as your final model. The business team wants to know why a customer was flagged. How do you explain individual predictions in production?
This is where explainability meets production engineering. The business team does not care about SHAP theory — they need actionable explanations that a retention agent can use in a phone call.
- Use SHAP values for individual explanations. For each flagged customer, compute SHAP values to identify the top 3-5 factors driving the churn prediction. Translate these into business language: “This customer is high-risk primarily because they are on a month-to-month contract (contributing 0.15 to churn probability), have filed 5 support tickets in the last month (contributing 0.12), and have not activated any add-on services (contributing 0.08).”
- Pre-compute explanations in batch. Computing SHAP values at inference time adds latency. For a daily batch scoring job, compute SHAP values alongside predictions and store them. The retention team dashboard pulls pre-computed explanations, not real-time calculations.
- Template the explanations. Create human-readable templates: “This customer is [risk level] because of [top factor], [second factor], and [third factor]. Recommended action: [action based on top factor].” The action mapping is domain logic: if the top factor is “month-to-month contract,” recommend an annual plan discount. If it is “many support tickets,” recommend a dedicated support escalation.
- Calibrate the probability outputs. Gradient boosting probabilities are not always well-calibrated: a predicted 0.7 might not actually mean a 70% chance of churning. Use Platt scaling or isotonic regression to calibrate probabilities so the business team can trust the numbers. Calibrated probabilities enable statements like “of all customers we score at around 70% risk, historically 68-72% actually churn.”
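A minimal sketch of two of those pieces: isotonic calibration with scikit-learn's `CalibratedClassifierCV`, and a template that turns precomputed per-feature contributions into an agent-facing sentence. The data is synthetic, the contributions are hard-coded stand-ins for values a batch SHAP job would store, and the feature names, the `ACTIONS` mapping, and the risk cut-offs in `explain` are hypothetical. Tree SHAP values are often reported in log-odds rather than probability units, so check which scale your explainer uses before quoting numbers like "+0.15" to the business.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn table (about 25% positives)
X, y = make_classification(n_samples=3000, n_features=8, weights=[0.75],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Isotonic calibration fitted with 3-fold cross-validation
calibrated = CalibratedClassifierCV(GradientBoostingClassifier(random_state=0),
                                    method="isotonic", cv=3)
calibrated.fit(X_train, y_train)
prob = calibrated.predict_proba(X_test)[:, 1]

# Reliability check: observed churn rate vs. mean predicted probability per bin
frac_churned, mean_predicted = calibration_curve(y_test, prob, n_bins=5)
for p, f in zip(mean_predicted, frac_churned):
    print(f"predicted {p:.2f} -> observed {f:.2f}")

# Hypothetical mapping from the top factor to a retention action
ACTIONS = {
    "month_to_month_contract": "offer an annual-plan discount",
    "support_tickets_last_30d": "escalate to dedicated support",
}


def explain(prob, contributions, top_k=3):
    """Render the top churn-driving factors as an agent-facing sentence."""
    risk = "high risk" if prob >= 0.7 else "medium risk" if prob >= 0.4 else "low risk"
    # Largest positive contributions push hardest toward churn
    top = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    factors = ", ".join(f"{name} ({value:+.2f})" for name, value in top)
    action = ACTIONS.get(top[0][0], "route to standard retention review")
    return f"This customer is {risk} because of {factors}. Recommended action: {action}."


# Stand-in for precomputed per-customer contributions (e.g. from a SHAP batch job)
message = explain(0.82, {"month_to_month_contract": 0.15,
                         "support_tickets_last_30d": 0.12,
                         "no_addon_services": 0.08,
                         "tenure_months": -0.05})
print(message)
```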
How would you design an A/B test to measure the actual business impact of your churn prediction model?
This is where the rubber meets the road. A model’s AUC means nothing if deploying it does not actually reduce churn or increase revenue.
- Randomize at the customer level, not the prediction level. Randomly assign customers to treatment (model-flagged customers receive retention intervention) and control (business-as-usual, no model-informed intervention). This isolates the model’s impact from other factors like seasonal effects or marketing campaigns.
- Stratify the randomization. Ensure both groups have similar distributions of churn risk, contract type, tenure, and revenue. If the treatment group accidentally gets more month-to-month customers, the results will be confounded.
- Define the primary metric before starting. The primary metric should be customer retention rate (or inversely, churn rate) measured 90 days after the experiment starts. Secondary metrics: revenue retained, customer lifetime value, cost per retained customer. Define these upfront to avoid p-hacking after seeing results.
- Account for the cost of intervention. If the retention team calls 500 flagged customers and each offer costs $50, the intervention costs $25,000. The model is only valuable if the retained revenue exceeds the intervention cost. Calculate the ROI: (retained_customers x average_lifetime_value - intervention_cost) / intervention_cost.
- Run the test long enough. Churn is a slow process. A 2-week A/B test will not capture the full effect. I would run for at least 90 days to observe whether flagged-and-contacted customers actually stay, or if the intervention merely delayed their departure by a few weeks.
- Watch for interference effects. If retained customers talk to their friends in the control group, or if the retention team’s capacity is limited and they start prioritizing, the treatment effect can leak between groups. Use well-separated cohorts if possible.
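A minimal sketch of stratified assignment and the ROI calculation from the list above, assuming a customer table with hypothetical `contract` and `tenure_bucket` columns. One refinement over the raw formula: in an A/B test, `retained_customers` should be the number retained relative to control, which the hypothetical helper `incremental_retained` estimates from the two groups' churn rates. The 30% vs. 20% churn rates and the $1,000 lifetime value are illustrative assumptions, not results.

```python
import numpy as np
import pandas as pd


def assign_groups(customers, strata_cols, seed=0):
    """Randomize customers 50/50 into treatment/control within each stratum."""
    rng = np.random.default_rng(seed)
    groups = pd.Series("control", index=customers.index, dtype=object)
    for labels in customers.groupby(strata_cols).groups.values():
        shuffled = rng.permutation(np.asarray(labels))
        groups.loc[shuffled[: len(shuffled) // 2]] = "treatment"
    return groups


def incremental_retained(n_contacted, treatment_churn_rate, control_churn_rate):
    """Customers kept because of the intervention, relative to control."""
    return n_contacted * (control_churn_rate - treatment_churn_rate)


def intervention_roi(retained_customers, avg_lifetime_value, intervention_cost):
    """(retained_customers x average_lifetime_value - cost) / cost."""
    return (retained_customers * avg_lifetime_value - intervention_cost) / intervention_cost


rng = np.random.default_rng(1)
customers = pd.DataFrame({
    "contract": rng.choice(["month-to-month", "one-year", "two-year"], size=1000),
    "tenure_bucket": rng.choice(["0-12", "13-36", "37+"], size=1000),
})
customers["group"] = assign_groups(customers, ["contract", "tenure_bucket"])
print(customers.groupby(["contract", "group"]).size().unstack())

# Illustrative numbers only: 500 contacts at $50 each, churn 30% (control) vs 20% (treatment)
kept = incremental_retained(500, treatment_churn_rate=0.20, control_churn_rate=0.30)
roi = intervention_roi(kept, avg_lifetime_value=1000.0, intervention_cost=500 * 50.0)
print(f"Incremental customers retained: {kept:.0f}, ROI: {roi:.2f}")
```

Splitting within each stratum keeps contract type and tenure balanced by construction, so a lopsided draw cannot confound the comparison the way a single global coin flip occasionally can.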