Project Timeline: 6 Months (Academic Sprint)
Data Source: Kaggle Dataset (10k Records)
Primary Model: XGBoost (Cost-Sensitive)
Target Metric: PR-AUC (preferred over accuracy)
The Problem Statement
Customer churn prediction in enterprise environments suffers from a fundamental Signal Fidelity Problem. Current baseline models treat every customer equally, leading to a high rate of false negatives for rare, high-value churners.
This project addresses three critical gaps in standard academic churn models:
1. Class Imbalance
Standard models optimize for global accuracy and ignore the roughly 9:1 imbalance between non-churners and churners, even though missing a churner is highly costly.
2. Label Noise
Raw data contains "Ghost Signals" (e.g., active status despite zero usage) that contradict the labels and mislead the model during training.
3. The Action Gap
Raw probabilities lack business context. A 70% risk for an enterprise account carries very different stakes than a 70% risk for a free user.
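The three gaps above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the `Customer` fields, the helper names, and the dollar figures are all hypothetical, since the source does not list the dataset columns. The non-churner to churner ratio computed here is the quantity commonly passed to XGBoost as `scale_pos_weight`.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # All fields are hypothetical; the source does not list the dataset columns.
    status: str           # e.g. "active" or "cancelled"
    monthly_usage: float  # usage units in the most recent period
    annual_value: float   # revenue at stake if the customer churns
    churned: bool         # ground-truth label


def accuracy_and_recall(labels, predictions):
    """Return (accuracy, churner recall) for boolean labels and predictions."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    positives = sum(labels)
    true_pos = sum(l and p for l, p in zip(labels, predictions))
    recall = true_pos / positives if positives else 0.0
    return correct / len(labels), recall


def positive_class_weight(labels):
    """Gap 1: ratio of non-churners to churners, usable as a class weight."""
    positives = sum(labels)
    return (len(labels) - positives) / positives


def is_ghost_signal(customer):
    """Gap 2: flag the 'active status despite zero usage' pattern."""
    return customer.status == "active" and customer.monthly_usage == 0


def expected_loss(churn_probability, annual_value):
    """Gap 3: revenue at risk = churn probability times account value."""
    return churn_probability * annual_value


if __name__ == "__main__":
    # A 9:1 toy sample: one churner among ten customers.
    labels = [True] + [False] * 9
    never_churn = [False] * 10  # baseline that predicts "no churn" for everyone
    acc, rec = accuracy_and_recall(labels, never_churn)
    print(f"accuracy={acc:.2f}, churner recall={rec:.2f}")
    print(f"class weight={positive_class_weight(labels):.1f}")

    ghost = Customer("active", 0.0, 1200.0, False)
    print("ghost signal:", is_ghost_signal(ghost))

    # Same 70% risk, very different stakes (illustrative values only).
    print("enterprise:", expected_loss(0.7, 50_000.0))
    print("free user:", expected_loss(0.7, 0.0))
```

On the toy sample, a baseline that never predicts churn still reaches 90% accuracy while catching no churners, which is why the project targets PR-AUC rather than accuracy.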
