The Churn Project

An Open-Source Product from GSV Vadodara

Project Timeline: 6 Months (Academic Sprint)

Data Source: Kaggle Dataset (10k Records)

Primary Model: XGBoost (Cost-Sensitive)

Target Metric: PR-AUC (Over Accuracy)

The Problem Statement

Customer churn prediction in enterprise environments suffers from a fundamental Signal Fidelity Problem. Baseline models weight every customer equally, so they produce a high rate of false negatives among the rare, high-value customers who churn.

This project addresses three critical gaps in standard academic churn models:

1. Class Imbalance

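Standard models optimize for overall accuracy. With a roughly 9:1 imbalance between non-churners and churners, a model that predicts "no churn" for every customer is about 90% accurate while missing every churner, which is the costliest error.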

2. Label Noise

Raw data contains "Ghost Signals" (e.g., an account marked active despite zero usage). These contradictory records act as noisy labels and mislead the model during training.

3. The Action Gap

Raw probabilities lack business context: a 70% churn risk on an enterprise account has very different consequences from a 70% risk on a free-tier user.
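
The three gaps above can be illustrated with a minimal Python sketch. It is not the project's implementation: the field names (`is_active`, `monthly_usage`), the account values, and all helper functions are hypothetical and chosen only for illustration. The positive-class weight shown is the ratio commonly passed to XGBoost's `scale_pos_weight` parameter, which is one way (assumed here) to make training cost-sensitive.

```python
# Illustrative sketch only: field names, account values, and helpers are hypothetical.

def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts 'no churn' (label 0)."""
    return labels.count(0) / len(labels)

def positive_class_weight(labels):
    """Ratio of non-churners to churners, a common choice for a
    positive-class weight such as XGBoost's `scale_pos_weight`."""
    negatives = labels.count(0)
    positives = labels.count(1)
    return negatives / positives

def is_ghost_signal(record):
    """Flag accounts that are marked active but show zero usage."""
    return record["is_active"] and record["monthly_usage"] == 0

def revenue_at_risk(churn_probability, annual_value):
    """Expected loss: churn probability weighted by account value."""
    return churn_probability * annual_value

# Gap 1: with a 9:1 imbalance, predicting "no churn" everywhere looks accurate
# even though it catches no churners at all.
labels = [0] * 9 + [1]
baseline = majority_baseline_accuracy(labels)
weight = positive_class_weight(labels)

# Gap 2: surface contradictory records before they reach training.
records = [
    {"id": "a", "is_active": True, "monthly_usage": 0},
    {"id": "b", "is_active": True, "monthly_usage": 42},
    {"id": "c", "is_active": False, "monthly_usage": 0},
]
ghosts = [r["id"] for r in records if is_ghost_signal(r)]

# Gap 3: the same 70% risk means very different exposure per segment.
enterprise_risk = revenue_at_risk(0.70, 50_000)  # hypothetical enterprise contract
free_risk = revenue_at_risk(0.70, 0)             # free user, no direct revenue

print(baseline, weight, ghosts, enterprise_risk, free_risk)
```

Ranking customers by a value-weighted figure such as `revenue_at_risk`, rather than by raw probability, is one simple way to attach business context to a score. Whether flagged ghost records should be dropped, relabeled, or down-weighted is a separate design decision that this sketch does not make.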