Customer Retention: Uplift Modeling for Churn Prevention
Customer churn is a critical problem in subscription businesses (telecom, SaaS, banking). A common intervention is a retention offer (discount, loyalty reward, personal call). However, not every customer benefits equally from such an offer:
Persuadables — would churn without the offer but stay if treated. Target these.
Sure Things — will stay regardless. Waste of budget.
Lost Causes — will churn regardless. Waste of budget.
Sleeping Dogs — will stay without contact but churn if contacted. Avoid these!
Uplift modeling identifies the persuadables by estimating the Conditional Average Treatment Effect (CATE) of the retention intervention on each customer.
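Formally, with covariates :math:`X`, a binary treatment :math:`W`, and potential retention outcomes :math:`Y(0), Y(1)`, the CATE is :math:`\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]`. Persuadables are the customers with a large positive :math:`\tau(x)`; sleeping dogs are those with a negative :math:`\tau(x)`.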
This tutorial demonstrates a full churn uplift pipeline:
Simulate a retention campaign dataset.
Estimate CATE with S-Learner, T-Learner, X-Learner, DR-Learner, and R-Learner.
Build a targeting policy and measure incremental revenue.
Evaluate with uplift curves, AUUC, and Qini.
Note: We use the Bank Marketing dataset from UCI/OpenML as a realistic customer base and simulate a retention RCT on top of it.
[ ]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual.causal_metrics import auuc, cumulative_gain_curve, qini_coefficient
from perpetual.meta_learners import DRLearner, SLearner, TLearner, XLearner
from perpetual.uplift import UpliftBooster
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
1. Prepare a Churn Retention Dataset
We use the Bank Marketing dataset (OpenML ID 1461), which records whether clients subscribed to a term deposit after a marketing campaign. We reframe this as a churn-prevention scenario:
Outcome :math:`Y`: whether the customer was retained (subscribed).
Treatment :math:`W`: whether the customer received a targeted retention call (simulated RCT).
Since the original dataset is observational, we construct a clean RCT on top of it by randomly assigning treatment and simulating a heterogeneous treatment response.
[ ]:
data = fetch_openml(data_id=1461, as_frame=True, parser="auto")
df = data.frame
print(f"Raw samples: {len(df):,}")
df.head()
[ ]:
# Select features (drop the original outcome and campaign-related cols)
feature_cols = [
"age",
"job",
"marital",
"education",
"default",
"balance",
"housing",
"loan",
]
X = df[feature_cols].copy()
# Mark categoricals
for c in X.select_dtypes(include=["object", "category"]).columns:
X[c] = X[c].astype("category")
n = len(X)
rng = np.random.default_rng(42)
# --- Simulate an RCT ---
# Random treatment assignment (50/50)
w = rng.binomial(1, 0.5, size=n)
# Baseline retention logit (churn is higher for young, low-balance customers)
age_norm = (df["age"].astype(float).values - 30) / 30
balance_norm = (df["balance"].astype(float).values - 1000) / 5000
base_logit = -0.5 + 0.3 * age_norm + 0.2 * balance_norm
# Heterogeneous treatment effect:
# - Young customers (age < 35) respond well to retention offers
# - Customers with housing loans also respond positively
# - High-balance customers don't need the offer ("sure things")
has_housing = (df["housing"] == "yes").astype(float).values
is_young = (df["age"].astype(float).values < 35).astype(float)
tau_logit = 0.15 * is_young + 0.10 * has_housing - 0.08 * (balance_norm > 0.5).astype(float)
# Generate outcome (treatment shifts the retention logit by tau_logit)
prob_retain = 1 / (1 + np.exp(-(base_logit + tau_logit * w)))
y = rng.binomial(1, prob_retain)
# Ground-truth CATE on the probability (retention-rate) scale -- this is what the learners estimate
tau = 1 / (1 + np.exp(-(base_logit + tau_logit))) - 1 / (1 + np.exp(-base_logit))
print(f"Treatment rate: {w.mean():.2%}")
print(f"Retention rate (treated): {y[w == 1].mean():.2%}")
print(f"Retention rate (control): {y[w == 0].mean():.2%}")
print(f"True ATE: {tau.mean():.4f}")
[ ]:
X_train, X_test, w_train, w_test, y_train, y_test, tau_train, tau_test = train_test_split(
    X, w, y, tau, test_size=0.3, random_state=42
)
print(f"Train: {len(X_train):,} | Test: {len(X_test):,}")
2. Estimate CATE with Multiple Learners
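Each meta-learner reduces CATE estimation to standard supervised-learning subproblems (brief, standard definitions):
S-Learner — fits one outcome model with the treatment indicator as an extra feature and takes the difference of its predictions with treatment set to 1 vs. 0.
T-Learner — fits separate outcome models on the treated and control groups and subtracts their predictions.
X-Learner — imputes individual effects from the T-Learner's models, regresses them on covariates, and combines the two estimates with propensity weights.
DR-Learner — regresses a doubly robust (AIPW) pseudo-outcome on covariates; the clip argument presumably bounds estimated propensities away from 0 and 1.
R-Learner — regresses outcome residuals on treatment residuals (the Robinson decomposition), here provided by UpliftBooster.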
[ ]:
learners = {
"S-Learner": SLearner(budget=0.3),
"T-Learner": TLearner(budget=0.3),
"X-Learner": XLearner(budget=0.3),
"DR-Learner": DRLearner(budget=0.3, clip=0.01),
}
cate_preds = {}
for name, learner in learners.items():
learner.fit(X_train, w_train, y_train)
cate_preds[name] = learner.predict(X_test)
# R-Learner
rl = UpliftBooster(outcome_budget=0.1, propensity_budget=0.01, effect_budget=0.1)
rl.fit(X_train, w_train, y_train)
cate_preds["R-Learner"] = rl.predict(X_test)
for name, tau_hat in cate_preds.items():
corr = np.corrcoef(tau_test, tau_hat)[0, 1]
rmse = np.sqrt(np.mean((tau_test - tau_hat) ** 2))
print(
f"{name:12s} avg CATE = {tau_hat.mean():+.4f} "
f"corr(true) = {corr:.3f} RMSE = {rmse:.4f}"
)
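To make the T-Learner idea concrete, here is a minimal hand-rolled version using scikit-learn's GradientBoostingClassifier as a stand-in base learner. This is a conceptual sketch only, not how perpetual's TLearner is implemented, and unlike the cells above it needs categoricals one-hot encoded:

[ ]:
from sklearn.ensemble import GradientBoostingClassifier

# One-hot encode categoricals for scikit-learn, aligning test columns to train
X_tr_enc = pd.get_dummies(X_train)
X_te_enc = pd.get_dummies(X_test).reindex(columns=X_tr_enc.columns, fill_value=0)

# T-Learner by hand: one outcome model per arm, CATE = difference of predictions
m1 = GradientBoostingClassifier(random_state=0).fit(X_tr_enc[w_train == 1], y_train[w_train == 1])
m0 = GradientBoostingClassifier(random_state=0).fit(X_tr_enc[w_train == 0], y_train[w_train == 0])
tau_t_hand = m1.predict_proba(X_te_enc)[:, 1] - m0.predict_proba(X_te_enc)[:, 1]
print(f"Hand-rolled T-Learner avg CATE: {tau_t_hand.mean():+.4f}")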
3. Targeting Policy — Who Should Receive the Offer?
A targeting policy assigns treatment to customers whose predicted uplift exceeds a threshold. We compare several policies:
Treat nobody (control baseline)
Treat everybody (blanket policy)
Treat the top-k by predicted CATE (uplift-based targeting)
Treat whenever :math:`\hat{\tau}(x) > 0` (simple positive-uplift rule)
[ ]:
# Use DR-Learner for targeting
tau_hat = cate_preds["DR-Learner"]
# Offer cost: $10 per contacted customer
# Revenue from retained customer: $100
OFFER_COST = 10
RETAIN_VALUE = 100
def evaluate_policy(treat_mask, y_obs, w_obs, tau_true):
"""Estimate incremental outcomes under a targeting policy."""
n_treated = treat_mask.sum()
    # Average true uplift among the targeted customers (simulated ground truth)
expected_uplift = tau_true[treat_mask].mean() if n_treated > 0 else 0
incremental_retentions = expected_uplift * n_treated
revenue = incremental_retentions * RETAIN_VALUE - n_treated * OFFER_COST
return {
"n_targeted": n_treated,
"pct_targeted": n_treated / len(y_obs),
"avg_true_cate": expected_uplift,
"est_incremental_revenue": revenue,
}
policies = {
"Treat nobody": np.zeros(len(X_test), dtype=bool),
"Treat all": np.ones(len(X_test), dtype=bool),
"Top 30%": tau_hat >= np.percentile(tau_hat, 70),
"Top 50%": tau_hat >= np.percentile(tau_hat, 50),
"CATE > 0": tau_hat > 0,
}
rows = []
for name, mask in policies.items():
res = evaluate_policy(mask, y_test, w_test, tau_test)
rows.append({"Policy": name, **res})
policy_df = pd.DataFrame(rows)
print(policy_df.to_string(index=False, float_format="%.3f"))
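Because the offer has a fixed cost, a natural refinement is to treat only when the expected incremental value covers that cost, i.e. when :math:`\hat{\tau}(x) \cdot \mathrm{RETAIN\_VALUE} > \mathrm{OFFER\_COST}`. A sketch using the helper defined above:

[ ]:
# Break-even rule (sketch): send the offer only if the expected incremental
# revenue (uplift x value of a retained customer) exceeds the offer cost.
break_even = OFFER_COST / RETAIN_VALUE  # 10 / 100 = 0.10
res = evaluate_policy(tau_hat > break_even, y_test, w_test, tau_test)
print(
    f"Break-even policy: target {res['pct_targeted']:.1%} of customers, "
    f"est. incremental revenue ${res['est_incremental_revenue']:,.0f}"
)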
[ ]:
fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(policy_df["Policy"], policy_df["est_incremental_revenue"])
ax.set_xlabel("Estimated Incremental Revenue ($)")
ax.set_title("Revenue by Targeting Policy")
ax.axvline(0, color="grey", linewidth=0.5)
plt.tight_layout()
plt.show()
4. Evaluation: Uplift Curves and Metrics
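The cumulative gain curve ranks customers by predicted uplift and plots the incremental retention gained as a growing fraction of them is targeted; a useful model rises steeply before flattening. AUUC summarizes this as the area under the curve (we report the normalized version), and the Qini coefficient measures the area between the Qini curve and the random-targeting baseline; larger is better for both.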
[ ]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Cumulative gain
for name, scores in cate_preds.items():
fracs, gains = cumulative_gain_curve(y_test, w_test, scores)
axes[0].plot(fracs, gains, label=name)
axes[0].plot([0, 1], [0, 0], "k--", label="Random")
axes[0].set_title("Cumulative Gain Curves")
axes[0].set_xlabel("Fraction Targeted")
axes[0].set_ylabel("Cumulative Gain")
axes[0].legend(fontsize=8)
# True CATE vs. predicted (DR-Learner)
axes[1].scatter(tau_test, cate_preds["DR-Learner"], alpha=0.1, s=4)
lims = [
min(tau_test.min(), cate_preds["DR-Learner"].min()),
max(tau_test.max(), cate_preds["DR-Learner"].max()),
]
axes[1].plot(lims, lims, "r--", label="Perfect")
axes[1].set_xlabel("True CATE")
axes[1].set_ylabel("Predicted CATE (DR-Learner)")
axes[1].set_title("True vs. Predicted CATE")
axes[1].legend()
plt.tight_layout()
plt.show()
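For intuition about what the Qini coefficient measures, the Qini curve can be built by hand: sort customers by predicted uplift and track the cumulative "extra" retentions in the treated group relative to a reweighted control group. This is a rough sketch; perpetual's qini_coefficient may normalize differently:

[ ]:
# Hand-rolled Qini curve for the DR-Learner (conceptual sketch)
order = np.argsort(-cate_preds["DR-Learner"])
y_s, w_s = y_test[order], w_test[order]
cum_t = np.cumsum(y_s * w_s)        # retained & treated, cumulative
cum_c = np.cumsum(y_s * (1 - w_s))  # retained & control, cumulative
n_t, n_c = np.cumsum(w_s), np.cumsum(1 - w_s)
qini_y = cum_t - cum_c * n_t / np.maximum(n_c, 1)  # incremental retentions
fracs = np.arange(1, len(y_s) + 1) / len(y_s)

plt.figure(figsize=(6, 4))
plt.plot(fracs, qini_y, label="DR-Learner")
plt.plot([0, 1], [0, qini_y[-1]], "k--", label="Random")
plt.xlabel("Fraction targeted (ranked by predicted uplift)")
plt.ylabel("Incremental retentions")
plt.title("Hand-rolled Qini curve")
plt.legend()
plt.tight_layout()
plt.show()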
[ ]:
# Summary metrics
print(
f"{'Learner':12s} {'AUUC (norm)':>12} {'Qini':>8} {'Corr(true)':>12} {'RMSE':>8}"
)
print("-" * 60)
for name, tau_hat in cate_preds.items():
a = auuc(y_test, w_test, tau_hat, normalize=True)
q = qini_coefficient(y_test, w_test, tau_hat)
corr = np.corrcoef(tau_test, tau_hat)[0, 1]
rmse = np.sqrt(np.mean((tau_test - tau_hat) ** 2))
print(f"{name:12s} {a:>+12.4f} {q:>+8.4f} {corr:>12.4f} {rmse:>8.4f}")
5. Deep Dive: Who Are the Persuadables?
We segment the test set by predicted CATE decile and inspect the demographic profile of the top segment.
[ ]:
tau_hat_dr = cate_preds["DR-Learner"]
decile = pd.qcut(tau_hat_dr, 10, labels=False, duplicates="drop")
test_df = X_test.copy()
test_df["cate_hat"] = tau_hat_dr
test_df["true_cate"] = tau_test
test_df["decile"] = decile
agg = test_df.groupby("decile").agg(
avg_cate_hat=("cate_hat", "mean"),
avg_true_cate=("true_cate", "mean"),
avg_age=("age", lambda x: x.astype(float).mean()),
avg_balance=("balance", lambda x: x.astype(float).mean()),
n=("cate_hat", "count"),
)
print(agg.to_string(float_format="%.4f"))
[ ]:
# Profile the top decile
top_decile = test_df[test_df["decile"] == test_df["decile"].max()]
print(f"Top Decile Profile (n={len(top_decile)})")
print(f" Avg predicted CATE: {top_decile['cate_hat'].mean():.4f}")
print(f" Avg true CATE: {top_decile['true_cate'].mean():.4f}")
print(f" Avg age: {top_decile['age'].astype(float).mean():.1f}")
if "job" in top_decile.columns:
print(f" Top jobs: {top_decile['job'].value_counts().head(3).to_dict()}")
if "housing" in top_decile.columns:
print(f" Housing loan: {(top_decile['housing'] == 'yes').mean():.1%}")
Key Takeaways
| Insight | Details |
|---|---|
| Not everyone benefits from treatment | Blanket campaigns waste budget on sure things and sleeping dogs. |
| Uplift-based targeting improves ROI | Targeting the top persuadables yields higher incremental revenue than treating everyone. |
| Multiple learners, one winner | Comparing S/T/X/DR/R-Learners on AUUC and Qini helps select the best model for your data. |
| Subgroup profiling | Decile analysis reveals the demographic characteristics of persuadable customers. |
| Business integration | CATE estimates + cost/revenue parameters → actionable targeting thresholds. |