Uplift Modeling and Causal Inference

Uplift modeling aims to predict the incremental impact of an action (the “treatment”) on an individual’s behavioral outcome. The quantity being estimated is the Conditional Average Treatment Effect (CATE).
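
Formally, for covariates X, treatment W ∈ {0, 1}, and potential outcomes Y(1) (treated) and Y(0) (untreated), the CATE is

  τ(x) = E[Y(1) − Y(0) | X = x],

the expected gain in the outcome caused by treating an individual with characteristics x.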

In this tutorial, we will use the Hillstrom (MineThatData) dataset, a standard benchmark in marketing analytics, to demonstrate how to use Perpetual’s causal inference tools:

  • UpliftBooster (R-Learner)

  • SLearner

  • TLearner

  • XLearner

  • DRLearner (Doubly Robust)

[ ]:
import matplotlib.pyplot as plt
from perpetual.causal_metrics import auuc, cumulative_gain_curve, qini_coefficient
from perpetual.meta_learners import DRLearner, SLearner, TLearner, XLearner
from perpetual.uplift import UpliftBooster
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

1. Load the Dataset

The Hillstrom dataset contains 64,000 customers who were randomly assigned to one of three groups:

  1. E-mail for Mens merchandise.

  2. E-mail for Womens merchandise.

  3. No e-mail (Control group).

We will simplify this to a binary case: E-mail (any) vs. No E-mail.

[ ]:
print("Fetching dataset...")
dataset = fetch_openml(data_id=41473, as_frame=True, parser="auto")
df = dataset.frame
df.head()
[ ]:
# Preprocessing
# Create binary treatment indicator (segment != 'No E-mail')
df["treatment"] = (df["segment"] != "No E-mail").astype(int)

# Target variable: we use 'visit' (binary); 'conversion' and 'spend' are alternative outcomes
y = df["visit"].astype(int)
w = df["treatment"].astype(int)

# Select features
features = [
    "recency",
    "history_segment",
    "history",
    "mens",
    "womens",
    "zip_code",
    "newbie",
    "channel",
]
X = df[features].copy()

# Mark categorical features as pandas 'category' dtype so Perpetual handles them natively
for col in ["history_segment", "zip_code", "channel"]:
    X[col] = X[col].astype("category")

X_train, X_test, w_train, w_test, y_train, y_test = train_test_split(
    X, w, y, test_size=0.3, random_state=42
)
print(f"Training set size: {X_train.shape[0]}")
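
Before fitting any models, it is worth checking the raw treatment effect. Because Hillstrom was a randomized experiment, a simple difference in visit rates between treated and control is an unbiased estimate of the average treatment effect (ATE), and each model’s average predicted uplift should land in the same ballpark. A quick sanity check:

[ ]:
# Naive ATE: difference in mean outcome between treated and control.
# Valid as an ATE estimate here because treatment was randomized.
ate_naive = y_train[w_train == 1].mean() - y_train[w_train == 0].mean()
print(f"Treated fraction: {w_train.mean():.3f}")
print(f"Naive ATE (visit rate difference): {ate_naive:.4f}")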

2. R-Learner (UpliftBooster)

The UpliftBooster implements the R-Learner meta-algorithm (Nie & Wager, 2021): it first partials out the effect of the covariates on both the outcome and the treatment using nuisance models, then fits the treatment effect by minimizing the resulting residual-on-residual loss. Because treatment assignment is modeled explicitly via the propensity score, the approach is robust to selection bias.
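
Concretely, with an outcome model m̂(x) ≈ E[Y | X = x] and a propensity model ê(x) ≈ P(W = 1 | X = x), the effect function is chosen to minimize the standard R-loss:

  τ̂ = argmin_τ Σᵢ [ (yᵢ − m̂(xᵢ)) − (wᵢ − ê(xᵢ)) · τ(xᵢ) ]²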

[ ]:
# Initialize and fit UpliftBooster
ub = UpliftBooster(outcome_budget=0.1, propensity_budget=0.01, effect_budget=0.1)
ub.fit(X_train, w_train, y_train)

# Predicted Treatment Effect
uplift_r = ub.predict(X_test)
print(f"Average Predicted Uplift (R-Learner): {uplift_r.mean():.4f}")
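
The mean only tells part of the story; the spread of predicted effects shows how much targeting potential the model sees. A quick look at the distribution, reusing the matplotlib import from above:

[ ]:
# Distribution of individual predicted treatment effects
plt.hist(uplift_r, bins=50)
plt.title("Predicted Uplift Distribution (R-Learner)")
plt.xlabel("Predicted uplift")
plt.ylabel("Customers")
plt.show()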

2.1 Domain Knowledge: Interaction Constraints

Perpetual allows you to enforce Interaction Constraints. This is useful when you know (from domain expertise) that certain features should only interact with each other, or should not interact at all.

For example, we might want to allow interactions only within a specific set of features.

[ ]:
# Enforce that 'recency' and 'history' can interact, but other features cannot interact with them
# Feature indices in 'features' list: 0: recency, 2: history
interaction_constraints = [[0, 2]]
ub_constrained = UpliftBooster(
    outcome_budget=0.1,
    propensity_budget=0.01,
    effect_budget=0.1,
    interaction_constraints=interaction_constraints,
)
ub_constrained.fit(X_train, w_train, y_train)

uplift_constrained = ub_constrained.predict(X_test)
print(f"Average Uplift (Constrained): {uplift_constrained.mean():.4f}")
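
To gauge how much the constraint changed the model, we can compare the constrained predictions against the unconstrained ones (a quick sketch; assumes both predict calls return 1-D NumPy arrays):

[ ]:
import numpy as np

# Pearson correlation of the two prediction vectors: values near 1 mean
# the constraint barely changed the targeting order.
corr = np.corrcoef(uplift_r, uplift_constrained)[0, 1]
print(f"Correlation (unconstrained vs constrained): {corr:.3f}")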

3. Comparing with Meta-Learners

Meta-learners are algorithms that decompose the causal problem into one or more supervised learning problems.
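
In brief (standard definitions; μ̂ denotes a fitted outcome model and ê a fitted propensity model):

  • S-Learner: one model μ̂(x, w) with the treatment as an extra feature; τ̂(x) = μ̂(x, 1) − μ̂(x, 0).

  • T-Learner: separate models μ̂₁ on the treated and μ̂₀ on the control; τ̂(x) = μ̂₁(x) − μ̂₀(x).

  • X-Learner: uses the T-Learner models to impute individual effects, fits effect models on those imputations, and blends the two with propensity weights.

  • DR-Learner: regresses the doubly robust (AIPW) pseudo-outcome on X, which stays consistent if either μ̂ or ê is well estimated.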

[ ]:
# S-Learner: Single model with treatment as feature
sl = SLearner(budget=0.2)
sl.fit(X_train, w_train, y_train)
uplift_s = sl.predict(X_test)

# T-Learner: Two models (one per treatment group)
tl = TLearner(budget=0.2)
tl.fit(X_train, w_train, y_train)
uplift_t = tl.predict(X_test)

# X-Learner: Two-stage learner with imputation
xl = XLearner(budget=0.2)
xl.fit(X_train, w_train, y_train)
uplift_x = xl.predict(X_test)

# DR-Learner: Doubly Robust / AIPW
dr = DRLearner(budget=0.2, clip=0.01)
dr.fit(X_train, w_train, y_train)
uplift_dr = dr.predict(X_test)

print(f"Avg Uplift S:  {uplift_s.mean():.4f}")
print(f"Avg Uplift T:  {uplift_t.mean():.4f}")
print(f"Avg Uplift X:  {uplift_x.mean():.4f}")
print(f"Avg Uplift DR: {uplift_dr.mean():.4f}")

4. Evaluation: Uplift Curve

Since each customer is either treated or not (never both), the ground-truth individual effect is unobservable. Instead, we evaluate how well each model ranks customers using the Cumulative Gain (uplift) curve.
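
Under the usual definition, the cumulative gain at fraction f is the difference in mean outcome between treated and control within the top f of customers (ranked by predicted uplift), scaled by the number of customers included. A minimal reference implementation of that definition follows; the helper name is illustrative, and Perpetual’s cumulative_gain_curve may differ in details such as tie handling:

[ ]:
import numpy as np

def gain_curve_sketch(y, w, scores, n_points=100):
    # Rank customers by predicted uplift, best first.
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y, dtype=float)[order]
    w = np.asarray(w, dtype=float)[order]
    fracs = np.linspace(0.0, 1.0, n_points + 1)
    gains = [0.0]
    for f in fracs[1:]:
        k = max(int(round(f * len(y))), 1)
        y_top, w_top = y[:k], w[:k]
        treated, control = y_top[w_top == 1], y_top[w_top == 0]
        if len(treated) == 0 or len(control) == 0:
            gains.append(0.0)
            continue
        # Uplift among the top-k customers, scaled to the number targeted.
        gains.append((treated.mean() - control.mean()) * k)
    return fracs, np.array(gains)

fracs_m, gains_m = gain_curve_sketch(y_test, w_test, uplift_r)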

[ ]:
# --- Uplift Gain Curves ---
plt.figure(figsize=(10, 6))
for label, scores in [
    ("R-Learner", uplift_r),
    ("X-Learner", uplift_x),
    ("DR-Learner", uplift_dr),
]:
    fracs, gains = cumulative_gain_curve(y_test, w_test, scores)
    plt.plot(fracs, gains, label=label)

plt.plot([0, 1], [0, 0], "k--", label="Random")
plt.title("Cumulative Uplift Gain — Hillstrom Dataset")
plt.xlabel("Population % Sorted by Predicted Uplift")
plt.ylabel("Cumulative Gain")
plt.legend()
plt.show()

# --- AUUC & Qini ---
for label, scores in [
    ("R-Learner", uplift_r),
    ("S-Learner", uplift_s),
    ("T-Learner", uplift_t),
    ("X-Learner", uplift_x),
    ("DR-Learner", uplift_dr),
]:
    a = auuc(y_test, w_test, scores, normalize=True)
    q = qini_coefficient(y_test, w_test, scores)
    print(f"{label:12s}  AUUC={a:+.4f}  Qini={q:+.4f}")
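
Assuming the usual conventions, both numbers summarize the area between a model’s curve and the random-targeting baseline: higher is better, and values near zero mean the model’s ranking is no more informative than random. Because all models are scored on the same held-out split, the results are directly comparable across learners.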