{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Policy Learning: Optimal Treatment Assignment\n", "\n", "Policy learning aims to find the **optimal treatment rule**\n", "$\\pi(X) \\in \\{0, 1\\}$ that maximizes expected outcomes in a population.\n", "\n", "Perpetual's `PolicyLearner` implements the Athey & Wager (2021) framework\n", "using **inverse propensity weighting** (IPW) to transform the causal\n", "inference problem into a weighted classification task.\n", "\n", "Two modes are available:\n", "- **IPW** — standard inverse propensity weighting.\n", "- **AIPW** (doubly robust) — incorporates a baseline outcome model to\n", " reduce variance.\n", "\n", "In this tutorial we use the **Bank Marketing** dataset to learn an optimal\n", "policy for targeting customers with a marketing campaign." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from perpetual.causal_metrics import auuc, qini_coefficient\n", "from perpetual.policy import PolicyLearner\n", "from sklearn.datasets import fetch_openml\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## 1. Load the Bank Marketing Dataset\n", "\n", "The Bank Marketing dataset records whether clients subscribed to a term\n", "deposit after being contacted in a marketing campaign. We treat the\n", "phone contact as the \"treatment\" and subscription as the \"outcome\"." ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "print(\"Fetching Bank Marketing dataset...\")\n", "data = fetch_openml(data_id=1461, as_frame=True, parser=\"auto\")\n", "df = data.frame\n", "print(f\"Shape: {df.shape}\")\n", "df.head()" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## 2. 
Simulate an RCT\n", "\n", "Since the original dataset is observational, we simulate a randomized\n", "controlled trial (RCT) by randomly assigning treatment. This gives us\n", "known propensity scores (0.5) for clean evaluation." ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "\n", "# Encode target (the OpenML copy stores the positive class \"yes\" as \"2\";\n", "# including \"1\" here would mark every row as positive)\n", "y_col = df.columns[-1]\n", "y_all = (\n", " (df[y_col].astype(str).str.strip().isin([\"2\", \"yes\"]))\n", " .astype(float)\n", " .values\n", ")\n", "\n", "# Encode features\n", "feature_cols = [c for c in df.columns if c != y_col]\n", "df_feat = df[feature_cols].copy()\n", "cat_cols = df_feat.select_dtypes(include=[\"category\", \"object\"]).columns.tolist()\n", "df_encoded = pd.get_dummies(df_feat, columns=cat_cols, drop_first=True, dtype=float)\n", "X_all = df_encoded.values.astype(float)\n", "feature_names = list(df_encoded.columns)\n", "\n", "# Simulate RCT: random treatment with P(W=1)=0.5\n", "w_all = np.random.binomial(1, 0.5, size=len(y_all)).astype(float)\n", "\n", "# Simulate heterogeneous treatment effect based on age and balance\n", "# (the OpenML copy anonymizes column names, so these lookups may fall\n", "# back to positional indices)\n", "age_idx = feature_names.index(\"age\") if \"age\" in feature_names else 0\n", "balance_idx = feature_names.index(\"balance\") if \"balance\" in feature_names else 1\n", "\n", "# Younger clients with higher balance benefit more from contact\n", "age_norm = (X_all[:, age_idx] - X_all[:, age_idx].mean()) / (\n", " X_all[:, age_idx].std() + 1e-8\n", ")\n", "bal_norm = (X_all[:, balance_idx] - X_all[:, balance_idx].mean()) / (\n", " X_all[:, balance_idx].std() + 1e-8\n", ")\n", "true_cate = 0.1 - 0.05 * age_norm + 0.05 * bal_norm\n", "\n", "# Outcome: base rate + treatment effect\n", "base_rate = 0.12\n", "prob_y = np.clip(base_rate + w_all * true_cate, 0.01, 0.99)\n", "y_sim = np.random.binomial(1, prob_y).astype(float)\n", "\n", "print(f\"X shape: {X_all.shape}\")\n", "print(f\"Treatment rate: {w_all.mean():.2%}\")\n",
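"# Sanity check: in an RCT, the treated-vs-control difference in means is an\n", "# unbiased ATE estimate, so it should land close to true_cate.mean().\n", "dim_ate = y_sim[w_all == 1].mean() - y_sim[w_all == 0].mean()\n", "print(f\"Difference-in-means ATE estimate: {dim_ate:.4f}\")\n",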
"print(f\"Outcome rate: {y_sim.mean():.2%}\")\n", "print(f\"True ATE: {true_cate.mean():.4f}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "X_train, X_test, w_train, w_test, y_train, y_test, cate_train, cate_test = (\n", " train_test_split(X_all, w_all, y_sim, true_cate, test_size=0.3, random_state=42)\n", ")\n", "print(f\"Train: {X_train.shape[0]}, Test: {X_test.shape[0]}\")" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "## 3. Learn an IPW Policy\n", "\n", "The IPW `PolicyLearner` computes pseudo-outcomes from the propensity\n", "scores and outcome, then learns a policy that assigns treatment when\n", "the pseudo-outcome is positive." ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "# Since we simulated an RCT, we know the propensity is 0.5 for all samples.\n", "prop_train = np.full(len(w_train), 0.5)\n", "\n", "pl_ipw = PolicyLearner(budget=0.5, mode=\"ipw\")\n", "pl_ipw.fit(X_train, w_train, y_train, propensity=prop_train)\n", "\n", "policy_ipw = pl_ipw.predict(X_test)\n", "print(f\"IPW Policy: treat {policy_ipw.mean():.2%} of the population\")" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "## 4. Learn an AIPW (Doubly Robust) Policy\n", "\n", "AIPW reduces variance by incorporating a baseline outcome model." ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "pl_aipw = PolicyLearner(budget=0.5, mode=\"aipw\")\n", "pl_aipw.fit(X_train, w_train, y_train, propensity=prop_train)\n", "\n", "policy_aipw = pl_aipw.predict(X_test)\n", "print(f\"AIPW Policy: treat {policy_aipw.mean():.2%} of the population\")" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "## 5. Evaluate Policies\n", "\n", "We compare the learned policies against random treatment and\n", "treat-everyone baselines using the true CATE." 
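, "\n",
"\n",
"Since the simulation keeps the true CATE $\\tau(x)$ for every row, we can\n",
"score any policy $\\pi$ by its value relative to treating nobody,\n",
"$V(\\pi) = \\mathbb{E}[\\pi(X)\\,\\tau(X)]$, estimated by averaging\n",
"$\\pi(x_i)\\,\\tau(x_i)$ over the test set."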
] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "def policy_value(policy, true_cate):\n", " \"\"\"Expected improvement from a targeting policy vs. no treatment.\"\"\"\n", " return (policy * true_cate).mean()\n", "\n", "\n", "# Baselines\n", "treat_all = np.ones(len(cate_test))\n", "treat_none = np.zeros(len(cate_test))\n", "random_policy = np.random.binomial(1, 0.5, size=len(cate_test))\n", "\n", "# Oracle: treat only when CATE > 0\n", "oracle = (cate_test > 0).astype(int)\n", "\n", "print(f\"{'Policy':<25} {'Value':>10} {'Treat %':>10}\")\n", "print(\"-\" * 48)\n", "for name, pol in [\n", " (\"Treat Everyone\", treat_all),\n", " (\"Treat Nobody\", treat_none),\n", " (\"Random (50%)\", random_policy),\n", " (\"IPW PolicyLearner\", policy_ipw),\n", " (\"AIPW PolicyLearner\", policy_aipw),\n", " (\"Oracle (true CATE>0)\", oracle),\n", "]:\n", " val = policy_value(pol, cate_test)\n", " pct = pol.mean()\n", " print(f\"{name:<25} {val:>10.4f} {pct:>10.2%}\")" ] }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "## 6. Feature Importance\n", "\n", "Which features matter most for the treatment-assignment decision?" ] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "importances = pl_aipw.feature_importances_\n", "top_k = 10\n", "top_idx = np.argsort(importances)[::-1][:top_k]\n", "\n", "print(f\"Top {top_k} features for policy assignment:\")\n", "for rank, idx in enumerate(top_idx, 1):\n", " print(f\" {rank}. {feature_names[idx]:25s} importance={importances[idx]:.4f}\")" ] }, { "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "## 7. Uplift Curve Evaluation\n", "\n", "We can also use `causal_metrics` to evaluate the policy score as an\n", "uplift model, checking whether higher-scored individuals benefit more." 
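, "\n",
"\n",
"AUUC is the area under the uplift curve, which accumulates incremental\n",
"outcomes as individuals are treated in descending score order; the Qini\n",
"coefficient normalizes the area between the Qini curve and the diagonal\n",
"of random targeting. For both, higher is better, and an uninformative\n",
"score should land near the random baseline."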
] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": {}, "outputs": [], "source": [ "# Use the continuous policy score as the uplift score\n", "uplift_score = pl_aipw.predict_proba(X_test)\n", "\n", "aipw_auuc = auuc(y_test, w_test, uplift_score)\n", "aipw_qini = qini_coefficient(y_test, w_test, uplift_score)\n", "\n", "# Compare with random\n", "random_score = np.random.randn(len(y_test))\n", "rand_auuc = auuc(y_test, w_test, random_score)\n", "\n", "print(f\"AIPW PolicyLearner — AUUC: {aipw_auuc:.4f}, Qini: {aipw_qini:.4f}\")\n", "print(f\"Random baseline — AUUC: {rand_auuc:.4f}\")" ] }, { "cell_type": "markdown", "id": "17", "metadata": {}, "source": [ "## Summary\n", "\n", "In this tutorial we:\n", "\n", "1. Simulated an RCT with **heterogeneous treatment effects** on real-world Bank Marketing features.\n", "2. Trained **IPW** and **AIPW** policy learners to find optimal treatment rules.\n", "3. Compared learned policies against baselines and the oracle.\n", "4. Identified the most important features for treatment assignment.\n", "5. Evaluated the policy as an uplift model using AUUC and Qini.\n", "\n", "### References\n", "\n", "- Athey, S. & Wager, S. (2021). *Policy Learning with Observational Data*.\n", " Econometrica, 89(1), 133-161." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }