{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Classification Calibration: Prediction Sets with Perpetual\n", "\n", "## 1. Introduction\n", "\n", "In classification tasks, a model typically outputs a probability distribution over classes. However, selecting the class with the highest probability is often not enough, especially in high-stakes decision-making. We also want to know the **uncertainty** of our predictions.\n", "\n", "**Calibration** in classification often refers to ensuring that the predicted probabilities reflect true frequencies. A complementary approach is **Conformal Prediction**, which constructs **Prediction Sets**.\n", "\n", "A prediction set $\\mathcal{C}(x)$ is a set of classes such that the true label $y$ is contained in $\\mathcal{C}(x)$ with probability at least $(1 - \\alpha)$:\n", "$$P(y \\in \\mathcal{C}(x)) \\geq 1 - \\alpha$$\n", "\n", "For example, if $\\alpha = 0.1$, we want the true class to be in the predicted set at least 90% of the time. 
The goal is to maximize the \"efficiency\" of these sets (i.e., minimize their average size) while maintaining the coverage guarantee.\n", "\n", "PerpetualBooster provides built-in methods to generate these calibrated prediction sets.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from lightgbm import LGBMClassifier\n", "from mapie.classification import CrossConformalClassifier, SplitConformalClassifier\n", "from perpetual import (\n", "    PerpetualBooster,\n", "    compute_calibration_curve,\n", "    expected_calibration_error,\n", ")\n", "from sklearn.calibration import CalibratedClassifierCV\n", "from sklearn.datasets import fetch_covtype\n", "from sklearn.ensemble import HistGradientBoostingClassifier\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import train_test_split\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "sns.set_theme(style=\"whitegrid\")" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## 2. Dataset Preparation\n", "\n", "We will use the **Covertype** dataset, a classic benchmark for classification. To make the problem more illustrative for prediction sets (where we might capture uncertainty between two dominant classes), we will convert it into a **binary classification** task: distinguishing Class 2 (Lodgepole Pine) from all others. Class 2 covers approximately 48.8% of the data, making it a roughly balanced problem." 
] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "print(\"Loading Covertype dataset...\")\n", "data = fetch_covtype()\n", "X, y_orig = data.data, data.target\n", "\n", "# Convert to binary: Class 2 vs Rest\n", "y = (y_orig == 2).astype(int)\n", "\n", "# Subsample for tutorial speed (optional, remove for full run)\n", "idx = np.arange(len(y))\n", "np.random.seed(42)\n", "np.random.shuffle(idx)\n", "X = X[idx[:50000]]\n", "y = y[idx[:50000]]\n", "\n", "# Split: Train (60%), Calibration (20%), Test (20%)\n", "X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "X_train, X_cal, y_train, y_cal = train_test_split(\n", "    X_rest, y_rest, test_size=0.25, random_state=42\n", ")\n", "\n", "print(f\"Train size: {len(X_train)}\")\n", "print(f\"Calibration size: {len(X_cal)}\")\n", "print(f\"Test size: {len(X_test)}\")" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## 3. Training the Base Model\n", "\n", "We train a `PerpetualBooster` with the `LogLoss` objective. We set `save_node_stats=True` to enable internal calibration methods like WeightVariance." ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "# Note: PerpetualBooster is deterministic; no random_state parameter needed.\n", "model = PerpetualBooster(objective=\"LogLoss\", budget=1.0, save_node_stats=True)\n", "model.fit(X_train, y_train)\n", "\n", "preds = model.predict(X_test)\n", "acc = accuracy_score(y_test, preds)\n", "print(f\"Base Model Accuracy: {acc:.4f}\")" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "## 4. 
Calibrating Prediction Sets\n", "\n", "We will now calibrate the model to produce prediction sets at three coverage levels: $\\alpha = 0.1$ (90%), $\\alpha = 0.05$ (95%), and $\\alpha = 0.01$ (99%).\n", "\n", "Perpetual offers:\n", "- **Conformal**: Standard split-conformal prediction (sets based on probability thresholds).\n", "- **WeightVariance / MinMax**: Adaptive methods that leverage the internal variance of the ensemble to scale uncertainty.\n", "- **GRP**: Adaptive method using Generalized Residual Prediction (log-odds percentiles) to scale uncertainty." ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "methods = [\"Conformal\", \"WeightVariance\", \"MinMax\", \"GRP\"]\n", "alphas = [0.1, 0.05, 0.01]\n", "results = []\n", "\n", "for method in methods:\n", "    print(f\"Calibrating with {method}...\")\n", "    # We calibrate on the held-out calibration set\n", "    model.calibrate(X_cal, y_cal, alpha=alphas, method=method)\n", "\n", "    # Predict sets on test set\n", "    prediction_sets = model.predict_sets(X_test)\n", "\n", "    for alpha in alphas:\n", "        alpha_str = str(float(alpha))\n", "        sets = prediction_sets[alpha_str]\n", "\n", "        # Calculate metrics\n", "        covered = 0\n", "        set_sizes = []\n", "        for i, s in enumerate(sets):\n", "            if y_test[i] in s:\n", "                covered += 1\n", "            set_sizes.append(len(s))\n", "\n", "        coverage = covered / len(y_test)\n", "        avg_size = np.mean(set_sizes)\n", "\n", "        results.append(\n", "            {\n", "                \"Library\": \"Perpetual\",\n", "                \"Method\": method,\n", "                \"Alpha\": alpha,\n", "                \"Target Coverage\": 1 - alpha,\n", "                \"Observed Coverage\": coverage,\n", "                \"Avg Set Size\": avg_size,\n", "            }\n", "        )" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## 5. Probability Calibration\n", "\n", "In addition to prediction sets, PerpetualBooster supports calibrating the predicted probabilities themselves. 
This ensures that the predicted probability reflects the true frequency of the positive class.\n", "\n", "Probability calibration is performed automatically during `calibrate`. You can access calibrated probabilities by setting `calibrated=True` in `predict_proba`.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "# Ensure the model is calibrated (method='Conformal' is sufficient as it triggers internal calibration)\n", "model.calibrate(X_cal, y_cal, method=\"Conformal\", alpha=0.1)\n", "\n", "# Get uncalibrated and calibrated probabilities\n", "probs_uncal = model.predict_proba(X_test, calibrated=False)[:, 1]\n", "probs_cal = model.predict_proba(X_test, calibrated=True)[:, 1]\n", "\n", "# Compute Calibration Curves\n", "true_uncal, pred_uncal = compute_calibration_curve(y_test, probs_uncal, n_bins=10)\n", "true_cal, pred_cal = compute_calibration_curve(y_test, probs_cal, n_bins=10)\n", "\n", "# Compute Expected Calibration Error (ECE)\n", "ece_uncal = expected_calibration_error(y_test, probs_uncal, n_bins=10)\n", "ece_cal = expected_calibration_error(y_test, probs_cal, n_bins=10)\n", "\n", "print(f\"Uncalibrated ECE: {ece_uncal:.4f}\")\n", "print(f\"Calibrated ECE: {ece_cal:.4f}\")\n", "\n", "# Plot Reliability Diagram\n", "plt.figure(figsize=(8, 8))\n", "plt.plot([0, 1], [0, 1], \"k:\", label=\"Perfectly Calibrated\")\n", "plt.plot(pred_uncal, true_uncal, \"s-\", label=f\"Uncalibrated (ECE={ece_uncal:.4f})\")\n", "plt.plot(pred_cal, true_cal, \"o-\", label=f\"Calibrated (ECE={ece_cal:.4f})\")\n", "plt.xlabel(\"Mean Predicted Probability\")\n", "plt.ylabel(\"Fraction of Positives\")\n", "plt.title(\"Reliability Diagram\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "## 6. 
Comparison with Scikit-Learn and LightGBM\n", "\n", "We compare the calibration of PerpetualBooster against scikit-learn's `HistGradientBoostingClassifier` and LightGBM's `LGBMClassifier`.\n", "For both competitors, we evaluate both the **uncalibrated** model and a **calibrated** version using Isotonic Regression (via `CalibratedClassifierCV`)." ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "# 1. Scikit-Learn HistGradientBoosting\n", "hgb = HistGradientBoostingClassifier(random_state=42)\n", "hgb.fit(X_train, y_train)\n", "hgb_cal = CalibratedClassifierCV(hgb, method=\"isotonic\", cv=\"prefit\")\n", "hgb_cal.fit(X_cal, y_cal)\n", "\n", "probs_hgb_uncal = hgb.predict_proba(X_test)[:, 1]\n", "probs_hgb_cal = hgb_cal.predict_proba(X_test)[:, 1]\n", "\n", "ece_hgb_uncal = expected_calibration_error(y_test, probs_hgb_uncal, n_bins=10)\n", "ece_hgb_cal = expected_calibration_error(y_test, probs_hgb_cal, n_bins=10)\n", "\n", "# 2. LightGBM\n", "lgbm = LGBMClassifier(random_state=42, verbose=-1)\n", "lgbm.fit(X_train, y_train)\n", "lgbm_cal = CalibratedClassifierCV(lgbm, method=\"isotonic\", cv=\"prefit\")\n", "lgbm_cal.fit(X_cal, y_cal)\n", "\n", "probs_lgbm_uncal = lgbm.predict_proba(X_test)[:, 1]\n", "probs_lgbm_cal = lgbm_cal.predict_proba(X_test)[:, 1]\n", "\n", "ece_lgbm_uncal = expected_calibration_error(y_test, probs_lgbm_uncal, n_bins=10)\n", "ece_lgbm_cal = expected_calibration_error(y_test, probs_lgbm_cal, n_bins=10)\n", "\n", "# 3. 
Compute Curves\n", "true_hgb_uncal, pred_hgb_uncal = compute_calibration_curve(\n", "    y_test, probs_hgb_uncal, n_bins=10\n", ")\n", "true_hgb_cal, pred_hgb_cal = compute_calibration_curve(y_test, probs_hgb_cal, n_bins=10)\n", "true_lgbm_uncal, pred_lgbm_uncal = compute_calibration_curve(\n", "    y_test, probs_lgbm_uncal, n_bins=10\n", ")\n", "true_lgbm_cal, pred_lgbm_cal = compute_calibration_curve(\n", "    y_test, probs_lgbm_cal, n_bins=10\n", ")\n", "\n", "print(f\"Perpetual (Uncalibrated) ECE: {ece_uncal:.4f}\")\n", "print(f\"Perpetual (Calibrated) ECE: {ece_cal:.4f}\")\n", "print(f\"Sklearn HGB (Uncalibrated) ECE: {ece_hgb_uncal:.4f}\")\n", "print(f\"Sklearn HGB (Calibrated) ECE: {ece_hgb_cal:.4f}\")\n", "print(f\"LightGBM (Uncalibrated) ECE: {ece_lgbm_uncal:.4f}\")\n", "print(f\"LightGBM (Calibrated) ECE: {ece_lgbm_cal:.4f}\")\n", "\n", "# 4. Plot Comparison\n", "plt.figure(figsize=(10, 10))\n", "plt.plot([0, 1], [0, 1], \"k:\", label=\"Perfectly Calibrated\")\n", "\n", "plt.plot(\n", "    pred_uncal,\n", "    true_uncal,\n", "    \"o--\",\n", "    label=f\"Perpetual (Uncalibrated, ECE={ece_uncal:.4f})\",\n", "    color=\"#1f77b4\",\n", "    alpha=0.6,\n", ")\n", "plt.plot(\n", "    pred_cal,\n", "    true_cal,\n", "    \"o-\",\n", "    label=f\"Perpetual (Calibrated, ECE={ece_cal:.4f})\",\n", "    color=\"#1f77b4\",\n", "    linewidth=2,\n", ")\n", "\n", "plt.plot(\n", "    pred_hgb_uncal,\n", "    true_hgb_uncal,\n", "    \"s--\",\n", "    label=f\"Sklearn HGB (Uncalibrated, ECE={ece_hgb_uncal:.4f})\",\n", "    color=\"#ff7f0e\",\n", "    alpha=0.6,\n", ")\n", "plt.plot(\n", "    pred_hgb_cal,\n", "    true_hgb_cal,\n", "    \"s-\",\n", "    label=f\"Sklearn HGB (Calibrated, ECE={ece_hgb_cal:.4f})\",\n", "    color=\"#ff7f0e\",\n", "    linewidth=2,\n", ")\n", "\n", "plt.plot(\n", "    pred_lgbm_uncal,\n", "    true_lgbm_uncal,\n", "    \"^--\",\n", "    label=f\"LightGBM (Uncalibrated, ECE={ece_lgbm_uncal:.4f})\",\n", "    color=\"#2ca02c\",\n", "    alpha=0.6,\n", ")\n", "plt.plot(\n", "    pred_lgbm_cal,\n", "    true_lgbm_cal,\n", "    \"^-\",\n", "    
label=f\"LightGBM (Calibrated, ECE={ece_lgbm_cal:.4f})\",\n", "    color=\"#2ca02c\",\n", "    linewidth=2,\n", ")\n", "\n", "plt.xlabel(\"Mean Predicted Probability\")\n", "plt.ylabel(\"Fraction of Positives\")\n", "plt.title(\"Reliability Diagram: Perpetual vs Sklearn vs LightGBM\")\n", "plt.legend(loc=\"lower right\")\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## 7. Comparison with MAPIE\n", "\n", "We compare against MAPIE's `SplitConformalClassifier` (standard split-conformal) and `CrossConformalClassifier` (cross-validation based).\n", "\n", "Both methods use the \"lac\" (Least Ambiguous Set-valued Classifiers) conformity score, as it is the primary method supported for binary classification in MAPIE." ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": {}, "outputs": [], "source": [ "print(\"Running MAPIE comparison...\")\n", "\n", "# MAPIE requires a fitted sklearn-compatible estimator\n", "base_est = HistGradientBoostingClassifier(random_state=42)\n", "# We fit it on X_train for SplitConformal (prefit)\n", "base_est.fit(X_train, y_train)\n", "\n", "for alpha in alphas:\n", "    print(f\" MAPIE Alpha {alpha}...\")\n", "\n",
"    # 1. Split Conformal (prefit)\n", "    # Uses 'confidence_level' = 1 - alpha\n", "    mapie_sc = SplitConformalClassifier(\n", "        estimator=base_est,\n", "        conformity_score=\"lac\",\n", "        prefit=True,\n", "        confidence_level=[1 - alpha],\n", "    )\n", "    mapie_sc.conformalize(X_cal, y_cal)\n", "    _, y_ps = mapie_sc.predict_set(X_test)\n", "    y_ps_sets = y_ps[:, :, 0]\n", "\n", "    # Calculate metrics for Split\n", "    covered = 0\n", "    sizes = []\n", "    for i in range(len(y_test)):\n", "        pred_set = np.where(y_ps_sets[i])[0]\n", "        if y_test[i] in pred_set:\n", "            covered += 1\n", "        sizes.append(len(pred_set))\n", "\n", "    results.append(\n", "        {\n", "            \"Library\": \"MAPIE\",\n", "            \"Method\": \"Split (LAC)\",\n", "            \"Alpha\": alpha,\n", "            \"Target Coverage\": 1 - alpha,\n", "            \"Observed Coverage\": covered / len(y_test),\n", "            \"Avg Set Size\": np.mean(sizes),\n", "        }\n", "    )\n", "\n", "    # 2. Cross Conformal\n", "    # Requires re-fitting on full training data (or X_train as we leverage CV)\n", "\n", "    # CrossConformalClassifier fits internal CV models.\n", "    # We use 'conformity_score' (valid in mapie v1.x)\n", "    mapie_cc = CrossConformalClassifier(\n", "        estimator=HistGradientBoostingClassifier(random_state=42),\n", "        conformity_score=\"lac\",\n", "        cv=5,\n", "        confidence_level=[1 - alpha],\n", "    )\n", "\n", "    mapie_cc.fit_conformalize(X_train, y_train)\n", "    _, y_ps = mapie_cc.predict_set(X_test)\n", "    y_ps_sets = y_ps[:, :, 0]\n", "\n", "    covered = 0\n", "    sizes = []\n", "    for i in range(len(y_test)):\n", "        pred_set = np.where(y_ps_sets[i])[0]\n", "        if y_test[i] in pred_set:\n", "            covered += 1\n", "        sizes.append(len(pred_set))\n", "\n", "    results.append(\n", "        {\n", "            \"Library\": \"MAPIE\",\n", "            \"Method\": \"Cross (LAC)\",\n", "            \"Alpha\": alpha,\n", "            \"Target Coverage\": 1 - alpha,\n", "            \"Observed Coverage\": covered / len(y_test),\n", "            \"Avg Set Size\": np.mean(sizes),\n", "        }\n", "    )" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "## 8. 
Results Analysis\n", "\n", "We print the full results table (including the coverage gap) and plot the average set size per method. Ideally, observed coverage should meet or slightly exceed the target, with the smallest possible average set size." ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "df_res = pd.DataFrame(results)\n", "df_res[\"Coverage Gap\"] = df_res[\"Observed Coverage\"] - df_res[\"Target Coverage\"]\n", "# Create a combined label for the legend\n", "df_res[\"Method Label\"] = df_res[\"Library\"] + \": \" + df_res[\"Method\"]\n", "\n", "print(df_res.sort_values([\"Alpha\", \"Avg Set Size\"]))\n", "\n", "# Define a custom color palette\n", "palette = {\n", "    \"Perpetual: Conformal\": \"#1f77b4\",  # Blue\n", "    \"Perpetual: WeightVariance\": \"#aec7e8\",  # Light Blue\n", "    \"Perpetual: MinMax\": \"#ff7f0e\",  # Orange\n", "    \"Perpetual: GRP\": \"#2ca02c\",  # Green\n", "    \"MAPIE: Split (LAC)\": \"#d62728\",  # Red\n", "    \"MAPIE: Cross (LAC)\": \"#9467bd\",  # Purple\n", "}\n", "\n", "# Use a wide figure so the legend placed outside the axes has room\n", "plt.figure(figsize=(10, 7))\n", "ax = sns.barplot(\n", "    data=df_res, x=\"Alpha\", y=\"Avg Set Size\", hue=\"Method Label\", palette=palette\n", ")\n", "plt.title(\"Average Set Size by Method and Alpha (Lower is Better)\")\n", "plt.ylabel(\"Average Set Size\")\n", "plt.xlabel(\"Alpha (Target Error Rate)\")\n", "\n", "# Move legend to the right outside the plot\n", "# bbox_to_anchor=(1, 1) places the top-left corner of the legend at the top-right of the axes\n", "sns.move_legend(ax, \"upper left\", bbox_to_anchor=(1, 1))\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "16", "metadata": {}, "source": [ "## 9. Summary and Conclusion\n", "\n", "In this tutorial, we have explored several advanced calibration techniques provided by **PerpetualBooster** for both probability calibration and set-valued predictions.\n", "\n", "### Key Advantages of PerpetualBooster\n", "\n", "1. 
**Superior Probability Calibration (ECE)**:\n", "   - As demonstrated in the Section 6 comparison against Scikit-Learn's `HistGradientBoostingClassifier` and LightGBM, `PerpetualBooster` achieves a lower **Expected Calibration Error (ECE)** on this benchmark.\n", "   - Even without explicit calibration, Perpetual's raw probabilities are often more reliable. Calibrating with the built-in `calibrate()` method brings the predicted probabilities closer to the observed frequencies, which is crucial for high-stakes decision-making.\n", "\n", "2. **Efficient Uncertainty Quantification (Prediction Sets)**:\n", "   - Perpetual offers native support for generating **prediction sets** (for classification) and **prediction intervals** (for regression).\n", "   - Methods like **GRP (Log-Odds Percentiles)** let users generate prediction sets that target coverage of $1 - \\alpha$ while keeping the average set size small.\n", "\n", "3. **Performance without Retraining**:\n", "   - Unlike cross-validation-based approaches such as MAPIE's `CrossConformalClassifier`, which refit the model on every fold, Perpetual's `calibrate()` method works post-hoc on a held-out calibration set.\n", "   - This keeps iteration fast and makes it possible to add uncertainty quantification to an already trained model with little overhead.\n", "\n", "### Conclusion\n", "\n", "Calibration is an essential step in any machine learning pipeline where the \"confidence\" of the model is as important as its accuracy. **PerpetualBooster** provides a unified and efficient toolkit for ensuring your models are not only accurate but also trustworthy and well-calibrated." 
] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }