{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mastering Regression Calibration: From Theoretical Basics to Advanced Methods\n", "\n", "## 1. Introduction to Calibration\n", "\n", "In regression, we often care about more than just a single point estimate (the mean). We want to know the **uncertainty** associated with that prediction. **Calibration** is the process of ensuring that our uncertainty estimates (like prediction intervals) are \"honest.\"\n", "\n", "### What is a Prediction Interval?\n", "A prediction interval for a given target $y$ at a confidence level $(1 - \\alpha)$ is an interval $[L, U]$ such that:\n", "$$P(y \\in [L, U]) \\geq 1 - \\alpha$$\n", "\n", "If we say we have a 90% prediction interval ($\\alpha = 0.1$), we expect that in 90% of future cases, the true value will fall within those bounds.\n", "\n", "### The Need for Calibration\n", "Most machine learning models (Boosted Trees, Neural Networks, etc.) are optimized to minimize a loss function (like MSE) but do not inherently produce calibrated probability distributions or intervals. Calibration methods take a pre-trained model and \"fine-tune\" its uncertainty estimates using a held-out **Calibration Set**.\n", "\n", "In this tutorial, we will explore two families of calibration methods:\n", "1. **Perpetual-native methods**: Leverage internal statistics from the `PerpetualBooster` (leaf fold weights).\n", "2. **MAPIE methods**: Model-agnostic conformal prediction methods that work with any regressor." 
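, "\n", "\n", "To ground the coverage definition above, the next cell runs a standalone sanity check on synthetic Gaussian data (illustrative only, independent of the benchmark below): for a $N(\\mu, \\sigma^2)$ target, the central 90% interval is $\\mu \\pm 1.645\\sigma$, so its empirical coverage should land near 0.90.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative sanity check: empirical coverage of a known 90% interval\n", "# on synthetic N(2, 0.5^2) data (independent of the benchmark below).\n", "rng = np.random.default_rng(0)\n", "y_true = rng.normal(loc=2.0, scale=0.5, size=100_000)\n", "lower, upper = 2.0 - 1.645 * 0.5, 2.0 + 1.645 * 0.5\n", "coverage = np.mean((y_true >= lower) & (y_true <= upper))\n", "print(f\"Empirical coverage: {coverage:.3f} (target: 0.900)\")"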
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from mapie.regression import (\n", " ConformalizedQuantileRegressor,\n", " CrossConformalRegressor,\n", " SplitConformalRegressor,\n", ")\n", "from perpetual import PerpetualBooster\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "sns.set_theme(style=\"whitegrid\")\n", "print(\"Libraries imported successfully.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Dataset Preparation\n", "\n", "We use the California Housing dataset. \n", "- **Train Set**: Used to train the base model.\n", "- **Calibration Set**: Used to learn the residuals or interval scaling.\n", "- **Test Set**: Used for final evaluation of coverage and width." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load data\n", "data = fetch_california_housing(as_frame=True)\n", "X, y = data.data, data.target\n", "\n", "# Split into Train (60%), Calibration (20%), and Test (20%)\n", "X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "X_train, X_cal, y_train, y_cal = train_test_split(\n", " X_rest, y_rest, test_size=0.25, random_state=42\n", ")\n", "\n", "print(f\"Train size: {len(X_train)}\")\n", "print(f\"Calibration size: {len(X_cal)}\")\n", "print(f\"Test size: {len(X_test)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Training the Base Model\n", "\n", "We train a `PerpetualBooster` on the training set. 
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = PerpetualBooster(objective=\"SquaredLoss\", budget=1.0)\n", "model.fit(X_train, y_train)\n", "\n", "test_preds = model.predict(X_test)\n", "rmse = np.sqrt(mean_squared_error(y_test, test_preds))\n", "print(f\"Base Model RMSE: {rmse:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Perpetual Calibration Methods\n", "\n", "Perpetual provides four algorithms to derive prediction intervals from the learned trees.\n", "\n", "### 4.1 Theory\n", "- **Conformal**: A traditional split-conformal approach where absolute residuals on the calibration set determine a quantile correction.\n", "- **MinMax**: Each leaf stores the min/max weight across folds. The interval is inferred from the sum of these ranges across trees.\n", "- **GRP (Global Relative Position)**: A more refined version of MinMax using the relative position of the target within weight spans.\n", "- **Weight Variance**: Estimates uncertainty based on the standard deviation of fold weights in reached leaves." 
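, "\n", "\n", "The split-conformal recipe behind the **Conformal** option can be sketched in a few lines of NumPy. The next cell is an illustrative re-implementation of the generic recipe on synthetic data, not Perpetual's internal code: collect absolute residuals on the calibration set, take their $\\lceil (n+1)(1-\\alpha) \\rceil / n$ empirical quantile, and pad every point prediction by that amount.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative split-conformal recipe on synthetic data\n", "# (a generic sketch, not Perpetual's internal implementation).\n", "rng = np.random.default_rng(42)\n", "n_cal_demo = 1_000\n", "y_cal_demo = rng.normal(size=n_cal_demo)\n", "preds_cal_demo = y_cal_demo + rng.normal(scale=0.3, size=n_cal_demo)  # imperfect 'model'\n", "\n", "scores = np.abs(y_cal_demo - preds_cal_demo)  # conformity scores\n", "# Finite-sample-corrected quantile level: ceil((n+1)(1-alpha)) / n\n", "level = np.ceil((n_cal_demo + 1) * (1 - 0.1)) / n_cal_demo\n", "q = np.quantile(scores, level, method=\"higher\")\n", "\n", "# The 90% interval for any new point prediction p is simply [p - q, p + q]\n", "print(f\"Residual quantile q = {q:.3f}\")"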
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "methods = [\"Conformal\", \"MinMax\", \"GRP\", \"WeightVariance\"]\n", "alpha = 0.1 # Goal: 90% coverage\n", "results = []\n", "method_data = {} # To store intervals for viz\n", "\n", "for method in methods:\n", " print(f\"Running Perpetual {method}...\")\n", " m = PerpetualBooster(objective=\"SquaredLoss\", budget=0.5, save_node_stats=True)\n", " m.fit(X_train, y_train)\n", "\n", " if method == \"Conformal\":\n", " m.calibrate_conformal(X_train, y_train, X_cal, y_cal, alpha=[alpha])\n", " else:\n", " m.calibrate(X_cal, y_cal, alpha=[alpha], method=method)\n", "\n", " ints = m.predict_intervals(X_test)\n", " lower = ints[str(alpha)][:, 0]\n", " upper = ints[str(alpha)][:, 1]\n", "\n", " coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", " width = np.mean(upper - lower)\n", " results.append(\n", " {\n", " \"Library\": \"Perpetual\",\n", " \"Method\": method,\n", " \"Coverage\": coverage,\n", " \"Avg Width\": width,\n", " }\n", " )\n", " method_data[f\"Perp_{method}\"] = (lower, upper)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. MAPIE Calibration Methods\n", "\n", "MAPIE implements variants of conformal prediction. \n", "\n", "### 5.1 Overview\n", "- **Split**: Residuals on a held-out set. Usually constant width.\n", "- **Jackknife / CV**: Resampling methods providing tighter bounds at the cost of training time.\n", "- **CQR (Conformalized Quantile Regression)**: Uses a quantile-based model to produce **adaptive widths**." 
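, "\n", "\n", "To make the CQR idea concrete, the next cell sketches the conformalization step on synthetic data (illustrative only, not MAPIE's internals): given a quantile model's lower/upper predictions, the conformity score is $\\max(q_{lo} - y,\\; y - q_{hi})$, and both bounds are shifted outward by the corrected empirical quantile of those scores.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Minimal CQR conformalization sketch on synthetic data\n", "# (illustrative only; not MAPIE's internal implementation).\n", "rng = np.random.default_rng(7)\n", "n_demo = 1_000\n", "preds_demo = rng.uniform(0, 5, size=n_demo)  # stand-in point predictions\n", "y_demo = preds_demo + rng.normal(scale=0.3, size=n_demo)\n", "\n", "# Stand-ins for a quantile regressor's 5%/95% outputs (deliberately too narrow)\n", "q_lo, q_hi = preds_demo - 0.3, preds_demo + 0.3\n", "\n", "scores = np.maximum(q_lo - y_demo, y_demo - q_hi)  # CQR conformity scores\n", "level = np.ceil((n_demo + 1) * (1 - 0.1)) / n_demo\n", "q_hat = np.quantile(scores, level, method=\"higher\")\n", "\n", "# Both bounds shift outward by q_hat (a negative q_hat would tighten them)\n", "lo_adj, hi_adj = q_lo - q_hat, q_hi + q_hat\n", "cov = np.mean((y_demo >= lo_adj) & (y_demo <= hi_adj))\n", "print(f\"q_hat = {q_hat:.3f}, coverage after conformalization: {cov:.3f}\")"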
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor\n", "\n", "base_est = GradientBoostingRegressor(random_state=42)\n", "target_confidence = 1 - alpha\n", "\n", "mapie_configs = [\n", " (\"Split\", \"split\", None),\n", " (\"Jackknife+\", \"plus\", 20), # Use high-k CV instead of LOO for speed\n", " (\"Jackknife-minmax\", \"minmax\", 20), # on large datasets (12k samples)\n", " (\"CV+\", \"plus\", 5),\n", " (\"CV-minmax\", \"minmax\", 5),\n", "]\n", "\n", "for name, method, cv_strategy in mapie_configs:\n", " print(f\"Running MAPIE {name}...\")\n", " if method == \"split\":\n", " base_est.fit(X_train, y_train)\n", " # In 1.3.0, SplitConformalRegressor takes confidence_level and prefit=True\n", " mapie = SplitConformalRegressor(\n", " base_est, confidence_level=target_confidence, prefit=True\n", " )\n", " mapie.conformalize(X_cal, y_cal)\n", " else:\n", " # In 1.3.0, CrossConformalRegressor uses fit_conformalize\n", " mapie = CrossConformalRegressor(\n", " base_est, confidence_level=target_confidence, method=method, cv=cv_strategy\n", " )\n", " mapie.fit_conformalize(X_rest, y_rest)\n", "\n", " _, y_pis = mapie.predict_interval(X_test)\n", " # Shape is (n_samples, 2, n_confidence_levels)\n", " lower, upper = y_pis[:, 0, 0], y_pis[:, 1, 0]\n", "\n", " coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", " width = np.mean(upper - lower)\n", " results.append(\n", " {\"Library\": \"MAPIE\", \"Method\": name, \"Coverage\": coverage, \"Avg Width\": width}\n", " )\n", " method_data[f\"MAPIE_{name}\"] = (lower, upper)\n", "\n", "print(\"Running MAPIE CQR...\")\n", "cqr_est = HistGradientBoostingRegressor(loss=\"quantile\", quantile=0.5, random_state=42)\n", "mapie_cqr = ConformalizedQuantileRegressor(cqr_est, confidence_level=target_confidence)\n", "mapie_cqr.fit(X_train, y_train).conformalize(X_cal, y_cal)\n", "_, y_pis = 
mapie_cqr.predict_interval(X_test)\n", "lower, upper = y_pis[:, 0, 0], y_pis[:, 1, 0]\n", "coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", "width = np.mean(upper - lower)\n", "results.append(\n", " {\"Library\": \"MAPIE\", \"Method\": \"CQR\", \"Coverage\": coverage, \"Avg Width\": width}\n", ")\n", "method_data[\"MAPIE_CQR\"] = (lower, upper)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Comprehensive Result Comparison\n", "\n", "We compare all methods side-by-side. The key metrics are:\n", "1. **Coverage**: Should be very close to 0.90.\n", "2. **Avg Width**: Smaller is better (means tighter intervals with same confidence)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_res = pd.DataFrame(results)\n", "df_res[\"Coverage Error\"] = np.abs(df_res[\"Coverage\"] - (1 - alpha))\n", "df_res.sort_values(by=[\"Library\", \"Avg Width\"], ascending=True, inplace=True)\n", "display(df_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Global Visualization\n", "\n", "Let's visualize the tradeoff between interval width and coverage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(14, 7))\n", "sns.scatterplot(\n", " data=df_res, x=\"Avg Width\", y=\"Coverage\", hue=\"Library\", style=\"Method\", s=200\n", ")\n", "plt.axhline(1 - alpha, color=\"red\", linestyle=\"--\", label=\"Target Coverage (90%)\")\n", "plt.title(\"Calibration Benchmarking: Coverage vs. Efficiency (Width)\")\n", "plt.ylabel(\"Observed Coverage\")\n", "plt.xlabel(\"Average Interval Width (lower is better)\")\n", "plt.grid(True)\n", "plt.legend(bbox_to_anchor=(1.05, 1), loc=\"upper left\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Analyzing Interval Width Distributions (Adaptive vs. 
Fixed)\n", "\n", "One of the hallmarks of a good calibration method is its ability to produce **adaptive intervals**. \n", "- **Fixed-width methods**: Produce the same uncertainty band regardless of the input sample.\n", "- **Adaptive methods**: Identify regions of high uncertainty and widen the bounds locally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "width_data = []\n", "for name, (low, high) in method_data.items():\n", " widths = high - low\n", " for w in widths:\n", " width_data.append(\n", " {\n", " \"Method\": name,\n", " \"Width\": w,\n", " \"Library\": \"Perpetual\" if \"Perp\" in name else \"MAPIE\",\n", " }\n", " )\n", "\n", "df_widths = pd.DataFrame(width_data)\n", "\n", "plt.figure(figsize=(15, 8))\n", "sns.boxplot(data=df_widths, x=\"Method\", y=\"Width\", hue=\"Library\")\n", "plt.xticks(rotation=45)\n", "plt.title(\"Distribution of Prediction Interval Widths Across Methods\")\n", "plt.ylabel(\"Interval Width\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Comparison of Calibration Approaches\n", "\n", "Perpetual's calibration methods offer distinct advantages depending on the use case:\n", "\n", "### 1. Perpetual Conformal vs. MAPIE Split and CQR\n", "- **Perpetual Conformal** is conceptually closest to **MAPIE Split**: both apply a constant quantile correction derived from absolute residuals on a held-out set, so their intervals have essentially constant width. **MAPIE CQR** (Conformalized Quantile Regression), by contrast, conformalizes a quantile model and produces adaptive widths.\n", "- All of these methods require a **calibration set** to determine the quantile adjustment needed to guarantee the desired coverage level.\n", "\n", "### 2. 
Advantages of Perpetual's Internal Methods (MinMax, GRP, WeightVariance)\n", "- **No Model Re-training**: Unlike methods that might require training multiple models (like Jackknife/CV) or specific quantile regressors, Perpetual's internal methods (MinMax, GRP, WeightVariance) leverage statistics collected **during the initial training pass** (with `save_node_stats=True`).\n", "- **Efficiency**: Because the necessary information (fold weights/ranges) is already part of the trained model structure, calibrating these methods is extremely fast, involving only lightweight calculations on the calibration set.\n", "- **Separate Calibration Set**: It is important to note that **all** methods, including Perpetual's internal ones, still require a separate held-out **calibration set** to strictly validate coverage guarantees. While you save on training time/resources, you must reserve data for calibration to ensure valid prediction intervals." ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 4 }