{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mastering Regression Calibration: From Theoretical Basics to Advanced Methods\n", "\n", "## 1. Introduction to Calibration\n", "\n", "In regression, we often care about more than just a single point estimate (the mean). We want to know the **uncertainty** associated with that prediction. **Calibration** is the process of ensuring that our uncertainty estimates (like prediction intervals) are \"honest.\"\n", "\n", "### What is a Prediction Interval?\n", "A prediction interval for a given target $y$ at a confidence level $(1 - \\alpha)$ is an interval $[L, U]$ such that:\n", "$$P(y \\in [L, U]) \\geq 1 - \\alpha$$\n", "\n", "If we say we have a 90% prediction interval ($\\alpha = 0.1$), we expect that in 90% of future cases, the true value will fall within those bounds.\n", "\n", "### The Need for Calibration\n", "Most machine learning models (Boosted Trees, Neural Networks, etc.) are optimized to minimize a loss function (like MSE) but do not inherently produce calibrated probability distributions or intervals. Calibration methods take a pre-trained model and \"fine-tune\" its uncertainty estimates using a held-out **Calibration Set**.\n", "\n", "In this tutorial, we will explore two families of calibration methods:\n", "1. **Perpetual-native methods**: Leverage internal statistics from the `PerpetualBooster` (leaf fold weights).\n", "2. **MAPIE methods**: Model-agnostic conformal prediction methods that work with any regressor." 
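, "\n", "\n", "To ground the coverage definition above, the next cell runs a standalone sanity check on synthetic Gaussian data (illustrative only, independent of the benchmark below): for a $N(\\mu, \\sigma^2)$ target, the central 90% interval is $\\mu \\pm 1.645\\sigma$, so its empirical coverage should land near 0.90.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative sanity check: empirical coverage of a known 90% interval\n", "# on synthetic N(2, 0.5^2) data (independent of the benchmark below).\n", "rng = np.random.default_rng(0)\n", "y_true = rng.normal(loc=2.0, scale=0.5, size=100_000)\n", "lower, upper = 2.0 - 1.645 * 0.5, 2.0 + 1.645 * 0.5\n", "coverage = np.mean((y_true >= lower) & (y_true <= upper))\n", "print(f\"Empirical coverage: {coverage:.3f} (target: 0.900)\")"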
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from mapie.regression import (\n", " ConformalizedQuantileRegressor,\n", " CrossConformalRegressor,\n", " SplitConformalRegressor,\n", ")\n", "from perpetual import PerpetualBooster\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.model_selection import train_test_split\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "sns.set_theme(style=\"whitegrid\")\n", "print(\"Libraries imported successfully.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Dataset Preparation\n", "\n", "We use the California Housing dataset. \n", "- **Train Set**: Used to train the base model.\n", "- **Calibration Set**: Used to learn the residuals or interval scaling.\n", "- **Test Set**: Used for final evaluation of coverage and width." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load data\n", "data = fetch_california_housing(as_frame=True)\n", "X, y = data.data, data.target\n", "\n", "# Split into Train (60%), Calibration (20%), and Test (20%)\n", "X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "X_train, X_cal, y_train, y_cal = train_test_split(\n", " X_rest, y_rest, test_size=0.25, random_state=42\n", ")\n", "\n", "print(f\"Train size: {len(X_train)}\")\n", "print(f\"Calibration size: {len(X_cal)}\")\n", "print(f\"Test size: {len(X_test)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Training the Base Model\n", "\n", "We train a `PerpetualBooster` on the training set. 
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = PerpetualBooster(objective=\"SquaredLoss\", budget=1.0)\n", "model.fit(X_train, y_train)\n", "\n", "test_preds = model.predict(X_test)\n", "rmse = np.sqrt(mean_squared_error(y_test, test_preds))\n", "print(f\"Base Model RMSE: {rmse:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Perpetual Calibration Methods\n", "\n", "Perpetual provides four algorithms to derive prediction intervals from the learned trees.\n", "\n", "### 4.1 Theory\n", "- **Conformal**: A traditional split-conformal approach where absolute residuals on the calibration set determine a quantile correction.\n", "- **MinMax**: Each leaf stores the min/max weight across folds. The interval is inferred from the sum of these ranges across trees.\n", "- **GRP (Global Relative Position)**: A more refined version of MinMax using the relative position of the target within weight spans.\n", "- **Weight Variance**: Estimates uncertainty based on the standard deviation of fold weights in reached leaves." 
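, "\n", "\n", "The split-conformal recipe behind the **Conformal** option can be sketched in a few lines of NumPy. The next cell is an illustrative re-implementation of the generic recipe on synthetic data, not Perpetual's internal code: collect absolute residuals on the calibration set, take their $\\lceil (n+1)(1-\\alpha) \\rceil / n$ empirical quantile, and pad every point prediction by that amount.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative split-conformal recipe on synthetic data\n", "# (a generic sketch, not Perpetual's internal implementation).\n", "rng = np.random.default_rng(42)\n", "n_cal_demo = 1_000\n", "y_cal_demo = rng.normal(size=n_cal_demo)\n", "preds_cal_demo = y_cal_demo + rng.normal(scale=0.3, size=n_cal_demo)  # imperfect 'model'\n", "\n", "scores = np.abs(y_cal_demo - preds_cal_demo)  # conformity scores\n", "# Finite-sample-corrected quantile level: ceil((n+1)(1-alpha)) / n\n", "level = np.ceil((n_cal_demo + 1) * (1 - 0.1)) / n_cal_demo\n", "q = np.quantile(scores, level, method=\"higher\")\n", "\n", "# The 90% interval for any new point prediction p is simply [p - q, p + q]\n", "print(f\"Residual quantile q = {q:.3f}\")"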
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "methods = [\"Conformal\", \"MinMax\", \"GRP\", \"WeightVariance\"]\n", "alpha = 0.1 # Goal: 90% coverage\n", "results = []\n", "method_data = {} # To store intervals for viz\n", "\n", "for method in methods:\n", " print(f\"Running Perpetual {method}...\")\n", " m = PerpetualBooster(objective=\"SquaredLoss\", budget=0.5, save_node_stats=True)\n", " m.fit(X_train, y_train)\n", "\n", " if method == \"Conformal\":\n", " m.calibrate_conformal(X_train, y_train, X_cal, y_cal, alpha=[alpha])\n", " else:\n", " m.calibrate(X_cal, y_cal, alpha=[alpha], method=method)\n", "\n", " ints = m.predict_intervals(X_test)\n", " lower = ints[str(alpha)][:, 0]\n", " upper = ints[str(alpha)][:, 1]\n", "\n", " coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", " width = np.mean(upper - lower)\n", " results.append(\n", " {\n", " \"Library\": \"Perpetual\",\n", " \"Method\": method,\n", " \"Coverage\": coverage,\n", " \"Avg Width\": width,\n", " }\n", " )\n", " method_data[f\"Perp_{method}\"] = (lower, upper)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. MAPIE Calibration Methods\n", "\n", "MAPIE implements variants of conformal prediction. \n", "\n", "### 5.1 Overview\n", "- **Split**: Residuals on a held-out set. Usually constant width.\n", "- **Jackknife / CV**: Resampling methods providing tighter bounds at the cost of training time.\n", "- **CQR (Conformalized Quantile Regression)**: Uses a quantile-based model to produce **adaptive widths**." 
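, "\n", "\n", "To make the CQR idea concrete, the next cell sketches the conformalization step on synthetic data (illustrative only, not MAPIE's internals): given a quantile model's lower/upper predictions, the conformity score is $\\max(q_{lo} - y,\\; y - q_{hi})$, and both bounds are shifted outward by the corrected empirical quantile of those scores.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Minimal CQR conformalization sketch on synthetic data\n", "# (illustrative only; not MAPIE's internal implementation).\n", "rng = np.random.default_rng(7)\n", "n_demo = 1_000\n", "preds_demo = rng.uniform(0, 5, size=n_demo)  # stand-in point predictions\n", "y_demo = preds_demo + rng.normal(scale=0.3, size=n_demo)\n", "\n", "# Stand-ins for a quantile regressor's 5%/95% outputs (deliberately too narrow)\n", "q_lo, q_hi = preds_demo - 0.3, preds_demo + 0.3\n", "\n", "scores = np.maximum(q_lo - y_demo, y_demo - q_hi)  # CQR conformity scores\n", "level = np.ceil((n_demo + 1) * (1 - 0.1)) / n_demo\n", "q_hat = np.quantile(scores, level, method=\"higher\")\n", "\n", "# Both bounds shift outward by q_hat (a negative q_hat would tighten them)\n", "lo_adj, hi_adj = q_lo - q_hat, q_hi + q_hat\n", "cov = np.mean((y_demo >= lo_adj) & (y_demo <= hi_adj))\n", "print(f\"q_hat = {q_hat:.3f}, coverage after conformalization: {cov:.3f}\")"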
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor\n", "\n", "base_est = GradientBoostingRegressor(random_state=42)\n", "target_confidence = 1 - alpha\n", "\n", "mapie_configs = [\n", " (\"Split\", \"split\", None),\n", " (\"Jackknife+\", \"plus\", 20), # Use high-k CV instead of LOO for speed\n", " (\"Jackknife-minmax\", \"minmax\", 20), # on large datasets (12k samples)\n", " (\"CV+\", \"plus\", 5),\n", " (\"CV-minmax\", \"minmax\", 5),\n", "]\n", "\n", "for name, method, cv_strategy in mapie_configs:\n", " print(f\"Running MAPIE {name}...\")\n", " if method == \"split\":\n", " base_est.fit(X_train, y_train)\n", " # In 1.3.0, SplitConformalRegressor takes confidence_level and prefit=True\n", " mapie = SplitConformalRegressor(\n", " base_est, confidence_level=target_confidence, prefit=True\n", " )\n", " mapie.conformalize(X_cal, y_cal)\n", " else:\n", " # In 1.3.0, CrossConformalRegressor uses fit_conformalize\n", " mapie = CrossConformalRegressor(\n", " base_est, confidence_level=target_confidence, method=method, cv=cv_strategy\n", " )\n", " mapie.fit_conformalize(X_rest, y_rest)\n", "\n", " _, y_pis = mapie.predict_interval(X_test)\n", " # Shape is (n_samples, 2, n_confidence_levels)\n", " lower, upper = y_pis[:, 0, 0], y_pis[:, 1, 0]\n", "\n", " coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", " width = np.mean(upper - lower)\n", " results.append(\n", " {\"Library\": \"MAPIE\", \"Method\": name, \"Coverage\": coverage, \"Avg Width\": width}\n", " )\n", " method_data[f\"MAPIE_{name}\"] = (lower, upper)\n", "\n", "print(\"Running MAPIE CQR...\")\n", "cqr_est = HistGradientBoostingRegressor(loss=\"quantile\", quantile=0.5, random_state=42)\n", "mapie_cqr = ConformalizedQuantileRegressor(cqr_est, confidence_level=target_confidence)\n", "mapie_cqr.fit(X_train, y_train).conformalize(X_cal, y_cal)\n", "_, y_pis = 
mapie_cqr.predict_interval(X_test)\n", "lower, upper = y_pis[:, 0, 0], y_pis[:, 1, 0]\n", "coverage = np.mean((y_test >= lower) & (y_test <= upper))\n", "width = np.mean(upper - lower)\n", "results.append(\n", " {\"Library\": \"MAPIE\", \"Method\": \"CQR\", \"Coverage\": coverage, \"Avg Width\": width}\n", ")\n", "method_data[\"MAPIE_CQR\"] = (lower, upper)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Comprehensive Result Comparison\n", "\n", "We compare all methods side-by-side. The key metrics are:\n", "1. **Coverage**: Should be very close to 0.90.\n", "2. **Avg Width**: Smaller is better (means tighter intervals with same confidence)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_res = pd.DataFrame(results)\n", "df_res[\"Coverage Error\"] = np.abs(df_res[\"Coverage\"] - (1 - alpha))\n", "df_res.sort_values(by=[\"Library\", \"Avg Width\"], ascending=True, inplace=True)\n", "display(df_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Global Visualization\n", "\n", "Let's visualize the tradeoff between interval width and coverage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(14, 7))\n", "sns.scatterplot(\n", " data=df_res, x=\"Avg Width\", y=\"Coverage\", hue=\"Library\", style=\"Method\", s=200\n", ")\n", "plt.axhline(1 - alpha, color=\"red\", linestyle=\"--\", label=\"Target Coverage (90%)\")\n", "plt.title(\"Calibration Benchmarking: Coverage vs. Efficiency (Width)\")\n", "plt.ylabel(\"Observed Coverage\")\n", "plt.xlabel(\"Average Interval Width (lower is better)\")\n", "plt.grid(True)\n", "plt.legend(bbox_to_anchor=(1.05, 1), loc=\"upper left\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Analyzing Interval Width Distributions (Adaptive vs. 
Fixed)\n", "\n", "One of the hallmarks of a good calibration method is its ability to produce **adaptive intervals**. \n", "- **Fixed-width methods**: Produce the same uncertainty band regardless of the input sample.\n", "- **Adaptive methods**: Identify regions of high uncertainty and widen the bounds locally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "width_data = []\n", "for name, (low, high) in method_data.items():\n", " widths = high - low\n", " for w in widths:\n", " width_data.append(\n", " {\n", " \"Method\": name,\n", " \"Width\": w,\n", " \"Library\": \"Perpetual\" if \"Perp\" in name else \"MAPIE\",\n", " }\n", " )\n", "\n", "df_widths = pd.DataFrame(width_data)\n", "\n", "plt.figure(figsize=(15, 8))\n", "sns.boxplot(data=df_widths, x=\"Method\", y=\"Width\", hue=\"Library\")\n", "plt.xticks(rotation=45)\n", "plt.title(\"Distribution of Prediction Interval Widths Across Methods\")\n", "plt.ylabel(\"Interval Width\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Comparison of Calibration Approaches\n", "\n", "Perpetual's calibration methods offer distinct advantages depending on the use case:\n", "\n", "### 1. Perpetual Conformal vs. MAPIE Split and CQR\n", "- **Perpetual Conformal** is conceptually closest to **MAPIE Split**: both apply a constant quantile correction derived from absolute residuals on a held-out set, so their intervals have essentially constant width. **MAPIE CQR** (Conformalized Quantile Regression), by contrast, conformalizes a quantile model and produces adaptive widths.\n", "- All of these methods require a **calibration set** to determine the quantile adjustment needed to guarantee the desired coverage level.\n", "\n", "### 2. 
Advantages of Perpetual's Internal Methods (MinMax, GRP, WeightVariance)\n", "- **No Model Re-training**: Unlike methods that might require training multiple models (like Jackknife/CV) or specific quantile regressors, Perpetual's internal methods (MinMax, GRP, WeightVariance) leverage statistics collected **during the initial training pass** (with `save_node_stats=True`).\n", "- **Efficiency**: Because the necessary information (fold weights/ranges) is already part of the trained model structure, calibrating these methods is extremely fast, involving only lightweight calculations on the calibration set.\n", "- **Separate Calibration Set**: It is important to note that **all** methods, including Perpetual's internal ones, still require a separate held-out **calibration set** to strictly validate coverage guarantees. While you save on training time/resources, you must reserve data for calibration to ensure valid prediction intervals." ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 4 }