{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PerpetualBooster Objectives Tutorial\n", "\n", "This tutorial demonstrates how to use various objective functions with `PerpetualBooster`.\n", "Some objectives require specific hyperparameters to be passed as keyword arguments. The `objective` parameter selects the loss function without altering the core boosting algorithm.\n", "\n", "We will cover the following objectives:\n", "- `SquaredLoss` (default)\n", "- `QuantileLoss` (requires `quantile`)\n", "- `AdaptiveHuberLoss` (requires `quantile`)\n", "- `HuberLoss` (requires `delta`)\n", "- `FairLoss` (requires `c`)\n", "- `TweedieLoss` (requires `p`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from perpetual import PerpetualBooster\n", "from sklearn.datasets import make_regression\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Generate a synthetic dataset with outliers\n", "rng = np.random.default_rng(42)\n", "X, y = make_regression(n_samples=500, n_features=1, noise=15.0, random_state=42)\n", "# Add some extreme outliers to every 10th sample\n", "y[::10] += 200 * (rng.random(50) - 0.5)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, random_state=42\n", ")\n", "\n", "plt.scatter(X_train, y_train, alpha=0.5, label=\"Training data\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## QuantileLoss\n", "\n", "`QuantileLoss` requires the `quantile` keyword argument. It estimates a specific quantile of the conditional distribution. For example, `quantile=0.5` estimates the median, which is more robust to outliers than the mean (`SquaredLoss`), and `quantile=0.9` estimates the 90th percentile."
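] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build intuition for what `QuantileLoss` optimizes, the next cell sketches the pinball (check) loss for a quantile `q`: for a residual `d = y - pred`, the loss is `max(q * d, (q - 1) * d)`, which penalizes under-prediction more heavily when `q > 0.5`. The `pinball_loss` helper below is an illustrative sketch, not part of the `perpetual` API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pinball_loss(y_true, y_pred, q):\n", "    # Pinball loss: asymmetric absolute error minimized by the q-th quantile\n", "    diff = y_true - y_pred\n", "    return np.mean(np.maximum(q * diff, (q - 1) * diff))\n", "\n", "y_toy = np.array([1.0, 2.0, 3.0, 4.0])\n", "# At q=0.5 the pinball loss equals half the mean absolute error\n", "print(pinball_loss(y_toy, np.full(4, 2.5), q=0.5))"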
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_q50 = PerpetualBooster(objective=\"QuantileLoss\", quantile=0.5)\n", "model_q50.fit(X_train, y_train)\n", "preds_q50 = model_q50.predict(X_test)\n", "\n", "model_q90 = PerpetualBooster(objective=\"QuantileLoss\", quantile=0.9)\n", "model_q90.fit(X_train, y_train)\n", "preds_q90 = model_q90.predict(X_test)\n", "\n", "# Plot predictions against the test data\n", "sort_idx = np.argsort(X_test[:, 0])\n", "plt.scatter(X_test, y_test, alpha=0.5, color=\"gray\", label=\"Test data\")\n", "plt.plot(X_test[sort_idx], preds_q50[sort_idx], color=\"red\", label=\"Median (q=0.5)\")\n", "plt.plot(X_test[sort_idx], preds_q90[sort_idx], color=\"blue\", label=\"90th percentile (q=0.9)\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## HuberLoss and AdaptiveHuberLoss\n", "\n", "`HuberLoss` requires the `delta` keyword argument. It acts like `SquaredLoss` for small errors and `AbsoluteLoss` for large errors, making it robust to outliers.\n", "\n", "`AdaptiveHuberLoss` automatically adapts its threshold to the data; its `quantile` parameter sets the outlier cutoff." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_huber = PerpetualBooster(objective=\"HuberLoss\", delta=1.5)\n", "model_huber.fit(X_train, y_train)\n", "\n", "model_adaptive = PerpetualBooster(objective=\"AdaptiveHuberLoss\", quantile=0.8)\n", "model_adaptive.fit(X_train, y_train)\n", "\n", "preds_huber = model_huber.predict(X_test)\n", "preds_adaptive = model_adaptive.predict(X_test)\n", "\n", "# Compare mean absolute error on the test set\n", "print(\"Huber MAE:\", np.mean(np.abs(y_test - preds_huber)))\n", "print(\"Adaptive Huber MAE:\", np.mean(np.abs(y_test - preds_adaptive)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FairLoss and TweedieLoss\n", "\n", "`FairLoss` (requires the `c` parameter) is another robust loss function; its gradient is bounded by `c`, which limits the influence of outliers.\n", "\n", "`TweedieLoss` (requires the `p` parameter) is used for highly right-skewed, non-negative data, often involving exact zeros (like insurance claims). With `p` between 1 and 2 it interpolates between Poisson and Gamma deviance."
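] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick illustration of why `FairLoss` is robust, the next cell sketches the textbook Fair loss, `c**2 * (|d| / c - log(1 + |d| / c))`, whose gradient `c * d / (|d| + c)` is bounded in magnitude by `c`. These helpers are assumptions for illustration, not imported from `perpetual`, and may differ from the library's exact parameterization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fair_loss(residual, c):\n", "    # Fair loss: quadratic near zero, asymptotically linear for large residuals\n", "    a = np.abs(residual) / c\n", "    return c ** 2 * (a - np.log1p(a))\n", "\n", "def fair_gradient(residual, c):\n", "    # Gradient magnitude never exceeds c, so outliers have bounded influence\n", "    return c * residual / (np.abs(residual) + c)\n", "\n", "d = np.array([0.1, 1.0, 10.0, 100.0])\n", "print(fair_loss(d, c=2.0))\n", "print(fair_gradient(d, c=2.0))  # approaches c=2.0 but never exceeds it"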
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_fair = PerpetualBooster(objective=\"FairLoss\", c=2.0)\n", "model_fair.fit(X_train, y_train)\n", "\n", "# Tweedie with p in (1, 2) allows zeros but not negative targets,\n", "# so shift the synthetic targets to be positive for this demo\n", "y_train_pos = y_train - y_train.min() + 0.1\n", "model_tweedie = PerpetualBooster(objective=\"TweedieLoss\", p=1.5)\n", "model_tweedie.fit(X_train, y_train_pos)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }