{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PerpetualBooster Objectives Tutorial\n", "\n", "This tutorial demonstrates how to use various objective functions with `PerpetualBooster`.\n", "Some objectives require specific hyperparameters to be passed as keyword arguments. The `objective` parameter selects the loss function without altering the core boosting algorithm.\n", "\n", "We will cover the following objectives:\n", "- `SquaredLoss` (default)\n", "- `QuantileLoss` (requires `quantile`)\n", "- `AdaptiveHuberLoss` (requires `quantile`)\n", "- `HuberLoss` (requires `delta`)\n", "- `FairLoss` (requires `c`)\n", "- `TweedieLoss` (requires `p`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from perpetual import PerpetualBooster\n", "from sklearn.datasets import make_regression\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Generate a synthetic dataset with outliers\n", "rng = np.random.default_rng(42)\n", "X, y = make_regression(n_samples=500, n_features=1, noise=15.0, random_state=42)\n", "# Add some extreme outliers to every 10th sample\n", "y[::10] += 200 * (rng.random(50) - 0.5)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, random_state=42\n", ")\n", "\n", "plt.scatter(X_train, y_train, alpha=0.5, label=\"Training data\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## QuantileLoss\n", "\n", "`QuantileLoss` requires the `quantile` keyword argument. It estimates a specific quantile of the conditional distribution. For example, `quantile=0.5` estimates the median, which is more robust to outliers than the mean (`SquaredLoss`), and `quantile=0.9` estimates the 90th percentile."
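] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build intuition for what `QuantileLoss` optimizes, the next cell sketches the pinball (check) loss for a quantile `q`: for a residual `d = y - pred`, the loss is `max(q * d, (q - 1) * d)`, which penalizes under-prediction more heavily when `q > 0.5`. The `pinball_loss` helper below is an illustrative sketch, not part of the `perpetual` API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pinball_loss(y_true, y_pred, q):\n", "    # Pinball loss: asymmetric absolute error minimized by the q-th quantile\n", "    diff = y_true - y_pred\n", "    return np.mean(np.maximum(q * diff, (q - 1) * diff))\n", "\n", "y_toy = np.array([1.0, 2.0, 3.0, 4.0])\n", "# At q=0.5 the pinball loss equals half the mean absolute error\n", "print(pinball_loss(y_toy, np.full(4, 2.5), q=0.5))"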
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_q50 = PerpetualBooster(objective=\"QuantileLoss\", quantile=0.5)\n", "model_q50.fit(X_train, y_train)\n", "preds_q50 = model_q50.predict(X_test)\n", "\n", "model_q90 = PerpetualBooster(objective=\"QuantileLoss\", quantile=0.9)\n", "model_q90.fit(X_train, y_train)\n", "preds_q90 = model_q90.predict(X_test)\n", "\n", "# Plot predictions against the test data\n", "sort_idx = np.argsort(X_test[:, 0])\n", "plt.scatter(X_test, y_test, alpha=0.5, color=\"gray\", label=\"Test data\")\n", "plt.plot(X_test[sort_idx], preds_q50[sort_idx], color=\"red\", label=\"Median (q=0.5)\")\n", "plt.plot(X_test[sort_idx], preds_q90[sort_idx], color=\"blue\", label=\"90th percentile (q=0.9)\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## HuberLoss and AdaptiveHuberLoss\n", "\n", "`HuberLoss` requires the `delta` keyword argument. It acts like `SquaredLoss` for small errors and `AbsoluteLoss` for large errors, making it robust to outliers.\n", "\n", "`AdaptiveHuberLoss` automatically adapts its threshold to the data; its `quantile` parameter sets the outlier cutoff." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_huber = PerpetualBooster(objective=\"HuberLoss\", delta=1.5)\n", "model_huber.fit(X_train, y_train)\n", "\n", "model_adaptive = PerpetualBooster(objective=\"AdaptiveHuberLoss\", quantile=0.8)\n", "model_adaptive.fit(X_train, y_train)\n", "\n", "preds_huber = model_huber.predict(X_test)\n", "preds_adaptive = model_adaptive.predict(X_test)\n", "\n", "# Compare mean absolute error on the test set\n", "print(\"Huber MAE:\", np.mean(np.abs(y_test - preds_huber)))\n", "print(\"Adaptive Huber MAE:\", np.mean(np.abs(y_test - preds_adaptive)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FairLoss and TweedieLoss\n", "\n", "`FairLoss` (requires the `c` parameter) is another robust loss function; its gradient is bounded by `c`, which limits the influence of outliers.\n", "\n", "`TweedieLoss` (requires the `p` parameter) is used for highly right-skewed, non-negative data, often involving exact zeros (like insurance claims). With `p` between 1 and 2 it interpolates between Poisson and Gamma deviance."
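] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick illustration of why `FairLoss` is robust, the next cell sketches the textbook Fair loss, `c**2 * (|d| / c - log(1 + |d| / c))`, whose gradient `c * d / (|d| + c)` is bounded in magnitude by `c`. These helpers are assumptions for illustration, not imported from `perpetual`, and may differ from the library's exact parameterization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fair_loss(residual, c):\n", "    # Fair loss: quadratic near zero, asymptotically linear for large residuals\n", "    a = np.abs(residual) / c\n", "    return c ** 2 * (a - np.log1p(a))\n", "\n", "def fair_gradient(residual, c):\n", "    # Gradient magnitude never exceeds c, so outliers have bounded influence\n", "    return c * residual / (np.abs(residual) + c)\n", "\n", "d = np.array([0.1, 1.0, 10.0, 100.0])\n", "print(fair_loss(d, c=2.0))\n", "print(fair_gradient(d, c=2.0))  # approaches c=2.0 but never exceeds it"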
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_fair = PerpetualBooster(objective=\"FairLoss\", c=2.0)\n", "model_fair.fit(X_train, y_train)\n", "\n", "# Tweedie with p in (1, 2) allows zeros but not negative targets,\n", "# so shift the synthetic targets to be positive for this demo\n", "y_train_pos = y_train - y_train.min() + 0.1\n", "model_tweedie = PerpetualBooster(objective=\"TweedieLoss\", p=1.5)\n", "model_tweedie.fit(X_train, y_train_pos)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }