PerpetualBooster Objectives Tutorial

This tutorial demonstrates how to use the various objective functions available in PerpetualBooster. Some objectives require specific hyperparameters, passed as keyword arguments. The objective parameter lets us change the loss being optimized without touching the core boosting algorithm.

We will cover the following objectives:

  • SquaredLoss (default)

  • QuantileLoss (requires quantile)

  • AdaptiveHuberLoss (requires quantile)

  • HuberLoss (requires delta)

  • FairLoss (requires c)

  • TweedieLoss (requires p)

[ ]:
import matplotlib.pyplot as plt
import numpy as np
from perpetual import PerpetualBooster
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate synthetic dataset with outliers
rng = np.random.default_rng(42)  # seed for reproducibility
X, y = make_regression(n_samples=500, n_features=1, noise=15.0, random_state=42)
# Add some extreme outliers (every 10th sample, 500 / 10 = 50 in total)
y[::10] += 200 * (rng.random(50) - 0.5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

plt.scatter(X_train, y_train, alpha=0.5, label="Training data")
plt.legend()
plt.show()

QuantileLoss

QuantileLoss requires the quantile keyword argument. It is used to estimate a specific quantile of the conditional distribution. For example, quantile=0.5 estimates the median, which is more robust to outliers than the mean (SquaredLoss). quantile=0.9 estimates the 90th percentile.

[ ]:
model_q50 = PerpetualBooster(objective="QuantileLoss", quantile=0.5)
model_q50.fit(X_train, y_train)
preds_q50 = model_q50.predict(X_test)

model_q90 = PerpetualBooster(objective="QuantileLoss", quantile=0.9)
model_q90.fit(X_train, y_train)
preds_q90 = model_q90.predict(X_test)

# Plotting the predictions
sort_idx = np.argsort(X_test[:, 0])
plt.scatter(X_test, y_test, alpha=0.5, color="gray", label="Test data")
plt.plot(X_test[sort_idx], preds_q50[sort_idx], color="red", label="Median (q=0.5)")
plt.plot(X_test[sort_idx], preds_q90[sort_idx], color="blue", label="90th Percentile")
plt.legend()
plt.show()
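To sanity-check quantile predictions, the pinball (quantile) loss can be computed by hand. The helper below is a sketch using only NumPy; it is not part of the perpetual API. On outlier-heavy data, the median minimizes the q=0.5 pinball loss, while the mean does not:

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    # Asymmetric absolute error: under-predictions are weighted by `quantile`,
    # over-predictions by `1 - quantile`
    diff = y_true - y_pred
    return np.mean(np.maximum(quantile * diff, (quantile - 1) * diff))

# Toy data with one extreme outlier
y_toy = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
loss_median = pinball_loss(y_toy, np.full_like(y_toy, np.median(y_toy)), 0.5)
loss_mean = pinball_loss(y_toy, np.full_like(y_toy, np.mean(y_toy)), 0.5)
print(loss_median, loss_mean)  # median: 10.1, mean: 15.6
```

The same comparison applied to `preds_q50` on the test set gives a quantitative view of what the scatter plot shows.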

HuberLoss and AdaptiveHuberLoss

HuberLoss requires the delta keyword argument. It acts like SquaredLoss for small errors and AbsoluteLoss for large errors, making it robust to outliers.

AdaptiveHuberLoss automatically adapts its threshold based on the data, and accepts a quantile parameter to dictate the outlier cutoff.

[ ]:
model_huber = PerpetualBooster(objective="HuberLoss", delta=1.5)
model_huber.fit(X_train, y_train)

model_adaptive = PerpetualBooster(objective="AdaptiveHuberLoss", quantile=0.8)
model_adaptive.fit(X_train, y_train)

preds_huber = model_huber.predict(X_test)
preds_adaptive = model_adaptive.predict(X_test)
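The piecewise shape of the Huber loss can be written out directly. The sketch below uses the standard Huber formula (quadratic inside delta, linear outside); it is illustrative and not taken from the perpetual source:

```python
import numpy as np

def huber(residual, delta):
    # Quadratic for |r| <= delta, linear beyond: the gradient is capped at
    # delta, so a single outlier cannot dominate the fit
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

losses = huber(np.array([0.5, 1.5, 10.0]), 1.5)
print(losses)  # 0.125, 1.125, 13.875 (vs 50.0 under squared loss for r=10)
```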

FairLoss and TweedieLoss

FairLoss (requires the c parameter) is another robust loss: it behaves quadratically for small residuals but grows only logarithmically for large ones, with c controlling where the transition happens.
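Assuming perpetual uses the usual Fair loss definition, c² · (|r|/c − log(1 + |r|/c)), a minimal NumPy sketch shows how mildly it penalizes large residuals compared to squared loss (the helper is illustrative only):

```python
import numpy as np

def fair_loss(residual, c):
    # Roughly quadratic near zero, logarithmic growth for large |r|
    a = np.abs(residual) / c
    return c**2 * (a - np.log1p(a))

r = np.array([1.0, 10.0, 100.0])
print(fair_loss(r, 2.0))   # grows slowly in |r|
print(0.5 * r**2)          # squared loss explodes: 0.5, 50.0, 5000.0
```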

TweedieLoss (requires p parameter) is used for highly right-skewed data, often involving exact zeros (like insurance claims).

[ ]:
model_fair = PerpetualBooster(objective="FairLoss", c=2.0)
model_fair.fit(X_train, y_train)

# Tweedie targets must be non-negative for p in (1, 2) (exact zeros are
# allowed, negatives are not), so shift our synthetic data to be positive
y_train_pos = y_train - y_train.min() + 0.1
model_tweedie = PerpetualBooster(objective="TweedieLoss", p=1.5)
model_tweedie.fit(X_train, y_train_pos)
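Tweedie fits can be scored with the Tweedie deviance, which is zero for a perfect prediction. The helper below is a NumPy sketch of the standard unit deviance for 1 < p < 2; `sklearn.metrics.mean_tweedie_deviance` computes the same quantity:

```python
import numpy as np

def tweedie_deviance(y, mu, p):
    # Mean Tweedie deviance for 1 < p < 2 (compound Poisson-gamma regime);
    # y may contain exact zeros, mu must be strictly positive
    return np.mean(2 * (
        np.maximum(y, 0) ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    ))

y_claims = np.array([0.0, 1.0, 5.0, 20.0])
mu_const = np.full_like(y_claims, y_claims.mean())
print(tweedie_deviance(y_claims, mu_const, 1.5))  # > 0 for an imperfect fit
```

Applied to `model_tweedie.predict(X_test)` (against the correspondingly shifted test targets), this gives a loss-consistent alternative to RMSE for skewed data.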