Advanced Drift Detection in Perpetual
This tutorial demonstrates how to use Perpetual’s built-in drift detection and compares it against several state-of-the-art methods from the literature. We will explore how these metrics respond as we gradually increase the amount of drift injected into the California Housing dataset.
[ ]:
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual import PerpetualBooster
from scipy import stats
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_predict, train_test_split
warnings.filterwarnings("ignore")
1. Defining Benchmark Metrics
We implement three common multivariate drift detection methods for comparison: a kernel-based distribution distance, the Mahalanobis distance between feature means, and adversarial validation with a classifier.
[ ]:
def calculate_energy_distance(X_ref, X_curr, sample_size=500):
    """Kernel-based distribution distance (a biased RBF-kernel MMD estimate)."""
    # Subsample for tractability: the kernel matrices are O(n^2)
    if len(X_ref) > sample_size:
        X_ref = X_ref.sample(sample_size, random_state=42)
    if len(X_curr) > sample_size:
        X_curr = X_curr.sample(sample_size, random_state=42)
    # Mean within-sample similarity minus mean cross-sample similarity:
    # 0.0 for identical distributions, larger under drift
    K_xx = rbf_kernel(X_ref, X_ref)
    K_yy = rbf_kernel(X_curr, X_curr)
    K_xy = rbf_kernel(X_ref, X_curr)
    return np.mean(K_xx) + np.mean(K_yy) - 2 * np.mean(K_xy)
def calculate_mahalanobis_distance(X_ref, X_curr):
    """Distance of the current feature mean from the reference mean,
    in units of the reference covariance."""
    mean_ref = X_ref.mean(axis=0)
    cov_ref = np.cov(X_ref.T)
    # The pseudo-inverse guards against a singular covariance matrix;
    # a robust covariance estimator could be substituted if needed
    inv_cov_ref = np.linalg.pinv(cov_ref)
    mean_curr = X_curr.mean(axis=0)
    diff = mean_curr - mean_ref
    return np.sqrt(diff.dot(inv_cov_ref).dot(diff))
def get_adversarial_auc(X_ref, X_curr):
    """Adversarial validation: train a classifier to separate reference from
    current data and report its out-of-fold ROC-AUC (0.5 means no drift)."""
    if len(X_ref) > 1000:
        X_ref = X_ref.sample(1000, random_state=42)
    if len(X_curr) > 1000:
        X_curr = X_curr.sample(1000, random_state=42)
    X_adv = pd.concat([X_ref, X_curr])
    y_adv = np.array([0] * len(X_ref) + [1] * len(X_curr))
    clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
    # Out-of-fold predictions avoid the optimistic bias of scoring in-sample,
    # which would inflate the AUC even when no drift is present
    probs = cross_val_predict(clf, X_adv, y_adv, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y_adv, probs)
2. Load Data and Train Model
We use the California Housing dataset and train a Perpetual model with save_node_stats=True, which stores the per-node statistics used by the drift calculation.
[ ]:
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = PerpetualBooster(objective="SquaredLoss", budget=1.0, save_node_stats=True)
model.fit(X_train, y_train)
y_pred_train = model.predict(X_train)
train_residuals = y_train - y_pred_train
print(f"Baseline MSE: {mean_squared_error(y_test, model.predict(X_test)):.4f}")
3. Gradual Drift Simulation
We inject a gradually increasing shift into the MedInc feature and record how each metric responds.
[ ]:
drift_levels = np.linspace(0, 1.2, 13)
results = []
for level in drift_levels:
    X_curr = X_test.copy()
    X_curr["MedInc"] += level  # Inject drift
    # Perpetual Unsupervised Metrics
    perp_data = model.calculate_drift(X_curr, drift_type="data")
    perp_concept = model.calculate_drift(X_curr, drift_type="concept")
    # Multivariate Benchmarks
    energy = calculate_energy_distance(X_test, X_curr)
    mahalanobis = calculate_mahalanobis_distance(X_test, X_curr)
    adv_auc = get_adversarial_auc(X_test, X_curr)
    # Performance Metrics
    y_curr_pred = model.predict(X_curr)
    mse = mean_squared_error(y_test, y_curr_pred)
    curr_residuals = y_test - y_curr_pred
    residual_ks = stats.ks_2samp(train_residuals, curr_residuals).statistic
    results.append(
        {
            "Level": level,
            "Perp Data": perp_data,
            "Perp Concept": perp_concept,
            "Energy Distance": energy,
            "Mahalanobis": mahalanobis,
            "Adversarial AUC": adv_auc,
            "Residual KS": residual_ks,
            "MSE": mse,
        }
    )
df_res = pd.DataFrame(results)
[ ]:
df_res
4. Visualizing Drift Response
[ ]:
fig, axes = plt.subplots(2, 1, figsize=(10, 14))
fig.subplots_adjust(hspace=0.3, right=0.75)
# --- Graph 1: Data Drift Metrics ---
ax1 = axes[0]
ax2 = ax1.twinx()
ax3 = ax1.twinx()
ax4 = ax1.twinx()
# Offset the extra y-axes so their labels do not overlap
ax3.spines["right"].set_position(("outward", 60))
ax4.spines["right"].set_position(("outward", 120))
(p1,) = ax1.plot(
    df_res["Level"],
    df_res["Perp Data"],
    label="Perpetual Data Drift",
    marker="o",
    color="C0",
)
(p2,) = ax2.plot(
    df_res["Level"],
    df_res["Energy Distance"],
    label="Energy Distance",
    marker="s",
    color="C1",
)
(p3,) = ax3.plot(
    df_res["Level"],
    df_res["Mahalanobis"],
    label="Mahalanobis Distance",
    marker="^",
    color="C2",
)
(p4,) = ax4.plot(
    df_res["Level"],
    df_res["Adversarial AUC"],
    label="Adversarial AUC",
    marker="d",
    color="C3",
)
ax1.set_xlabel("Drift Level (MedInc Shift)", fontweight="bold")
ax1.set_ylabel("Perpetual Data Drift", color="C0")
ax2.set_ylabel("Energy Distance", color="C1")
ax3.set_ylabel("Mahalanobis Distance", color="C2")
ax4.set_ylabel("Adversarial AUC", color="C3")
ax1.tick_params(axis="y", colors="C0")
ax2.tick_params(axis="y", colors="C1")
ax3.tick_params(axis="y", colors="C2")
ax4.tick_params(axis="y", colors="C3")
lines1 = [p1, p2, p3, p4]
ax1.legend(lines1, [line.get_label() for line in lines1], loc="upper left")
ax1.set_title("Data Drift Benchmark", fontsize=14, fontweight="bold")
ax1.grid(True, alpha=0.3)
# --- Graph 2: Concept Drift & Performance Metrics ---
ax5 = axes[1]
ax6 = ax5.twinx()
ax7 = ax5.twinx()
ax7.spines["right"].set_position(("outward", 60))
(p5,) = ax5.plot(
    df_res["Level"],
    df_res["Perp Concept"],
    label="Perpetual Concept Drift",
    marker="o",
    color="C0",
)
(p6,) = ax6.plot(
    df_res["Level"],
    df_res["Residual KS"],
    label="Residual KS Stat",
    marker="s",
    color="C1",
)
(p7,) = ax7.plot(df_res["Level"], df_res["MSE"], label="MSE", marker="^", color="C2")
ax5.set_xlabel("Drift Level (MedInc Shift)", fontweight="bold")
ax5.set_ylabel("Perpetual Concept Drift", color="C0")
ax6.set_ylabel("Residual KS Stat", color="C1")
ax7.set_ylabel("MSE", color="C2")
ax5.tick_params(axis="y", colors="C0")
ax6.tick_params(axis="y", colors="C1")
ax7.tick_params(axis="y", colors="C2")
lines2 = [p5, p6, p7]
ax5.legend(lines2, [line.get_label() for line in lines2], loc="upper left")
ax5.set_title("Concept Drift Benchmark", fontsize=14, fontweight="bold")
ax5.grid(True, alpha=0.3)
plt.show()
5. Practical Threshold Guidance
Based on empirical observations, here are the recommended threshold ranges for interpreting these drift scores.
Perpetual Unsupervised Scores (Chi-Squared Statistic)
The drift score is an average Chi-squared statistic across model nodes.
< 1.0: No Drift. The current data matches the reference distribution well.
1.0 - 2.0: Warning. Slight distributional shift. Monitor model performance closely.
> 2.5: Significant Drift. High likelihood of data/concept shift. Retraining or investigative action (e.g., data quality check) is recommended.
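These bands can be encoded in a small helper for routine monitoring. The function below is our own convenience sketch, not part of the Perpetual API; scores between 2.0 and 2.5 fall between the published bands, so we treat them as warnings here.
[ ]:
def interpret_perpetual_drift(score):
    """Map an average Chi-squared drift score to a coarse status label."""
    if score < 1.0:
        return "no_drift"
    elif score < 2.5:
        return "warning"  # Includes the 2.0-2.5 gap between the bands above
    else:
        return "significant_drift"

# Label every simulated drift level against the guidance
df_res["Status"] = df_res["Perp Data"].apply(interpret_perpetual_drift)
print(df_res[["Level", "Perp Data", "Status"]])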
Benchmark Ranges
Energy Distance (kernel MMD): Our implementation is an RBF-kernel MMD estimate, which plays the same role as energy distance. The statistic is scale-dependent, but values clearly above the 0.0 baseline of identical distributions indicate drift; see the permutation-test sketch after this list for a data-driven threshold.
Mahalanobis Distance: Measures how far the current feature mean sits from the reference mean, in units of the reference covariance; values > 3.0 (a 3-sigma shift) are typically considered significant outliers/drift.
Adversarial AUC (ROC-AUC): Ranges from 0.5 (perfectly indistinguishable) to 1.0 (perfectly separable).
0.50 - 0.60: Low/No drift.
0.60 - 0.75: Moderate drift.
> 0.80: High drift.
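Because the MMD statistic is scale-dependent, a permutation test is a common way to calibrate it: shuffle the pooled samples, recompute the statistic, and see how extreme the observed value is under the null of no drift. A minimal sketch, reusing calculate_energy_distance from Section 1 (the helper name is our own):
[ ]:
def mmd_permutation_pvalue(X_ref, X_curr, n_permutations=100, random_state=42):
    """Estimate a p-value for the observed MMD by permuting sample labels."""
    rng = np.random.default_rng(random_state)
    observed = calculate_energy_distance(X_ref, X_curr)
    pooled = pd.concat([X_ref, X_curr], ignore_index=True)
    n_ref = len(X_ref)
    exceed = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))
        perm_stat = calculate_energy_distance(
            pooled.iloc[idx[:n_ref]], pooled.iloc[idx[n_ref:]]
        )
        if perm_stat >= observed:
            exceed += 1
    # Add-one smoothing keeps the estimated p-value strictly positive
    return (exceed + 1) / (n_permutations + 1)

# Example: test the most heavily drifted snapshot from the simulation
X_shifted = X_test.copy()
X_shifted["MedInc"] += drift_levels[-1]
print(f"MMD permutation p-value: {mmd_permutation_pvalue(X_test, X_shifted):.3f}")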
Performance Diagnostics (Concept Drift)
Residual KS Stat: Measures how much the distribution of prediction errors has changed. A high KS value (> 0.2 in this case) confirms that concept drift is affecting model accuracy.
MSE: A dramatic increase in Mean Squared Error indicates the model is struggling to predict the target under the new shifted distribution.
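When delayed labels eventually arrive, the residual check from the simulation can be wrapped into a simple supervised alert. A minimal sketch, assuming train_residuals from Section 2 and the roughly 0.2 KS threshold observed above (the helper name and threshold default are our own):
[ ]:
def check_concept_drift(model, X_new, y_new, train_residuals, ks_threshold=0.2):
    """Compare new residuals against training residuals with a two-sample KS test."""
    residuals_new = y_new - model.predict(X_new)
    ks_stat = stats.ks_2samp(train_residuals, residuals_new).statistic
    return {"ks_stat": ks_stat, "alert": ks_stat > ks_threshold}

# Example: the shifted snapshot from above should trigger the alert
print(check_concept_drift(model, X_shifted, y_test, train_residuals))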
6. Conclusion
In this tutorial, we compared Perpetual’s unsupervised drift metrics against state-of-the-art multivariate benchmarks:
Perpetual Data & Concept Drift: These metrics provide a model-based view of how data shifts affect both the marginal feature distributions and the predictive structure, showing a clear response to gradual drift.
Energy Distance (kernel MMD): A robust kernel-based multivariate metric that captures distribution shifts effectively.
Mahalanobis Distance: Useful for detecting changes in the multivariate mean relative to the training feature covariance.
Adversarial AUC: A powerful classifier-based approach that measures how distinguishable the new data is from the reference data.
We can see that Perpetual’s internal metrics correlate strongly with these established multivariate methods, offering a fast and unsupervised built-in alternative for drift monitoring during model inference.
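As a closing sketch, the built-in metrics slot naturally into a batch monitoring loop. This is illustrative only, reusing interpret_perpetual_drift and X_shifted from the sections above:
[ ]:
# Simulate production traffic with a clean batch and a shifted batch
batches = {"clean": X_test, "shifted": X_shifted}
for name, batch in batches.items():
    data_score = model.calculate_drift(batch, drift_type="data")
    concept_score = model.calculate_drift(batch, drift_type="concept")
    status = interpret_perpetual_drift(data_score)
    print(f"{name}: data={data_score:.2f}, concept={concept_score:.2f} -> {status}")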