Advanced Drift Detection in Perpetual
This tutorial demonstrates how to use Perpetual’s built-in drift detection and compares it against several state-of-the-art methods from the literature. We will explore how these metrics respond as we gradually increase the amount of drift injected into the California Housing dataset.
[ ]:
import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual import PerpetualBooster
from scipy import stats
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_predict, train_test_split
warnings.filterwarnings("ignore")
1. Defining Benchmark Metrics
We implement three common multivariate drift detection methods for comparison: a kernel-based distribution distance, the Mahalanobis distance between feature means, and adversarial validation with a classifier.
[ ]:
def calculate_energy_distance(X_ref, X_curr, sample_size=500):
    """Kernel-based distribution distance (a biased RBF-kernel MMD estimate)."""
    # Subsample for tractability: the kernel matrices are O(n^2)
    if len(X_ref) > sample_size:
        X_ref = X_ref.sample(sample_size, random_state=42)
    if len(X_curr) > sample_size:
        X_curr = X_curr.sample(sample_size, random_state=42)
    # Mean within-sample similarity minus mean cross-sample similarity:
    # 0.0 for identical distributions, larger under drift
    K_xx = rbf_kernel(X_ref, X_ref)
    K_yy = rbf_kernel(X_curr, X_curr)
    K_xy = rbf_kernel(X_ref, X_curr)
    return np.mean(K_xx) + np.mean(K_yy) - 2 * np.mean(K_xy)
def calculate_mahalanobis_distance(X_ref, X_curr):
    """Distance of the current feature mean from the reference mean,
    in units of the reference covariance."""
    mean_ref = X_ref.mean(axis=0)
    cov_ref = np.cov(X_ref.T)
    # The pseudo-inverse guards against a singular covariance matrix;
    # a robust covariance estimator could be substituted if needed
    inv_cov_ref = np.linalg.pinv(cov_ref)
    mean_curr = X_curr.mean(axis=0)
    diff = mean_curr - mean_ref
    return np.sqrt(diff.dot(inv_cov_ref).dot(diff))
def get_adversarial_auc(X_ref, X_curr):
    """Adversarial validation: train a classifier to separate reference from
    current data and report its out-of-fold ROC-AUC (0.5 means no drift)."""
    if len(X_ref) > 1000:
        X_ref = X_ref.sample(1000, random_state=42)
    if len(X_curr) > 1000:
        X_curr = X_curr.sample(1000, random_state=42)
    X_adv = pd.concat([X_ref, X_curr])
    y_adv = np.array([0] * len(X_ref) + [1] * len(X_curr))
    clf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
    # Out-of-fold predictions avoid the optimistic bias of scoring in-sample,
    # which would inflate the AUC even when no drift is present
    probs = cross_val_predict(clf, X_adv, y_adv, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y_adv, probs)
2. Load Data and Train Model
We use the California Housing dataset and train a Perpetual model with save_node_stats=True, which stores the per-node statistics used by the drift calculation.
[ ]:
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = PerpetualBooster(objective="SquaredLoss", budget=1.0, save_node_stats=True)
model.fit(X_train, y_train)
y_pred_train = model.predict(X_train)
train_residuals = y_train - y_pred_train
print(f"Baseline MSE: {mean_squared_error(y_test, model.predict(X_test)):.4f}")
3. Gradual Drift Simulation
We inject a gradually increasing shift into the MedInc feature and record how each metric responds.
[ ]:
drift_levels = np.linspace(0, 1.2, 13)
results = []
for level in drift_levels:
    X_curr = X_test.copy()
    X_curr["MedInc"] += level  # Inject drift
    # Perpetual Unsupervised Metrics
    perp_data = model.calculate_drift(X_curr, drift_type="data")
    perp_concept = model.calculate_drift(X_curr, drift_type="concept")
    # Multivariate Benchmarks
    energy = calculate_energy_distance(X_test, X_curr)
    mahalanobis = calculate_mahalanobis_distance(X_test, X_curr)
    adv_auc = get_adversarial_auc(X_test, X_curr)
    # Performance Metrics
    y_curr_pred = model.predict(X_curr)
    mse = mean_squared_error(y_test, y_curr_pred)
    curr_residuals = y_test - y_curr_pred
    residual_ks = stats.ks_2samp(train_residuals, curr_residuals).statistic
    results.append(
        {
            "Level": level,
            "Perp Data": perp_data,
            "Perp Concept": perp_concept,
            "Energy Distance": energy,
            "Mahalanobis": mahalanobis,
            "Adversarial AUC": adv_auc,
            "Residual KS": residual_ks,
            "MSE": mse,
        }
    )
df_res = pd.DataFrame(results)
[ ]:
df_res
4. Visualizing Drift Response
[ ]:
fig, axes = plt.subplots(2, 1, figsize=(10, 14))
fig.subplots_adjust(hspace=0.3, right=0.75)
# --- Graph 1: Data Drift Metrics ---
ax1 = axes[0]
ax2 = ax1.twinx()
ax3 = ax1.twinx()
ax4 = ax1.twinx()
# Offset the extra y-axes so their labels do not overlap
ax3.spines["right"].set_position(("outward", 60))
ax4.spines["right"].set_position(("outward", 120))
(p1,) = ax1.plot(
    df_res["Level"],
    df_res["Perp Data"],
    label="Perpetual Data Drift",
    marker="o",
    color="C0",
)
(p2,) = ax2.plot(
    df_res["Level"],
    df_res["Energy Distance"],
    label="Energy Distance",
    marker="s",
    color="C1",
)
(p3,) = ax3.plot(
    df_res["Level"],
    df_res["Mahalanobis"],
    label="Mahalanobis Distance",
    marker="^",
    color="C2",
)
(p4,) = ax4.plot(
    df_res["Level"],
    df_res["Adversarial AUC"],
    label="Adversarial AUC",
    marker="d",
    color="C3",
)
ax1.set_xlabel("Drift Level (MedInc Shift)", fontweight="bold")
ax1.set_ylabel("Perpetual Data Drift", color="C0")
ax2.set_ylabel("Energy Distance", color="C1")
ax3.set_ylabel("Mahalanobis Distance", color="C2")
ax4.set_ylabel("Adversarial AUC", color="C3")
ax1.tick_params(axis="y", colors="C0")
ax2.tick_params(axis="y", colors="C1")
ax3.tick_params(axis="y", colors="C2")
ax4.tick_params(axis="y", colors="C3")
lines1 = [p1, p2, p3, p4]
ax1.legend(lines1, [line.get_label() for line in lines1], loc="upper left")
ax1.set_title("Data Drift Benchmark", fontsize=14, fontweight="bold")
ax1.grid(True, alpha=0.3)
# --- Graph 2: Concept Drift & Performance Metrics ---
ax5 = axes[1]
ax6 = ax5.twinx()
ax7 = ax5.twinx()
ax7.spines["right"].set_position(("outward", 60))
(p5,) = ax5.plot(
    df_res["Level"],
    df_res["Perp Concept"],
    label="Perpetual Concept Drift",
    marker="o",
    color="C0",
)
(p6,) = ax6.plot(
    df_res["Level"],
    df_res["Residual KS"],
    label="Residual KS Stat",
    marker="s",
    color="C1",
)
(p7,) = ax7.plot(df_res["Level"], df_res["MSE"], label="MSE", marker="^", color="C2")
ax5.set_xlabel("Drift Level (MedInc Shift)", fontweight="bold")
ax5.set_ylabel("Perpetual Concept Drift", color="C0")
ax6.set_ylabel("Residual KS Stat", color="C1")
ax7.set_ylabel("MSE", color="C2")
ax5.tick_params(axis="y", colors="C0")
ax6.tick_params(axis="y", colors="C1")
ax7.tick_params(axis="y", colors="C2")
lines2 = [p5, p6, p7]
ax5.legend(lines2, [line.get_label() for line in lines2], loc="upper left")
ax5.set_title("Concept Drift Benchmark", fontsize=14, fontweight="bold")
ax5.grid(True, alpha=0.3)
plt.show()
5. Practical Threshold Guidance
Based on empirical observations, here are the recommended threshold ranges for interpreting these drift scores.
Perpetual Unsupervised Scores (Chi-Squared Statistic)
The drift score is an average Chi-squared statistic across model nodes.
< 1.0: No Drift. The current data matches the reference distribution well.
1.0 - 2.0: Warning. Slight distributional shift. Monitor model performance closely.
> 2.5: Significant Drift. High likelihood of data/concept shift. Retraining or investigative action (e.g., data quality check) is recommended.
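These bands can be encoded in a small helper for routine monitoring. The function below is our own convenience sketch, not part of the Perpetual API; scores between 2.0 and 2.5 fall between the published bands, so we treat them as warnings here.
[ ]:
def interpret_perpetual_drift(score):
    """Map an average Chi-squared drift score to a coarse status label."""
    if score < 1.0:
        return "no_drift"
    elif score < 2.5:
        return "warning"  # Includes the 2.0-2.5 gap between the bands above
    else:
        return "significant_drift"

# Label every simulated drift level against the guidance
df_res["Status"] = df_res["Perp Data"].apply(interpret_perpetual_drift)
print(df_res[["Level", "Perp Data", "Status"]])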
Benchmark Ranges
Energy Distance (kernel MMD): Our implementation is an RBF-kernel MMD estimate, which plays the same role as energy distance. The statistic is scale-dependent, but values clearly above the 0.0 baseline of identical distributions indicate drift; see the permutation-test sketch after this list for a data-driven threshold.
Mahalanobis Distance: Measures how far the current feature mean sits from the reference mean, in units of the reference covariance; values > 3.0 (a 3-sigma shift) are typically considered significant outliers/drift.
Adversarial AUC (ROC-AUC): Ranges from 0.5 (perfectly indistinguishable) to 1.0 (perfectly separable).
0.50 - 0.60: Low/No drift.
0.60 - 0.75: Moderate drift.
> 0.80: High drift.
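Because the MMD statistic is scale-dependent, a permutation test is a common way to calibrate it: shuffle the pooled samples, recompute the statistic, and see how extreme the observed value is under the null of no drift. A minimal sketch, reusing calculate_energy_distance from Section 1 (the helper name is our own):
[ ]:
def mmd_permutation_pvalue(X_ref, X_curr, n_permutations=100, random_state=42):
    """Estimate a p-value for the observed MMD by permuting sample labels."""
    rng = np.random.default_rng(random_state)
    observed = calculate_energy_distance(X_ref, X_curr)
    pooled = pd.concat([X_ref, X_curr], ignore_index=True)
    n_ref = len(X_ref)
    exceed = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))
        perm_stat = calculate_energy_distance(
            pooled.iloc[idx[:n_ref]], pooled.iloc[idx[n_ref:]]
        )
        if perm_stat >= observed:
            exceed += 1
    # Add-one smoothing keeps the estimated p-value strictly positive
    return (exceed + 1) / (n_permutations + 1)

# Example: test the most heavily drifted snapshot from the simulation
X_shifted = X_test.copy()
X_shifted["MedInc"] += drift_levels[-1]
print(f"MMD permutation p-value: {mmd_permutation_pvalue(X_test, X_shifted):.3f}")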
Performance Diagnostics (Concept Drift)
Residual KS Stat: Measures how much the distribution of prediction errors has changed. A high KS value (> 0.2 in this case) confirms that concept drift is affecting model accuracy.
MSE: A dramatic increase in Mean Squared Error indicates the model is struggling to predict the target under the new shifted distribution.
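When delayed labels eventually arrive, the residual check from the simulation can be wrapped into a simple supervised alert. A minimal sketch, assuming train_residuals from Section 2 and the roughly 0.2 KS threshold observed above (the helper name and threshold default are our own):
[ ]:
def check_concept_drift(model, X_new, y_new, train_residuals, ks_threshold=0.2):
    """Compare new residuals against training residuals with a two-sample KS test."""
    residuals_new = y_new - model.predict(X_new)
    ks_stat = stats.ks_2samp(train_residuals, residuals_new).statistic
    return {"ks_stat": ks_stat, "alert": ks_stat > ks_threshold}

# Example: the shifted snapshot from above should trigger the alert
print(check_concept_drift(model, X_shifted, y_test, train_residuals))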
6. Conclusion
In this tutorial, we compared Perpetual’s unsupervised drift metrics against state-of-the-art multivariate benchmarks:
Perpetual Data & Concept Drift: These metrics provide a model-based view of how data shifts affect both the marginal feature distributions and the predictive structure, showing a clear response to gradual drift.
Energy Distance (kernel MMD): A robust kernel-based multivariate metric that captures distribution shifts effectively.
Mahalanobis Distance: Useful for detecting changes in the multivariate mean relative to the training feature covariance.
Adversarial AUC: A powerful classifier-based approach that measures how distinguishable the new data is from the reference data.
We can see that Perpetual’s internal metrics correlate strongly with these established multivariate methods, offering a fast and unsupervised built-in alternative for drift monitoring during model inference.
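As a closing sketch, the built-in metrics slot naturally into a batch monitoring loop. This is illustrative only, reusing interpret_perpetual_drift and X_shifted from the sections above:
[ ]:
# Simulate production traffic with a clean batch and a shifted batch
batches = {"clean": X_test, "shifted": X_shifted}
for name, batch in batches.items():
    data_score = model.calculate_drift(batch, drift_type="data")
    concept_score = model.calculate_drift(batch, drift_type="concept")
    status = interpret_perpetual_drift(data_score)
    print(f"{name}: data={data_score:.2f}, concept={concept_score:.2f} -> {status}")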