Drift Detection
Perpetual provides built-in methods to detect Data Drift and Concept Drift using the internal structure of the trained model.
How it Works
Drift detection in Perpetual compares how samples were distributed across the decision-tree nodes during training with the distribution observed when new data is routed through the same trees.
- Data Drift (Multivariate): calculates the average Chi-squared statistic across all internal nodes of the model. This detects whether the feature distributions have shifted in a way that affects which paths samples take through the trees.
- Concept Drift: focuses on the nodes that are parents of leaves. This detects whether the relationship between the features and the target is likely shifting, by monitoring changes in the final decision-level node distributions.
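The idea can be sketched in isolation. The snippet below is an illustration of the underlying statistic, not the actual Perpetual internals: it routes samples through hypothetical split thresholds (standing in for a tree's internal nodes) and scores the change in branch occupancy with a Chi-squared statistic. The thresholds and distributions are made up for the example.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)

def branch_counts(x, thresholds):
    """Count how many samples fall between consecutive split thresholds,
    mimicking the per-node sample counts a tree would record."""
    return np.bincount(np.searchsorted(thresholds, x), minlength=len(thresholds) + 1)

thresholds = [-0.5, 0.5]                # hypothetical split points of internal nodes
x_train = rng.normal(0.0, 1.0, 10_000)  # feature at training time
x_new = rng.normal(0.4, 1.0, 10_000)    # new data with a mean shift

expected = branch_counts(x_train, thresholds).astype(float)
observed = branch_counts(x_new, thresholds).astype(float)

# Rescale the expected counts so both totals match, then compute the statistic.
expected *= observed.sum() / expected.sum()
stat, _ = chisquare(observed, expected)
print(stat)  # large: the mean shift changed which branches samples take
```

Perpetual computes a statistic like this per node and averages it; a single node never tells the whole story, which is why the score aggregates over the model's trees.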
Usage
To enable drift detection, you must initialize the model with save_node_stats=True.
```python
from perpetual import PerpetualBooster
import numpy as np

# 1. Train the model
model = PerpetualBooster(save_node_stats=True)
model.fit(X_train, y_train)

# 2. Calculate drift on new data
data_drift_score = model.calculate_drift(X_new, drift_type="data")
concept_drift_score = model.calculate_drift(X_new, drift_type="concept")

print(f"Data Drift: {data_drift_score}")
print(f"Concept Drift: {concept_drift_score}")
```
Interpreting the Score
The drift score is an average Chi-squared statistic. Larger values indicate more significant drift.
- Near 0: the new data follows the same distribution as the training data.
- Large values: suggest a significant shift in the data distribution (Data Drift) or in the prediction patterns (Concept Drift).
Note: This method is unsupervised and does not require target values for the new data.
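Because the raw Chi-squared statistic has no universal scale (it grows with sample size and with the number of nodes), a practical way to interpret a score is against a baseline computed on held-out data drawn from the training distribution. The sketch below is illustrative and uses no Perpetual API; the alert rule and all numbers are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def chi2_stat(observed, expected):
    """Chi-squared statistic between two count vectors (expected rescaled
    so that both totals match)."""
    expected = expected * observed.sum() / expected.sum()
    return float(((observed - expected) ** 2 / expected).sum())

def counts(x, thresholds=(-0.5, 0.5)):
    """Bin samples by hypothetical split thresholds, as a stand-in for
    the node-level sample counts the model records."""
    return np.bincount(np.searchsorted(thresholds, x), minlength=3).astype(float)

train = counts(rng.normal(0.0, 1.0, 10_000))

# Baseline: held-out data from the same distribution -> small statistic.
baseline = chi2_stat(counts(rng.normal(0.0, 1.0, 5_000)), train)

# Candidate batch with a mean shift -> much larger statistic.
drifted = chi2_stat(counts(rng.normal(0.5, 1.0, 5_000)), train)

drift_detected = drifted > 10 * baseline  # simple, illustrative alert rule
print(baseline, drifted, drift_detected)
```

In production you would compute the baseline once on a held-out slice, then compare each new batch's score to it rather than to an absolute threshold.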
Examples
For a detailed walkthrough, see the Drift Detection Tutorial.