Drift Detection
Perpetual provides built-in methods to detect Data Drift and Concept Drift using the internal structure of the trained model.
How it Works
Drift detection in Perpetual compares how samples were distributed across the decision-tree nodes during training with the distribution observed when new data is routed through the same trees.
- Data Drift (Multivariate): calculates the average Chi-squared statistic across all internal nodes of the model. This detects whether the feature distributions have shifted in a way that affects which paths samples take through the trees.
- Concept Drift: focuses on the nodes that are parents of leaves. This detects whether the relationship between the features and the target is likely shifting, by monitoring changes in the final decision-level node distributions.
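The idea can be sketched in isolation. The snippet below is an illustration of the underlying statistic, not the actual Perpetual internals: it routes samples through hypothetical split thresholds (standing in for a tree's internal nodes) and scores the change in branch occupancy with a Chi-squared statistic. The thresholds and distributions are made up for the example.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)

def branch_counts(x, thresholds):
    """Count how many samples fall between consecutive split thresholds,
    mimicking the per-node sample counts a tree would record."""
    return np.bincount(np.searchsorted(thresholds, x), minlength=len(thresholds) + 1)

thresholds = [-0.5, 0.5]                # hypothetical split points of internal nodes
x_train = rng.normal(0.0, 1.0, 10_000)  # feature at training time
x_new = rng.normal(0.4, 1.0, 10_000)    # new data with a mean shift

expected = branch_counts(x_train, thresholds).astype(float)
observed = branch_counts(x_new, thresholds).astype(float)

# Rescale the expected counts so both totals match, then compute the statistic.
expected *= observed.sum() / expected.sum()
stat, _ = chisquare(observed, expected)
print(stat)  # large: the mean shift changed which branches samples take
```

Perpetual computes a statistic like this per node and averages it; a single node never tells the whole story, which is why the score aggregates over the model's trees.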
Usage
To enable drift detection, you must initialize the model with save_node_stats=True.
```python
from perpetual import PerpetualBooster
import numpy as np

# 1. Train the model
model = PerpetualBooster(save_node_stats=True)
model.fit(X_train, y_train)

# 2. Calculate drift on new data
data_drift_score = model.calculate_drift(X_new, drift_type="data")
concept_drift_score = model.calculate_drift(X_new, drift_type="concept")

print(f"Data Drift: {data_drift_score}")
print(f"Concept Drift: {concept_drift_score}")
```
Interpreting the Score
The drift score is an average Chi-squared statistic. Larger values indicate more significant drift.
- Near 0: the new data follows the same distribution as the training data.
- Large values: suggest a significant shift in the data distribution (Data Drift) or in the prediction patterns (Concept Drift).
Note: This method is unsupervised and does not require target values for the new data.
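Because the raw Chi-squared statistic has no universal scale (it grows with sample size and with the number of nodes), a practical way to interpret a score is against a baseline computed on held-out data drawn from the training distribution. The sketch below is illustrative and uses no Perpetual API; the alert rule and all numbers are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def chi2_stat(observed, expected):
    """Chi-squared statistic between two count vectors (expected rescaled
    so that both totals match)."""
    expected = expected * observed.sum() / expected.sum()
    return float(((observed - expected) ** 2 / expected).sum())

def counts(x, thresholds=(-0.5, 0.5)):
    """Bin samples by hypothetical split thresholds, as a stand-in for
    the node-level sample counts the model records."""
    return np.bincount(np.searchsorted(thresholds, x), minlength=3).astype(float)

train = counts(rng.normal(0.0, 1.0, 10_000))

# Baseline: held-out data from the same distribution -> small statistic.
baseline = chi2_stat(counts(rng.normal(0.0, 1.0, 5_000)), train)

# Candidate batch with a mean shift -> much larger statistic.
drifted = chi2_stat(counts(rng.normal(0.5, 1.0, 5_000)), train)

drift_detected = drifted > 10 * baseline  # simple, illustrative alert rule
print(baseline, drifted, drift_detected)
```

In production you would compute the baseline once on a held-out slice, then compare each new batch's score to it rather than to an absolute threshold.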
Examples
For a detailed walkthrough, see the Drift Detection Tutorial.