Drift Detection
===============

Perpetual provides built-in methods to detect **Data Drift** and **Concept Drift** using the internal structure of the trained model. 

How it Works
------------

Drift detection in Perpetual is based on comparing the distribution of samples across the decision tree nodes during training versus the distribution observed in new data.

1. **Data Drift (Multivariate)**: Calculates the average Chi-squared statistic across all internal nodes of the model. This detects if the feature distributions have shifted in a way that affects which paths samples take through the trees.
2. **Concept Drift**: Focuses on the nodes that are parents of leaves. This detects if the relationship between features and the target is likely shifting by monitoring changes in the final decision-level node distributions.

Usage
-----

To enable drift detection, you must initialize the model with ``save_node_stats=True``.

.. code-block:: python

    from perpetual import PerpetualBooster
    import numpy as np

    # 1. Train the model
    model = PerpetualBooster(save_node_stats=True)
    model.fit(X_train, y_train)

    # 2. Calculate drift on new data
    data_drift_score = model.calculate_drift(X_new, drift_type="data")
    concept_drift_score = model.calculate_drift(X_new, drift_type="concept")

    print(f"Data Drift: {data_drift_score}")
    print(f"Concept Drift: {concept_drift_score}")

Interpreting the Score
----------------------

The drift score is an average Chi-squared statistic. Larger values indicate more significant drift. 

* **Near 0**: The new data follows the same distribution as the training data.
* **Large values**: Suggest a significant shift in data distribution (Data Drift) or prediction patterns (Concept Drift).

Note: This method is unsupervised and does not require target values for the new data.

Examples
--------

For a detailed walkthrough, see the :doc:`Drift Detection Tutorial <tutorials/drift_tutorial>`.