Architecture

Perpetual is designed as a high-performance gradient boosting machine with a clean separation between its compute-intensive core and its user-facing API.

System Overview

The system consists of two main layers:

  1. Rust Core (`perpetual-rs`): A pure Rust crate that implements the gradient boosting algorithm, histogram construction, and tree learning logic.

  2. Python Interface (`py-perpetual`): A Python extension module built with PyO3 that exposes the Rust core to Python users, handling data conversion and API bindings.

Rust Core

The core logic resides in the src/ directory of the repository. It is built for performance and memory safety.

Key Components

  • PerpetualBooster: The central struct (src/booster/core.rs) that manages the ensemble of decision trees. It handles the training loop, including gradient calculation, tree growing, and prediction.

  • Histogram-Based Learning: Perpetual uses a histogram-based algorithm for finding optimal splits. Continuous features are discretized into bins (src/binning.rs), significantly reducing the computational complexity of finding splits.

  • Parallelism: The core heavily utilizes rayon for data parallelism. Operations like histogram building, partial dependence calculations, and predictions are multi-threaded.

  • Generic Objectives: The Objective trait (src/objective_functions/objective.rs) allows for a flexible implementation of loss functions. Perpetual supports standard objectives like LogLoss and SquaredLoss, as well as complex ones like QuantileLoss and custom user-defined objectives.

Python Interface

The Python package is a thin wrapper around the Rust core, ensuring that the heavy lifting is done in native code.

PyO3 Bindings

We use PyO3 to generate the Python extension module. The PerpetualBooster class in Python (package-python/python/perpetual/booster.py) holds a reference to the Rust PerpetualBooster struct (package-python/src/booster.rs). Method calls in Python are directly forwarded to their Rust counterparts.

Zero-Copy Data Transfer

One of the key architectural features is the zero-copy interface for columnar data.

  • Polars Integration: When a Polars DataFrame is passed to fit or predict, Perpetual uses the fit_columnar path. This path reads the underlying memory buffers of the DataFrame directly from Rust without copying the data.

  • Numpy/Pandas: Standard Numpy arrays and Pandas DataFrames are handled via the contiguous array interface, which may involve copying if the data is not already in the expected memory layout (e.g., C-contiguous vs F-contiguous).

PerpetualBooster Algorithm

The PerpetualBooster is a specialized implementation that removes the need for hyperparameter optimization.

Budget and Self-Generalization

Instead of tuning learning rate, tree depth, and other regularization parameters individually, Perpetual links them to a single budget parameter.

  • Learning Rate (`eta`): The learning rate is deterministically calculated from the budget: \(\eta = 10^{-\text{budget}}\).

  • Stopping Criteria: The algorithm monitors the generalization error of the trees during training. If the trees start to overfit (generalization capability drops below a threshold), the training stops early.

For a comprehensive explanation of the self-generalization algorithm, please refer to our blog post: How Perpetual Works.