API Reference
This page contains the detailed API reference for the Perpetual Python package.
PerpetualBooster
- class perpetual.PerpetualBooster(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, feature_importance_method: str = 'Gain', quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: List[List[int]] | None = None, save_node_stats: bool = False)[source]
Bases: object
Self-generalizing gradient boosted decision tree model.
PerpetualBooster automatically determines the best number of boosting rounds using a built-in generalization strategy, eliminating the need for manual early-stopping or cross-validation. It supports regression, binary and multi-class classification, and quantile objectives, as well as custom loss functions.
See also
perpetual.PerpetualBooster.__init__ – Constructor with full parameter list.
- metadata_attributes: Dict[str, BaseSerializer] = {'cat_mapping': <perpetual.serialize.ObjectSerializer object>, 'classes_': <perpetual.serialize.ObjectSerializer object>, 'feature_importance_method': <perpetual.serialize.ObjectSerializer object>, 'feature_names_in_': <perpetual.serialize.ObjectSerializer object>, 'n_features_': <perpetual.serialize.ObjectSerializer object>}
Metadata attributes that are persisted alongside the model and restored when a saved booster is loaded.
- __init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, feature_importance_method: str = 'Gain', quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: List[List[int]] | None = None, save_node_stats: bool = False)[source]
Gradient Boosting Machine with Perpetual Learning.
A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget. The budget acts as a complexity control: a higher budget allows for more trees and potentially better fit, while a lower budget ensures faster training and better generalization for simpler datasets.
- Parameters:
objective (str or tuple, default="LogLoss") –
Learning objective function to be used for optimization. Valid options are:
"LogLoss": logistic loss for binary classification.
"BrierLoss": Brier score loss for probabilistic binary classification.
"HingeLoss": hinge loss for binary classification.
"SquaredLoss": squared error for regression.
"QuantileLoss": quantile error for quantile regression.
"HuberLoss": Huber loss for robust regression.
"AdaptiveHuberLoss": adaptive Huber loss for robust regression.
"FairLoss": Fair loss for robust regression.
"AbsoluteLoss": absolute (L1) error for regression.
"SquaredLogLoss": squared log error for regression.
"MapeLoss": mean absolute percentage error for regression.
"PoissonLoss": Poisson regression for count data.
"GammaLoss": Gamma regression with log-link.
"TweedieLoss": Tweedie regression with log-link.
"CrossEntropyLoss": cross-entropy loss for targets in [0, 1].
"CrossEntropyLambdaLoss": alternative weighted cross-entropy.
"ListNetLoss": ListNet loss for ranking.
custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:
loss(y, pred, weight, group) : returns the loss value for each sample.
gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.
initial_value(y, weight, group) : returns the initial value for the booster.
budget (float, default=0.5) – A positive number for fitting budget. Increasing this number will more likely result in more boosting rounds and increased predictive power.
num_threads (int, optional) – Number of threads to be used during training and prediction.
monotone_constraints (dict, optional) – Keys are feature indices or names, values are -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
save_node_stats (bool, default=False) – Whether to save node statistics (required for calibration).
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.
quantile (float, optional) – Target quantile for quantile regression (objective=”QuantileLoss”).
reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.
categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Early stopping rounds.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Interaction constraints.
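As a sketch of the custom-objective tuple described above, the following reimplements squared error with NumPy; the function names and the return-None-for-a-constant-hessian convention follow the signatures listed for the objective parameter:

```python
import numpy as np

# Custom squared-error objective as a (loss, gradient, initial_value) tuple.
def loss(y, pred, weight, group):
    # Per-sample loss values.
    return (y - pred) ** 2

def gradient(y, pred, weight, group):
    # Gradient with respect to the prediction; the hessian is constant,
    # so return None for it to improve performance (see above).
    return pred - y, None

def initial_value(y, weight, group):
    # Start boosting from the target mean.
    return float(np.mean(y))

custom_objective = (loss, gradient, initial_value)
# The tuple can then be passed as PerpetualBooster(objective=custom_objective).
```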
- fit(X, y, sample_weight=None, group=None) Self[source]
Fit the gradient booster on a provided dataset.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data. Can be a Polars or Pandas DataFrame, or a 2D Numpy array. Polars DataFrames use a zero-copy columnar path for efficiency.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.
sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample. If None, all samples are weighted equally.
group (array-like, optional) – Group labels for ranking objectives.
- Returns:
self – Returns self.
- Return type:
Self
- prune(X, y, sample_weight=None, group=None) Self[source]
Prune the gradient booster on a provided dataset.
This removes nodes that do not contribute to a reduction in loss on the provided validation set.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Validation data.
y (array-like of shape (n_samples,)) – Validation targets.
sample_weight (array-like of shape (n_samples,), optional) – Weights for validation samples.
group (array-like, optional) – Group labels for ranking objectives.
- Returns:
self – Returns self.
- Return type:
Self
- calibrate(X_cal: Any, y_cal: Any, alpha: float | Iterable[float], method: str | None = None) Self[source]
Calibrate the gradient booster for prediction intervals using a selected method.
- Parameters:
X_cal (array-like) – Independent calibration dataset.
y_cal (array-like) – Targets for calibration data.
alpha (float or array-like) – Significance level(s) for the intervals (1 - coverage).
method (str, optional) – Calibration method to use. Options are “MinMax”, “GRP”, “WeightVariance”. If None, defaults to “WeightVariance”.
- Returns:
self – Returns self.
- Return type:
Self
- calibrate_conformal(X: Any, y: Any, X_cal: Any, y_cal: Any, alpha: float | Iterable[float], sample_weight: Any | None = None, group: Any | None = None) Self[source]
Calibrate the gradient booster using Conformal Prediction (CQR).
- Parameters:
X (array-like) – Independent training dataset.
y (array-like) – Targets for training data.
X_cal (array-like) – Independent calibration dataset.
y_cal (array-like) – Targets for calibration data.
alpha (float or array-like) – Significance level(s) for the intervals (1 - coverage).
sample_weight (array-like, optional) – Weights for training data.
group (array-like, optional) – Group IDs for training data.
- Returns:
self – Returns self.
- Return type:
Self
- predict_intervals(X, parallel: bool | None = None) dict[source]
Predict intervals with the fitted booster on new data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – New data for prediction.
parallel (bool, optional) – Whether to run prediction in parallel. If None, uses class default.
- Returns:
intervals – A dictionary containing lower and upper bounds for the specified alpha levels.
- Return type:
dict
- predict_sets(X, parallel: bool | None = None) dict[source]
Predict sets with the fitted booster on new data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – New data for prediction.
parallel (bool, optional) – Whether to run prediction in parallel. If None, uses class default.
- Returns:
sets – A dictionary containing prediction sets for the specified alpha levels. Each set is a list of labels (e.g., [1.0], [0.0], or [0.0, 1.0]).
- Return type:
dict
- predict_distribution(X, n: int = 100, parallel: bool | None = None) ndarray[source]
Predict a distribution using uncalibrated leaf weights from internal nodes.
Generates n predictions for each sample by randomly sampling one of the 5 weights stored in each leaf node. This returns a raw, uncalibrated distribution of predictions.
Note: This method is only available if the booster was fitted with save_node_stats=True.
- Parameters:
X (array-like of shape (n_samples, n_features)) – New data for prediction.
n (int, default=100) – Number of predictions to generate per sample.
parallel (bool, optional) – Whether to run prediction in parallel. If None, uses class default.
- Returns:
distribution – A 2D array where each row contains n predictions for the corresponding sample.
- Return type:
ndarray of shape (n_samples, n)
- predict(X, parallel: bool | None = None) ndarray[source]
Predict with the fitted booster on new data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
parallel (bool, optional) – Whether to run prediction in parallel.
- Returns:
predictions – The predicted values (log-odds for classification, raw values for regression).
- Return type:
ndarray of shape (n_samples,)
- predict_proba(X, parallel: bool | None = None, calibrated: bool = False) ndarray[source]
Predict class probabilities with the fitted booster on new data.
Only valid for classification tasks.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
parallel (bool, optional) – Whether to run prediction in parallel.
calibrated (bool, default=False) – Whether to return calibrated probabilities.
- Returns:
probabilities – The class probabilities.
- Return type:
ndarray of shape (n_samples, n_classes)
- calculate_drift(X, drift_type: str = 'data', parallel: bool | None = None) float[source]
Calculate drift metrics (data or concept) for the model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – New data to evaluate for drift.
drift_type (str, default="data") –
Type of drift to calculate. Options:
"data": Multivariate data drift across all tree nodes.
"concept": Concept drift focusing on nodes that are parents of leaves.
parallel (bool, optional) – Whether to run prediction in parallel. If None, uses class default.
- Returns:
drift_score – The calculated drift score (average Chi-squared statistic).
- Return type:
float
- predict_log_proba(X, parallel: bool | None = None) ndarray[source]
Predict class log-probabilities with the fitted booster on new data.
Only valid for classification tasks.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
parallel (bool, optional) – Whether to run prediction in parallel.
- Returns:
log_probabilities – The log-probabilities of each class.
- Return type:
ndarray of shape (n_samples, n_classes)
- predict_nodes(X, parallel: bool | None = None) List[source]
Predict leaf node indices with the fitted booster on new data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
parallel (bool, optional) – Whether to run prediction in parallel.
- Returns:
nodes – The leaf node indices for each tree and sample.
- Return type:
List
- predict_contributions(X, method: str = 'Average', parallel: bool | None = None) ndarray[source]
Predict feature contributions (SHAP-like values) for new data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input features.
method (str, default="Average") –
Method to calculate contributions. Options:
"Average": Internal node averages.
"Shapley": Exact tree SHAP values.
"Weight": Saabas-style leaf weights.
"BranchDifference": Difference between the chosen and the other branch.
"MidpointDifference": Weighted difference between branches.
"ModeDifference": Difference from the most frequent node.
"ProbabilityChange": Change in probability (LogLoss only).
parallel (bool, optional) – Whether to run prediction in parallel.
- Returns:
contributions – The contribution of each feature to the prediction. The last column is the bias term.
- Return type:
ndarray of shape (n_samples, n_features + 1)
- partial_dependence(X, feature: str | int, samples: int | None = 100, exclude_missing: bool = True, percentile_bounds: Tuple[float, float] = (0.2, 0.98)) ndarray[source]
Calculate the partial dependence values of a feature.
For each unique value of the feature, this gives the estimate of the predicted value for that feature, with the effects of all other features averaged out.
- Parameters:
X (array-like) – Data used to calculate partial dependence. Should be the same format as passed to fit().
feature (str or int) – The feature for which to calculate partial dependence.
samples (int, optional, default=100) – Number of evenly spaced samples to select. If None, all unique values are used.
exclude_missing (bool, optional, default=True) – Whether to exclude missing values from the calculation.
percentile_bounds (tuple of float, optional, default=(0.2, 0.98)) – Lower and upper percentiles for sample selection.
- Returns:
pd_values – The first column contains the feature values, and the second column contains the partial dependence values.
- Return type:
ndarray of shape (n_samples, 2)
Examples
>>> import matplotlib.pyplot as plt
>>> pd_values = model.partial_dependence(X, feature="age")
>>> plt.plot(pd_values[:, 0], pd_values[:, 1])
- calculate_feature_importance(method: str = 'Gain', normalize: bool = True) Dict[int, float] | Dict[str, float][source]
Calculate feature importance for the model.
- Parameters:
method (str, optional, default="Gain") –
Importance method. Options:
"Weight": Number of times a feature is used in splits.
"Gain": Average improvement in loss brought by a feature.
"Cover": Average number of samples affected by splits on a feature.
"TotalGain": Total improvement in loss brought by a feature.
"TotalCover": Total number of samples affected by splits on a feature.
normalize (bool, optional, default=True) – Whether to normalize importance scores to sum to 1.
- Returns:
importance – A dictionary mapping feature names (or indices) to importance scores.
- Return type:
Dict[int, float] or Dict[str, float]
- json_dump() str[source]
Return the booster model in JSON format.
- Returns:
dump – The JSON representation of the model.
- Return type:
str
- classmethod load_booster(path: str) Self[source]
Load a booster model from a file.
- Parameters:
path (str) – Path to the saved booster (JSON format).
- Returns:
model – The loaded booster object.
- Return type:
Self
- save_booster(path: str)[source]
Save the booster model to a file.
The model is saved in a JSON-based format.
- Parameters:
path (str) – Path where the model will be saved.
- save_model() bytes[source]
Save the model to a bytes object.
- Returns:
data – The serialized model data.
- Return type:
bytes
- classmethod load_model(data: bytes) Self[source]
Load a model from a bytes object.
- Parameters:
data (bytes) – The serialized model data.
- Returns:
model – The loaded booster object.
- Return type:
Self
- classmethod from_json(json_str: str) Self[source]
Load a booster model from a JSON string.
- Parameters:
json_str (str) – The JSON representation of the model.
- Returns:
model – The loaded booster object.
- Return type:
Self
- property is_fitted: bool
Whether the booster has been fitted.
- Returns:
fitted – True if the booster is fitted, False otherwise.
- Return type:
bool
- insert_metadata(key: str, value: str)[source]
Insert metadata into the model.
Metadata is saved alongside the model and can be retrieved later.
- Parameters:
key (str) – Metadata key.
value (str) – Metadata value to store.
- get_node_lists(map_features_names: bool = True) List[List[Node]][source]
Return tree structures as lists of node objects.
- Parameters:
map_features_names (bool, default=True) – Whether to map feature indices to feature names in the returned nodes.
- Returns:
nodes – One list of Node objects per tree.
- Return type:
List[List[Node]]
- trees_to_dataframe() Any[source]
Return the tree structures as a DataFrame.
- Returns:
df – A Polars or Pandas DataFrame containing tree information.
- Return type:
DataFrame
Sklearn Interface
- class perpetual.sklearn.PerpetualClassifier(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, save_node_stats: bool = False, **kwargs)[source]
Bases: PerpetualBooster, ClassifierMixin
A scikit-learn compatible classifier based on PerpetualBooster. Uses 'LogLoss' as the default objective.
- __init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, save_node_stats: bool = False, **kwargs)[source]
Gradient Boosting Machine with Perpetual Learning.
A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.
- Parameters:
objective (str or tuple, default="LogLoss") –
Learning objective function to be used for optimization. Valid options are:
"LogLoss": logistic loss for binary classification.
custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:
loss(y, pred, weight, group) : returns the loss value for each sample.
gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.
initial_value(y, weight, group) : returns the initial value for the booster.
budget (float, default=0.5) – A positive number for fitting budget. Increasing this number will more likely result in more boosting rounds and increased predictive power.
num_threads (int, optional) – Number of threads to be used during training and prediction.
monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.
quantile (float, optional) – Target quantile for quantile regression (objective=”QuantileLoss”).
reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.
categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Early stopping rounds.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Interaction constraints.
**kwargs – Arbitrary keyword arguments to be passed to the base class.
- score(X, y, sample_weight=None)[source]
Return the mean accuracy on the given test data and labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,)) – True labels.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
- Returns:
Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
- fit(X, y, sample_weight=None, **fit_params) Self[source]
Fit the classifier on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training input samples.
y (array-like of shape (n_samples,)) – Target class labels.
sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample.
**fit_params – Additional keyword arguments forwarded to the base
fit.
- Returns:
Fitted estimator.
- Return type:
self
- class perpetual.sklearn.PerpetualRegressor(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'SquaredLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, save_node_stats: bool = False, **kwargs)[source]
Bases: PerpetualBooster, RegressorMixin
A scikit-learn compatible regressor based on PerpetualBooster. Uses 'SquaredLoss' as the default objective.
- __init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'SquaredLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, save_node_stats: bool = False, **kwargs)[source]
Gradient Boosting Machine with Perpetual Learning.
A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.
- Parameters:
objective (str or tuple, default="SquaredLoss") –
Learning objective function to be used for optimization. Valid options are:
"SquaredLoss": squared error for regression.
"QuantileLoss": quantile error for quantile regression.
"HuberLoss": Huber loss for robust regression.
"AdaptiveHuberLoss": adaptive Huber loss for robust regression.
custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:
loss(y, pred, weight, group) : returns the loss value for each sample.
gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.
initial_value(y, weight, group) : returns the initial value for the booster.
budget (float, default=0.5) – A positive number for fitting budget. Increasing this number will more likely result in more boosting rounds and increased predictive power.
num_threads (int, optional) – Number of threads to be used during training and prediction.
monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.
quantile (float, optional) – Target quantile for quantile regression (objective=”QuantileLoss”).
reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.
categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Early stopping rounds.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Interaction constraints.
save_node_stats (bool, default=False) – Whether to save node statistics (required for calibration).
**kwargs – Arbitrary keyword arguments to be passed to the base class.
- fit(X, y, sample_weight=None, **fit_params) Self[source]
Fit the regressor on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training input samples.
y (array-like of shape (n_samples,)) – Target values.
sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample.
**fit_params – Additional keyword arguments forwarded to the base
fit.
- Returns:
Fitted estimator.
- Return type:
self
- score(X, y, sample_weight=None)[source]
Return the coefficient of determination ($R^2$) of the prediction.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,)) – True target values.
sample_weight (array-like of shape (n_samples,), optional) – Sample weights.
- Returns:
$R^2$ score of self.predict(X) w.r.t. y.
- Return type:
float
- class perpetual.sklearn.PerpetualRanker(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'ListNetLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)[source]
Bases: PerpetualBooster, RegressorMixin
A scikit-learn compatible ranker based on PerpetualBooster. Uses 'ListNetLoss' as the default objective. Requires the 'group' parameter to be passed to fit.
- __init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'ListNetLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)[source]
Gradient Boosting Machine with Perpetual Learning.
A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.
- Parameters:
objective (str or tuple, default="ListNetLoss") –
Learning objective function to be used for optimization. Valid options are:
"ListNetLoss": ListNet loss for ranking.
custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:
loss(y, pred, weight, group) : returns the loss value for each sample.
gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.
initial_value(y, weight, group) : returns the initial value for the booster.
budget (float, default=0.5) – A positive number for fitting budget. Increasing this number will more likely result in more boosting rounds and increased predictive power.
num_threads (int, optional) – Number of threads to be used during training and prediction.
monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.
quantile (float, optional) – Target quantile for quantile regression (objective=”QuantileLoss”).
reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.
categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Early stopping rounds.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Interaction constraints.
**kwargs – Arbitrary keyword arguments to be passed to the base class.
- fit(X, y, group=None, sample_weight=None, **fit_params) Self[source]
Fit the ranker on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training input samples.
y (array-like of shape (n_samples,)) – Target relevance scores.
group (array-like of int, optional) – Group lengths used by the ranking objective. Required when objective="ListNetLoss".
sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample.
**fit_params – Additional keyword arguments forwarded to the base
fit.
- Returns:
Fitted estimator.
- Return type:
self
- Raises:
ValueError – If group is None and the objective is "ListNetLoss".
Causal ML
- class perpetual.iv.BraidedBooster(treatment_objective: str = 'SquaredLoss', outcome_objective: str = 'SquaredLoss', stage1_budget: float = 0.5, stage2_budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: list[list[int]] | None = None)[source]
Bases: object
Two-stage instrumental-variable estimator powered by gradient boosting.
Stage 1 regresses the treatment on the instruments, and Stage 2 regresses the outcome on the predicted treatment and covariates. Both stages are fitted using Perpetual’s self-generalizing boosting.
- __init__(treatment_objective: str = 'SquaredLoss', outcome_objective: str = 'SquaredLoss', stage1_budget: float = 0.5, stage2_budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: list[list[int]] | None = None)[source]
Boosted Instrumental Variable (BoostIV) Estimator.
Implements a 2-Stage Least Squares (2SLS) approach using Gradient Boosting.
- Parameters:
treatment_objective (str, default="SquaredLoss") – Objective for Stage 1 (treatment model), e.g. "SquaredLoss" or "LogLoss".
outcome_objective (str, default="SquaredLoss") – Objective for Stage 2 (outcome model), e.g. "SquaredLoss".
stage1_budget (float, default=0.5) – Fitting budget for Stage 1. Higher values allow more boosting rounds.
stage2_budget (float, default=0.5) – Fitting budget for Stage 2. Higher values allow more boosting rounds.
num_threads (int, optional) – Number of threads to use during training and prediction.
monotone_constraints (dict, optional) – Constraints mapping feature indices/names to -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches are always terminated when create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
quantile (float, optional) – Target quantile when using "QuantileLoss".
reset (bool, optional) – Whether to reset the model or continue training on subsequent fits.
categorical_features (iterable or str, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Number of rounds without improvement before stopping.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Groups of feature indices allowed to interact.
- fit(X, Z, y, w) Self[source]
Fit the IV model.
- Parameters:
X (array-like) – Covariates (Controls).
Z (array-like) – Instruments.
y (array-like) – Outcome variable.
w (array-like) – Treatment received.
- predict(X, w_counterfactual) ndarray[source]
Predict Outcome given X and a counterfactual W.
- Parameters:
X (array-like) – Covariates.
w_counterfactual (array-like) – Treatment value to simulate.
- Returns:
preds – Predicted Outcome.
- Return type:
ndarray
- classmethod from_json(json_str: str) BraidedBooster[source]
Deserialize model from JSON string.
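To make the two-stage logic concrete, here is a minimal pure-NumPy sketch of the same 2SLS idea with linear stand-ins for the boosted stages (all data and coefficients below are synthetic; the real class fits PerpetualBooster models instead of linear fits):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # unobserved confounder
w = z + u + 0.1 * rng.normal(size=n)          # treatment, endogenous via u
y = 2.0 * w + u + 0.1 * rng.normal(size=n)    # true causal effect = 2

# Stage 1: regress the treatment on the instrument
slope1, intercept1 = np.polyfit(z, w, 1)
w_hat = slope1 * z + intercept1

# Stage 2: regress the outcome on the predicted treatment
effect, _ = np.polyfit(w_hat, y, 1)
# `effect` lands close to 2, while naive OLS of y on w is biased upward by u
```

The instrument breaks the dependence between treatment and confounder, which is exactly what the boosted stages exploit on non-linear data.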
- class perpetual.meta_learners.SLearner(budget: float = 0.5, **kwargs)[source]
Bases: object
S-Learner (Single Learner) for Heterogeneous Treatment Effect (HTE) estimation.
Uses a single model to estimate the outcome: Y ~ M(X, W).
The CATE is obtained by contrasting predictions under treatment and control: CATE = M(X, 1) - M(X, 0).
- __init__(budget: float = 0.5, **kwargs)[source]
Create an S-Learner.
- Parameters:
budget (float, default=0.5) – Fitting budget forwarded to the Rust backend.
**kwargs – Additional keyword arguments forwarded to PerpetualBooster.
- fit(X, w, y) Self[source]
Fit the single model on covariates augmented with treatment.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Covariate matrix.
w (array-like of shape (n_samples,)) – Binary treatment indicator (0 or 1).
y (array-like of shape (n_samples,)) – Observed outcome.
- Returns:
Fitted estimator.
- Return type:
self
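The treatment/control contrast can be illustrated with a toy stand-in for the fitted model (the function m below is hypothetical; in practice SLearner fits a PerpetualBooster on the covariates augmented with the treatment column):

```python
import numpy as np

# toy stand-in for the fitted single model M(X, W)
def m(x, w):
    return 1.5 * x[:, 0] * w + 0.5 * x[:, 1]

x = np.array([[1.0, 2.0],
              [2.0, 0.0]])

# CATE = M(X, 1) - M(X, 0)
cate = m(x, np.ones(len(x))) - m(x, np.zeros(len(x)))
print(cate.tolist())  # [1.5, 3.0]
```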
- class perpetual.meta_learners.TLearner(budget: float = 0.5, **kwargs)[source]
Bases: object
T-Learner (Two Learners) for Heterogeneous Treatment Effect (HTE) estimation.
Uses two separate models: M0(X) ~ Y[W=0] and M1(X) ~ Y[W=1].
The CATE is M1(X) - M0(X).
- class perpetual.meta_learners.XLearner(budget: float = 0.5, propensity_budget: float | None = None, **kwargs)[source]
Bases: object
X-Learner for HTE estimation (typically better for imbalanced treatment groups).
- class perpetual.meta_learners.DRLearner(budget: float = 0.5, propensity_budget: float | None = None, **kwargs)[source]
Bases: object
Doubly Robust (DR) Learner for heterogeneous treatment effect estimation.
Double Machine Learning
- class perpetual.dml.DMLEstimator(budget: float = 0.5, n_folds: int = 2, clip: float = 0.01, **kwargs)[source]
Bases: object
Double Machine Learning (DML) estimator for heterogeneous treatment effects.
Uses three gradient boosting stages with K-fold cross-fitting to learn \(\theta(X)\), the Conditional Average Treatment Effect (CATE).
- fit(X, w, y) Self[source]
Fit the DML estimator with cross-fitting.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Covariate matrix.
w (array-like of shape (n_samples,)) – Treatment variable (continuous or binary).
y (array-like of shape (n_samples,)) – Outcome variable.
- Returns:
Fitted estimator.
- Return type:
self
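The residual-on-residual idea behind cross-fitting can be sketched in pure NumPy with linear stand-ins for the boosted nuisance models (synthetic data with a constant true effect of 2; the real estimator fits PerpetualBooster models and allows the effect to vary with X):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
w = x + rng.normal(size=n)                 # treatment depends on x
y = 2.0 * w + x + rng.normal(size=n)       # true effect theta = 2

theta_num = theta_den = 0.0
for fold in (slice(0, n // 2), slice(n // 2, n)):
    mask = np.ones(n, dtype=bool)
    mask[fold] = False
    # fit nuisances on the complement of the fold (linear stand-ins for
    # the boosted models), then residualize on the fold itself
    by = np.polyfit(x[mask], y[mask], 1)
    bw = np.polyfit(x[mask], w[mask], 1)
    y_res = y[fold] - np.polyval(by, x[fold])
    w_res = w[fold] - np.polyval(bw, x[fold])
    theta_num += (w_res * y_res).sum()
    theta_den += (w_res ** 2).sum()

theta = theta_num / theta_den              # close to the true effect of 2
```

Cross-fitting keeps the nuisance fits out-of-fold, which is what protects the effect estimate from overfitting bias.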
Uplift Modeling
- class perpetual.uplift.UpliftBooster(outcome_budget: float = 0.5, propensity_budget: float = 0.5, effect_budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: list[list[int]] | None = None)[source]
Bases: object
R-Learner uplift model for estimating heterogeneous treatment effects.
Learns the Conditional Average Treatment Effect (CATE) tau(x) = E[Y | X, W=1] - E[Y | X, W=0] using three sequentially fitted gradient boosting models: an outcome model, a propensity model, and an effect model.
- __init__(outcome_budget: float = 0.5, propensity_budget: float = 0.5, effect_budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000, interaction_constraints: list[list[int]] | None = None)[source]
Uplift Boosting Machine (R-Learner).
Estimates the Conditional Average Treatment Effect (CATE): tau(x) = E[Y | X, W=1] - E[Y | X, W=0].
- Parameters:
outcome_budget (float, default=0.5) – Fitting budget for the outcome model mu(x). Higher values allow more boosting rounds.
propensity_budget (float, default=0.5) – Fitting budget for the propensity model p(x). Higher values allow more boosting rounds.
effect_budget (float, default=0.5) – Fitting budget for the effect model tau(x). Higher values allow more boosting rounds.
num_threads (int, optional) – Number of threads to use during training and prediction.
monotone_constraints (dict, optional) – Constraints mapping feature indices/names to -1, 1, or 0.
force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.
missing (float, default=np.nan) – Value to consider as missing data.
allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.
create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).
terminate_missing_features (iterable, optional) – Features for which missing branches are always terminated when create_missing_branch is True.
missing_node_treatment (str, default="None") – How to handle weights for missing nodes. Options: "None", "AssignToParent", "AverageLeafWeight", "AverageNodeWeight".
log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.
quantile (float, optional) – Target quantile when using "QuantileLoss".
reset (bool, optional) – Whether to reset the model or continue training on subsequent fits.
categorical_features (iterable or str, default="auto") – Feature indices or names to treat as categorical.
timeout (float, optional) – Time limit for fitting in seconds.
iteration_limit (int, optional) – Maximum number of boosting iterations.
memory_limit (float, optional) – Memory limit for training in GB.
stopping_rounds (int, optional) – Number of rounds without improvement before stopping.
max_bin (int, default=256) – Maximum number of bins for feature discretization.
max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.
interaction_constraints (list of list of int, optional) – Groups of feature indices allowed to interact.
- fit(X, w, y) Self[source]
Fit the Uplift model.
- Parameters:
X (array-like) – Covariates.
w (array-like) – Treatment indicator (0 or 1).
y (array-like) – Outcome variable.
- predict(X) ndarray[source]
Predict CATE.
- Parameters:
X (array-like) – Covariates.
- Returns:
cate – Predicted Conditional Average Treatment Effect.
- Return type:
ndarray
- classmethod from_json(json_str: str) UpliftBooster[source]
Deserialize model from JSON string.
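The R-Learner recipe that the three models implement can be sketched with oracle nuisances on synthetic data (everything below is illustrative; the real class fits boosted models for the outcome mu(x) and propensity p(x) rather than using the true ones):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
p = 0.5                                    # randomized assignment
w = rng.binomial(1, p, n)
y = x + (1.0 + x) * w + 0.1 * rng.normal(size=n)   # tau(x) = 1 + x

# oracle nuisances standing in for the outcome and propensity models:
# m(x) = E[Y | x] = x + (1 + x) * p, and p(x) = 0.5
m = x + (1.0 + x) * p
resid_w = w - p
pseudo = (y - m) / resid_w                 # R-learner pseudo-outcome
# weight by |w - p|; with p = 0.5 the weights are constant here
coef = np.polyfit(x, pseudo, 1, w=np.abs(resid_w))
# coef is close to [1, 1], recovering tau(x) = 1 + x
```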
Policy Learning
- class perpetual.policy.PolicyLearner(budget: float = 0.5, mode: str = 'ipw', propensity_budget: float | None = None, **kwargs)[source]
Bases: object
Policy learner via Inverse Propensity Weighting.
Learns a treatment-assignment policy \(\pi(X)\) that maximizes expected reward using the Athey & Wager (2021) policy-learning framework.
The learned policy assigns \(W = 1\) when the boosted score \(F(X) > 0\).
- Parameters:
budget (float, default=0.5) – Fitting budget forwarded to PerpetualBooster.
mode (str, default="ipw") – "ipw" for standard Inverse Propensity Weighting or "aipw" for Augmented (Doubly Robust) IPW.
propensity_budget (float, optional) – Separate budget for the propensity model. If None, defaults to budget. Only used when propensity is not supplied to fit().
**kwargs – Additional keyword arguments forwarded to PerpetualBooster.
- feature_importances_
Feature importances from the policy model.
- Type:
ndarray of shape (n_features,)
Examples
>>> from perpetual.policy import PolicyLearner
>>> import numpy as np
>>> n = 500
>>> X = np.random.randn(n, 5)
>>> w = np.random.binomial(1, 0.5, n)
>>> y = X[:, 0] * w + np.random.randn(n) * 0.5
>>> pl = PolicyLearner(budget=0.3)
>>> pl.fit(X, w, y)
>>> policy = pl.predict(X)
References
Athey, S., & Wager, S. (2021). Policy learning with observational data. Econometrica, 89(1), 133-161.
- __init__(budget: float = 0.5, mode: str = 'ipw', propensity_budget: float | None = None, **kwargs)[source]
- fit(X, w, y, propensity: ndarray | None = None, mu_hat_1: ndarray | None = None, mu_hat_0: ndarray | None = None) Self[source]
Fit the policy learner.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Covariate matrix.
w (array-like of shape (n_samples,)) – Observed binary treatment assignment (0 or 1).
y (array-like of shape (n_samples,)) – Observed outcome.
propensity (array-like of shape (n_samples,), optional) – Estimated \(P(W=1|X)\). If None, a propensity model is fitted internally.
mu_hat_1 (array-like of shape (n_samples,), optional) – Predicted outcome under treatment \(\hat{\mu}_1(X)\). Required when mode="aipw".
mu_hat_0 (array-like of shape (n_samples,), optional) – Predicted outcome under control \(\hat{\mu}_0(X)\). Required when mode="aipw".
- Returns:
Fitted estimator.
- Return type:
self
- predict(X) ndarray[source]
Predict the optimal treatment assignment.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Covariate matrix.
- Returns:
Binary treatment policy (1 = treat, 0 = do not treat).
- Return type:
ndarray of shape (n_samples,)
Causal Metrics
- perpetual.causal_metrics.cumulative_gain_curve(y_true: ndarray, w_true: ndarray, uplift_score: ndarray) Tuple[ndarray, ndarray][source]
Compute the cumulative gain (uplift) curve.
Samples are sorted by uplift_score in descending order. At each fraction of the population, the observed uplift (difference in conversion rates between treated and control) is multiplied by the fraction of the population seen so far.
- Parameters:
y_true (array-like of shape (n_samples,)) – Observed binary outcome (0 or 1).
w_true (array-like of shape (n_samples,)) – Observed binary treatment (0 or 1).
uplift_score (array-like of shape (n_samples,)) – Predicted CATE / uplift score (higher ⇒ more benefit from treatment).
- Returns:
fractions (ndarray of shape (n_samples,)) – Fraction of population from 0 to 1.
gains (ndarray of shape (n_samples,)) – Cumulative gain at each fraction.
- perpetual.causal_metrics.auuc(y_true: ndarray, w_true: ndarray, uplift_score: ndarray, normalize: bool = True) float[source]
Area Under the Uplift Curve (AUUC).
- Parameters:
y_true (array-like of shape (n_samples,)) – Observed binary outcome.
w_true (array-like of shape (n_samples,)) – Observed binary treatment indicator.
uplift_score (array-like of shape (n_samples,)) – Predicted CATE / uplift score.
normalize (bool, default=True) – If True, subtract the area of a random model (diagonal) so that a random model scores 0.
- Returns:
AUUC value.
- Return type:
float
- perpetual.causal_metrics.qini_curve(y_true: ndarray, w_true: ndarray, uplift_score: ndarray) Tuple[ndarray, ndarray][source]
Compute the Qini curve.
The Qini curve counts the incremental number of positive outcomes attributable to treatment as a function of the population fraction targeted.
- Parameters:
y_true (array-like of shape (n_samples,)) – Observed binary outcome.
w_true (array-like of shape (n_samples,)) – Observed binary treatment indicator.
uplift_score (array-like of shape (n_samples,)) – Predicted CATE / uplift score.
- Returns:
fractions (ndarray of shape (n_samples + 1,)) – Population fraction (starts at 0).
qini (ndarray of shape (n_samples + 1,)) – Qini value at each fraction (starts at 0).
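The incremental-positives count can be sketched in NumPy (again an illustrative reimplementation of the documented definition, not the library's internal code):

```python
import numpy as np

def qini_curve_sketch(y, w, score):
    order = np.argsort(-score)            # descending by uplift score
    y, w = y[order], w[order]
    n = len(y)
    ct = np.cumsum(w)                     # treated samples seen so far
    cc = np.cumsum(1 - w)                 # control samples seen so far
    st = np.cumsum(y * w)                 # treated positives
    sc = np.cumsum(y * (1 - w))           # control positives
    # incremental positives: treated successes minus control successes
    # rescaled to the treated group size
    q = st - np.where(cc > 0, sc * ct / np.maximum(cc, 1), 0.0)
    frac = np.concatenate([[0.0], np.arange(1, n + 1) / n])
    return frac, np.concatenate([[0.0], q])
```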
- perpetual.causal_metrics.qini_coefficient(y_true: ndarray, w_true: ndarray, uplift_score: ndarray) float[source]
Qini coefficient: area between the Qini curve and the random diagonal.
- Parameters:
y_true (array-like of shape (n_samples,)) – Observed binary outcome.
w_true (array-like of shape (n_samples,)) – Observed binary treatment indicator.
uplift_score (array-like of shape (n_samples,)) – Predicted CATE / uplift score.
- Returns:
Qini coefficient value.
- Return type:
float
Fairness
- class perpetual.fairness.FairClassifier(sensitive_feature: int, fairness_type: str = 'demographic_parity', lam: float = 1.0, budget: float = 0.5, **kwargs)[source]
Bases: object
Fairness-aware gradient boosting classifier.
Wraps a PerpetualBooster with an in-processing fairness penalty that regularizes the log-loss gradient to reduce dependence of predictions on a sensitive attribute.
- Parameters:
sensitive_feature (int) – Column index of the sensitive attribute in X. The column must be binary (0 or 1).
fairness_type (str, default="demographic_parity") – Fairness criterion. One of:
"demographic_parity" — penalize overall disparity.
"equalized_odds" — penalize disparity within each label class.
lam (float, default=1.0) – Strength of the fairness penalty (\(\lambda\)).
budget (float, default=0.5) – Fitting budget forwarded to PerpetualBooster.
**kwargs – Additional keyword arguments forwarded to PerpetualBooster.
- feature_importances_
Feature importances from the fitted model.
- Type:
ndarray of shape (n_features,)
Examples
>>> from perpetual.fairness import FairClassifier
>>> import numpy as np
>>> X = np.column_stack([np.random.randn(200, 3),
...                      np.random.binomial(1, 0.5, 200)])
>>> y = (X[:, 0] > 0).astype(float)
>>> clf = FairClassifier(sensitive_feature=3, lam=2.0)
>>> clf.fit(X, y)
>>> probs = clf.predict_proba(X)
Notes
The fairness penalty is applied only through the gradient; the reported loss is standard log-loss. This mirrors the Rust FairnessObjective implementation.
- __init__(sensitive_feature: int, fairness_type: str = 'demographic_parity', lam: float = 1.0, budget: float = 0.5, **kwargs)[source]
- fit(X, y) Self[source]
Fit the fair classifier.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix. The column at index sensitive_feature must contain binary (0/1) values.
y (array-like of shape (n_samples,)) – Binary target variable (0 or 1).
- Returns:
Fitted estimator.
- Return type:
self
- predict(X) ndarray[source]
Predict class labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
- Returns:
Predicted class labels (0 or 1).
- Return type:
ndarray of shape (n_samples,)
- predict_proba(X) ndarray[source]
Predict class probabilities.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix.
- Returns:
Predicted probabilities for class 0 and class 1.
- Return type:
ndarray of shape (n_samples, 2)
Regulatory Risk
- class perpetual.risk.PerpetualRiskEngine(model: PerpetualBooster)[source]
Bases: object
Risk engine for generating Adverse Action (reason) codes.
This engine wraps a fitted PerpetualBooster model and explains rejections (adverse actions) by attributing the negative decision to specific features.
- __init__(model: PerpetualBooster)[source]
Wrap a fitted booster for reason-code generation.
- Parameters:
model (PerpetualBooster) – A fitted PerpetualBooster instance.
- generate_reason_codes(X, threshold: float, n_codes: int = 3, method: str = 'Average', rejection_direction: str = 'lower') List[List[str]][source]
Generate reason codes for samples that fall below/above the approval threshold.
Logic:
1. Predict a score for each sample in X.
2. Identify rejected samples based on rejection_direction:
"lower": score < threshold (e.g. a FICO-style score)
"higher": score > threshold (e.g. a default probability)
3. Identify the top N features dragging the score in the rejected direction.
- Parameters:
X (array-like) – Applicant data.
threshold (float) – Approval threshold.
n_codes (int, default=3) – Number of reason codes to return per applicant.
method (str, default="Average") – Contribution method.
rejection_direction ({"lower", "higher"}, default="lower") – Direction of rejection. If “lower”, scores below threshold are rejected. If “higher”, scores above threshold are rejected.
- Returns:
reasons – For each sample, a list of reason-code strings. Approved samples get an empty list.
- Return type:
list of list of str
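The rejection-and-attribution logic can be illustrated with a toy contribution matrix (the feature names, base score, and numbers below are all hypothetical; the real engine obtains per-feature contributions from the fitted booster):

```python
import numpy as np

# hypothetical per-feature contributions for two applicants (rows)
contrib = np.array([[-2.0, 0.5, -0.3],
                    [ 0.8, 0.4,  0.1]])
base = 600.0
scores = base + contrib.sum(axis=1)
threshold = 600.0
n_codes = 2

reasons = []
for i, s in enumerate(scores):
    if s < threshold:                      # rejection_direction="lower"
        order = np.argsort(contrib[i])     # most negative contribution first
        codes = [f"feature_{j}" for j in order[:n_codes] if contrib[i, j] < 0]
        reasons.append(codes)
    else:
        reasons.append([])                 # approved -> empty list
print(reasons)  # [['feature_0', 'feature_2'], []]
```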