API Reference

This page contains the detailed API reference for the Perpetual Python package.

PerpetualBooster

class perpetual.PerpetualBooster(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, feature_importance_method: str = 'Gain', quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000)

Bases: object

metadata_attributes: Dict[str, BaseSerializer] = {'cat_mapping': <perpetual.serialize.ObjectSerializer object>, 'classes_': <perpetual.serialize.ObjectSerializer object>, 'feature_importance_method': <perpetual.serialize.ObjectSerializer object>, 'feature_names_in_': <perpetual.serialize.ObjectSerializer object>, 'n_features_': <perpetual.serialize.ObjectSerializer object>}

__init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, force_children_to_bound_parent: bool = False, missing: float = nan, allow_missing_splits: bool = True, create_missing_branch: bool = False, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, feature_importance_method: str = 'Gain', quantile: float | None = None, reset: bool | None = None, categorical_features: Iterable[int] | Iterable[str] | str | None = 'auto', timeout: float | None = None, iteration_limit: int | None = None, memory_limit: float | None = None, stopping_rounds: int | None = None, max_bin: int = 256, max_cat: int = 1000)

Gradient Boosting Machine with Perpetual Learning.

A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.

Parameters:
  • objective (str or tuple, default="LogLoss") –

    Learning objective function to be used for optimization. Valid options are:

    • “LogLoss”: logistic loss for binary classification.

    • “SquaredLoss”: squared error for regression.

    • “QuantileLoss”: quantile error for quantile regression.

    • “HuberLoss”: Huber loss for robust regression.

    • “AdaptiveHuberLoss”: adaptive Huber loss for robust regression.

    • “ListNetLoss”: ListNet loss for ranking.

    • custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:

      • loss(y, pred, weight, group) : returns the loss value for each sample.

      • gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.

      • initial_value(y, weight, group) : returns the initial value for the booster.

  • budget (float, default=0.5) – A positive number that controls the fitting budget. A higher budget typically results in more boosting rounds and higher predictive power.

  • num_threads (int, optional) – Number of threads to be used during training and prediction.

  • monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.

  • force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.

  • missing (float, default=np.nan) – Value to consider as missing data.

  • allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.

  • create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).

  • terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.

  • missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: “None”, “AssignToParent”, “AverageLeafWeight”, “AverageNodeWeight”.

  • log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.

  • feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.

  • quantile (float, optional) – Target quantile for quantile regression (objective=“QuantileLoss”).

  • reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.

  • categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.

  • timeout (float, optional) – Time limit for fitting in seconds.

  • iteration_limit (int, optional) – Maximum number of boosting iterations.

  • memory_limit (float, optional) – Memory limit for training in GB.

  • stopping_rounds (int, optional) – Early stopping rounds.

  • max_bin (int, default=256) – Maximum number of bins for feature discretization.

  • max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.

feature_names_in_

Names of features seen during fit().

Type:

list of str

n_features_

Number of features seen during fit().

Type:

int

classes_

Class labels for classification tasks.

Type:

list

feature_importances_

Feature importances calculated via feature_importance_method.

Type:

ndarray of shape (n_features,)

See also

perpetual.sklearn.PerpetualClassifier

Scikit-learn compatible classifier.

perpetual.sklearn.PerpetualRegressor

Scikit-learn compatible regressor.

Examples

Basic usage for binary classification:

>>> from perpetual import PerpetualBooster
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=20)
>>> model = PerpetualBooster(objective="LogLoss")
>>> model.fit(X, y)
>>> preds = model.predict(X[:5])

Custom objective example:

>>> import numpy as np
>>> def loss(y, pred, weight, group):
...     return (y - pred) ** 2
>>> def gradient(y, pred, weight, group):
...     return (pred - y), None
>>> def initial_value(y, weight, group):
...     return np.mean(y)
>>> model = PerpetualBooster(objective=(loss, gradient, initial_value))
>>> model.fit(X, y)

fit(X, y, sample_weight=None, group=None) → Self

Fit the gradient booster on a provided dataset.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Can be a Polars or Pandas DataFrame, or a 2D Numpy array. Polars DataFrames use a zero-copy columnar path for efficiency.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample. If None, all samples are weighted equally.

  • group (array-like, optional) – Group labels for ranking objectives.

Returns:

self – Returns self.

Return type:

object

prune(X, y, sample_weight=None, group=None) → Self

Prune the gradient booster on a provided dataset.

This removes nodes that do not contribute to a reduction in loss on the provided validation set.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Validation data.

  • y (array-like of shape (n_samples,)) – Validation targets.

  • sample_weight (array-like of shape (n_samples,), optional) – Weights for validation samples.

  • group (array-like, optional) – Group labels for ranking objectives.

Returns:

self – Returns self.

Return type:

object
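
Examples

A minimal sketch of pruning against a held-out validation set; the scikit-learn split below is an illustrative assumption, not part of this API:

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
>>> model = PerpetualBooster().fit(X_train, y_train)
>>> model.prune(X_val, y_val)  # remove nodes that do not reduce loss on X_val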

calibrate(X_train, y_train, X_cal, y_cal, alpha, sample_weight=None, group=None) → Self

Calibrate the gradient booster for prediction intervals.

Uses the provided training and calibration sets to compute scaling factors for intervals.

Parameters:
  • X_train (array-like) – Data used to train the base model.

  • y_train (array-like) – Targets for training data.

  • X_cal (array-like) – Independent calibration dataset.

  • y_cal (array-like) – Targets for calibration data.

  • alpha (float or array-like) – Significance level(s) for the intervals (1 - coverage).

  • sample_weight (array-like, optional) – Sample weights.

  • group (array-like, optional) – Group labels.

Returns:

self – Returns self.

Return type:

object

predict_intervals(X, parallel: bool | None = None) → dict

Predict intervals with the fitted booster on new data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – New data for prediction.

  • parallel (bool, optional) – Whether to run prediction in parallel. If None, uses class default.

Returns:

intervals – A dictionary containing lower and upper bounds for the specified alpha levels.

Return type:

dict
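
Examples

A sketch of the calibrate / predict_intervals workflow for 90% intervals (alpha=0.1); the split and variable names are illustrative:

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3)
>>> model = PerpetualBooster(objective="SquaredLoss").fit(X_train, y_train)
>>> model.calibrate(X_train, y_train, X_cal, y_cal, alpha=0.1)
>>> intervals = model.predict_intervals(X_cal)  # lower/upper bounds per alpha level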

predict(X, parallel: bool | None = None) → ndarray

Predict with the fitted booster on new data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • parallel (bool, optional) – Whether to run prediction in parallel.

Returns:

predictions – The predicted values (log-odds for classification, raw values for regression).

Return type:

ndarray of shape (n_samples,)

predict_proba(X, parallel: bool | None = None) → ndarray

Predict class probabilities with the fitted booster on new data.

Only valid for classification tasks.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • parallel (bool, optional) – Whether to run prediction in parallel.

Returns:

probabilities – The class probabilities.

Return type:

ndarray of shape (n_samples, n_classes)
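
Examples

A sketch contrasting predict() and predict_proba() for a fitted binary classifier (shapes assume two classes):

>>> margins = model.predict(X[:5])      # shape (5,), raw log-odds
>>> probs = model.predict_proba(X[:5])  # shape (5, 2), rows sum to 1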

predict_log_proba(X, parallel: bool | None = None) → ndarray

Predict class log-probabilities with the fitted booster on new data.

Only valid for classification tasks.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • parallel (bool, optional) – Whether to run prediction in parallel.

Returns:

log_probabilities – The log-probabilities of each class.

Return type:

ndarray of shape (n_samples, n_classes)

predict_nodes(X, parallel: bool | None = None) → List

Predict leaf node indices with the fitted booster on new data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • parallel (bool, optional) – Whether to run prediction in parallel.

Returns:

node_indices – A list where each element corresponds to a tree and contains node indices for each sample.

Return type:

list of ndarray

property feature_importances_: ndarray

Feature importances of the model, calculated via feature_importance_method.

predict_contributions(X, method: str = 'Average', parallel: bool | None = None) → ndarray

Predict feature contributions (SHAP-like values) for new data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • method (str, default="Average") –

    Method to calculate contributions. Options:

    • “Average”: Internal node averages.

    • “Shapley”: Exact tree SHAP values.

    • “Weight”: Saabas-style leaf weights.

    • “BranchDifference”: Difference between chosen and other branch.

    • “MidpointDifference”: Weighted difference between branches.

    • “ModeDifference”: Difference from the most frequent node.

    • “ProbabilityChange”: Change in probability (LogLoss only).

  • parallel (bool, optional) – Whether to run prediction in parallel.

Returns:

contributions – The contribution of each feature to the prediction. The last column is the bias term.

Return type:

ndarray of shape (n_samples, n_features + 1)
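
Examples

A sketch; for additive methods such as “Shapley”, the per-row sum of the contributions (including the bias column) is expected to approximate the raw output of predict():

>>> contribs = model.predict_contributions(X[:5], method="Shapley")  # (5, n_features + 1)
>>> totals = contribs.sum(axis=1)  # expected to be close to model.predict(X[:5])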

partial_dependence(X, feature: str | int, samples: int | None = 100, exclude_missing: bool = True, percentile_bounds: Tuple[float, float] = (0.2, 0.98)) → ndarray

Calculate the partial dependence values of a feature.

For each sampled value of the feature, this gives an estimate of the predicted value, with the effects of all other features averaged out.

Parameters:
  • X (array-like) – Data used to calculate partial dependence. Should be the same format as passed to fit().

  • feature (str or int) – The feature for which to calculate partial dependence.

  • samples (int, optional, default=100) – Number of evenly spaced samples to select. If None, all unique values are used.

  • exclude_missing (bool, optional, default=True) – Whether to exclude missing values from the calculation.

  • percentile_bounds (tuple of float, optional, default=(0.2, 0.98)) – Lower and upper percentiles for sample selection.

Returns:

pd_values – The first column contains the feature values, and the second column contains the partial dependence values.

Return type:

ndarray of shape (samples, 2)

Examples

>>> import matplotlib.pyplot as plt
>>> pd_values = model.partial_dependence(X, feature="age")
>>> plt.plot(pd_values[:, 0], pd_values[:, 1])

calculate_feature_importance(method: str = 'Gain', normalize: bool = True) → Dict[int, float] | Dict[str, float]

Calculate feature importance for the model.

Parameters:
  • method (str, optional, default="Gain") –

    Importance method. Options:

    • “Weight”: Number of times a feature is used in splits.

    • “Gain”: Average improvement in loss brought by a feature.

    • “Cover”: Average number of samples affected by splits on a feature.

    • “TotalGain”: Total improvement in loss brought by a feature.

    • “TotalCover”: Total number of samples affected by splits on a feature.

  • normalize (bool, optional, default=True) – Whether to normalize importance scores to sum to 1.

Returns:

importance – A dictionary mapping feature names (or indices) to importance scores.

Return type:

dict
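
Examples

A sketch of ranking features by total gain; the keys depend on the data passed to fit():

>>> importance = model.calculate_feature_importance(method="TotalGain")
>>> sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:3]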

text_dump() → List[str]

Return the booster model in a human-readable text format.

Returns:

dump – A list where each string represents a tree in the ensemble.

Return type:

list of str

json_dump() → str

Return the booster model in JSON format.

Returns:

dump – The JSON representation of the model.

Return type:

str

classmethod load_booster(path: str) → Self

Load a booster model from a file.

Parameters:

path (str) – Path to the saved booster (JSON format).

Returns:

model – The loaded booster object.

Return type:

PerpetualBooster

save_booster(path: str)

Save the booster model to a file.

The model is saved in a JSON-based format.

Parameters:

path (str) – Path where the model will be saved.
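
Examples

A save/load round trip; the file path is illustrative:

>>> model.save_booster("model.json")
>>> loaded = PerpetualBooster.load_booster("model.json")
>>> preds = loaded.predict(X[:5])  # same predictions as the original model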

insert_metadata(key: str, value: str)

Insert metadata into the model.

Metadata is saved alongside the model and can be retrieved later.

Parameters:
  • key (str) – The key for the metadata item.

  • value (str) – The value for the metadata item.

get_metadata(key: str) → str

Get metadata associated with a given key.

Parameters:

key (str) – The key to look up in the metadata.

Returns:

value – The value associated with the key.

Return type:

str
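
Examples

A sketch; the key and value are arbitrary strings chosen for illustration:

>>> model.insert_metadata("data_version", "2024-06-01")
>>> model.get_metadata("data_version")
'2024-06-01'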

property base_score: float | Iterable[float]

The base score(s) of the model.

Returns:

score – The initial prediction value(s) of the model.

Return type:

float or iterable of float

property number_of_trees: int | Iterable[int]

The number of trees in the ensemble.

Returns:

n_trees – Total number of trees.

Return type:

int or iterable of int

get_params(deep=True) → Dict[str, Any]

Get parameters for this booster.

Parameters:

deep (bool, default=True) – Currently ignored, exists for scikit-learn compatibility.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params: Any) → Self

Set parameters for this booster.

Parameters:

**params (dict) – Booster parameters.

Returns:

self – Returns self.

Return type:

object

get_node_lists(map_features_names: bool = True) → List[List[Node]]

Return tree structures as lists of node objects.

Parameters:

map_features_names (bool, default=True) – Whether to use feature names instead of indices.

Returns:

trees – Each inner list represents a tree.

Return type:

list of list of Node

trees_to_dataframe() → Any

Return the tree structures as a DataFrame.

Returns:

df – A Polars or Pandas DataFrame containing tree information.

Return type:

DataFrame
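
Examples

A sketch of inspecting the ensemble; the available columns depend on the returned DataFrame:

>>> df = model.trees_to_dataframe()
>>> df.head()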

save_as_xgboost(path: str)

Save the model in XGBoost JSON format.

Parameters:

path (str) – The path where the XGBoost-compatible model will be saved.
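
Examples

A sketch of loading the exported file back with the xgboost package (an assumption: xgboost is a separate dependency, not required by this package):

>>> import xgboost as xgb
>>> model.save_as_xgboost("model_xgb.json")
>>> bst = xgb.Booster()
>>> bst.load_model("model_xgb.json")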

save_as_onnx(path: str, name: str = 'perpetual_model')

Save the model in ONNX format.

Parameters:
  • path (str) – The path where the ONNX model will be saved.

  • name (str, optional, default="perpetual_model") – The name of the graph in the exported model.
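
Examples

A sketch of scoring the exported model with onnxruntime (an assumption: onnxruntime is a separate package, and the input name is read from the session rather than hard-coded):

>>> import numpy as np
>>> import onnxruntime as ort
>>> model.save_as_onnx("model.onnx")
>>> sess = ort.InferenceSession("model.onnx")
>>> input_name = sess.get_inputs()[0].name
>>> outputs = sess.run(None, {input_name: np.asarray(X[:5], dtype=np.float32)})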

Sklearn Interface

class perpetual.sklearn.PerpetualClassifier(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Bases: PerpetualBooster, ClassifierMixin

A scikit-learn compatible classifier based on PerpetualBooster. Uses ‘LogLoss’ as the default objective.

__init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'LogLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Gradient Boosting Machine with Perpetual Learning.

A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.

Parameters:
  • objective (str or tuple, default="LogLoss") –

    Learning objective function to be used for optimization. Valid options are:

    • “LogLoss”: logistic loss for binary classification.

    • custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:

      • loss(y, pred, weight, group) : returns the loss value for each sample.

      • gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.

      • initial_value(y, weight, group) : returns the initial value for the booster.

  • budget (float, default=0.5) – A positive number that controls the fitting budget. A higher budget typically results in more boosting rounds and higher predictive power.

  • num_threads (int, optional) – Number of threads to be used during training and prediction.

  • monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.

  • force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.

  • missing (float, default=np.nan) – Value to consider as missing data.

  • allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.

  • create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).

  • terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.

  • missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: “None”, “AssignToParent”, “AverageLeafWeight”, “AverageNodeWeight”.

  • log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.

  • feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.

  • quantile (float, optional) – Target quantile for quantile regression (objective=“QuantileLoss”).

  • reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.

  • categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.

  • timeout (float, optional) – Time limit for fitting in seconds.

  • iteration_limit (int, optional) – Maximum number of boosting iterations.

  • memory_limit (float, optional) – Memory limit for training in GB.

  • stopping_rounds (int, optional) – Early stopping rounds.

  • max_bin (int, default=256) – Maximum number of bins for feature discretization.

  • max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.

  • **kwargs – Arbitrary keyword arguments to be passed to the base class.

score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

fit(X, y, sample_weight=None, **fit_params) → Self

A wrapper for the base fit method.
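
Examples

Because the class follows the scikit-learn estimator API, it can be used with standard scikit-learn utilities; a sketch with cross_val_score:

>>> from perpetual.sklearn import PerpetualClassifier
>>> from sklearn.model_selection import cross_val_score
>>> clf = PerpetualClassifier(budget=0.5)
>>> scores = cross_val_score(clf, X, y, cv=3)  # mean accuracy per fold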

class perpetual.sklearn.PerpetualRegressor(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'SquaredLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Bases: PerpetualBooster, RegressorMixin

A scikit-learn compatible regressor based on PerpetualBooster. Uses ‘SquaredLoss’ as the default objective.

__init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'SquaredLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Gradient Boosting Machine with Perpetual Learning.

A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.

Parameters:
  • objective (str or tuple, default="SquaredLoss") –

    Learning objective function to be used for optimization. Valid options are:

    • “SquaredLoss”: squared error for regression.

    • “QuantileLoss”: quantile error for quantile regression.

    • “HuberLoss”: Huber loss for robust regression.

    • “AdaptiveHuberLoss”: adaptive Huber loss for robust regression.

    • custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:

      • loss(y, pred, weight, group) : returns the loss value for each sample.

      • gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.

      • initial_value(y, weight, group) : returns the initial value for the booster.

  • budget (float, default=0.5) – A positive number that controls the fitting budget. A higher budget typically results in more boosting rounds and higher predictive power.

  • num_threads (int, optional) – Number of threads to be used during training and prediction.

  • monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.

  • force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.

  • missing (float, default=np.nan) – Value to consider as missing data.

  • allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.

  • create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).

  • terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.

  • missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: “None”, “AssignToParent”, “AverageLeafWeight”, “AverageNodeWeight”.

  • log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.

  • feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.

  • quantile (float, optional) – Target quantile for quantile regression (objective=“QuantileLoss”).

  • reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.

  • categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.

  • timeout (float, optional) – Time limit for fitting in seconds.

  • iteration_limit (int, optional) – Maximum number of boosting iterations.

  • memory_limit (float, optional) – Memory limit for training in GB.

  • stopping_rounds (int, optional) – Early stopping rounds.

  • max_bin (int, default=256) – Maximum number of bins for feature discretization.

  • max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.

  • **kwargs – Arbitrary keyword arguments to be passed to the base class.

fit(X, y, sample_weight=None, **fit_params) → Self

A wrapper for the base fit method.

score(X, y, sample_weight=None)

Returns the coefficient of determination ($R^2$) of the prediction.
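
Examples

A minimal regression sketch; X and y are assumed numeric training data:

>>> from perpetual.sklearn import PerpetualRegressor
>>> reg = PerpetualRegressor(budget=1.0)
>>> reg.fit(X, y)
>>> reg.score(X, y)  # coefficient of determination on the given data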

class perpetual.sklearn.PerpetualRanker(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'ListNetLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Bases: PerpetualBooster, RegressorMixin

A scikit-learn compatible ranker based on PerpetualBooster. Uses ‘ListNetLoss’ as the default objective. Requires the ‘group’ parameter to be passed to fit.

__init__(*, objective: str | Tuple[LambdaType, LambdaType, LambdaType] = 'ListNetLoss', budget: float = 0.5, num_threads: int | None = None, monotone_constraints: Dict[Any, int] | None = None, max_bin: int = 256, max_cat: int = 1000, **kwargs)

Gradient Boosting Machine with Perpetual Learning.

A self-generalizing gradient boosting machine that doesn’t need hyperparameter optimization. It automatically finds the best configuration based on the provided budget.

Parameters:
  • objective (str or tuple, default="ListNetLoss") –

    Learning objective function to be used for optimization. Valid options are:

    • “ListNetLoss”: ListNet loss for ranking.

    • custom objective: a tuple of (loss, gradient, initial_value) functions. Each function should have the following signature:

      • loss(y, pred, weight, group) : returns the loss value for each sample.

      • gradient(y, pred, weight, group) : returns a tuple of (gradient, hessian). If the hessian is constant (e.g., 1.0 for SquaredLoss), return None to improve performance.

      • initial_value(y, weight, group) : returns the initial value for the booster.

  • budget (float, default=0.5) – A positive number that controls the fitting budget. A higher budget typically results in more boosting rounds and higher predictive power.

  • num_threads (int, optional) – Number of threads to be used during training and prediction.

  • monotone_constraints (dict, optional) – Constraints to enforce a specific relationship between features and target. Keys are feature indices or names, values are -1, 1, or 0.

  • force_children_to_bound_parent (bool, default=False) – Whether to restrict children nodes to be within the parent’s range.

  • missing (float, default=np.nan) – Value to consider as missing data.

  • allow_missing_splits (bool, default=True) – Whether to allow splits that separate missing from non-missing values.

  • create_missing_branch (bool, default=False) – Whether to create a separate branch for missing values (ternary trees).

  • terminate_missing_features (iterable, optional) – Features for which missing branches will always be terminated if create_missing_branch is True.

  • missing_node_treatment (str, default="None") – How to handle weights for missing nodes if create_missing_branch is True. Options: “None”, “AssignToParent”, “AverageLeafWeight”, “AverageNodeWeight”.

  • log_iterations (int, default=0) – Logging frequency (every N iterations). 0 disables logging.

  • feature_importance_method (str, default="Gain") – Method for calculating feature importance. Options: “Gain”, “Weight”, “Cover”, “TotalGain”, “TotalCover”.

  • quantile (float, optional) – Target quantile for quantile regression (objective=“QuantileLoss”).

  • reset (bool, optional) – Whether to reset the model or continue training on subsequent calls to fit.

  • categorical_features (str or iterable, default="auto") – Feature indices or names to treat as categorical.

  • timeout (float, optional) – Time limit for fitting in seconds.

  • iteration_limit (int, optional) – Maximum number of boosting iterations.

  • memory_limit (float, optional) – Memory limit for training in GB.

  • stopping_rounds (int, optional) – Early stopping rounds.

  • max_bin (int, default=256) – Maximum number of bins for feature discretization.

  • max_cat (int, default=1000) – Maximum unique categories before a feature is treated as numerical.

  • **kwargs – Arbitrary keyword arguments to be passed to the base class.

fit(X, y, group=None, sample_weight=None, **fit_params) → Self

Fit the ranker. Requires the ‘group’ parameter.

Parameters:
  • X – Training data.

  • y – Target relevance scores.

  • group – Group lengths to use for a ranking objective. (Required for ListNetLoss).

  • sample_weight – Instance weights.
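
Examples

A sketch of fitting with group lengths; the toy data below (two queries with 3 and 2 documents) is illustrative:

>>> import numpy as np
>>> from perpetual.sklearn import PerpetualRanker
>>> X = np.random.rand(5, 4)       # 5 documents, 4 features
>>> y = np.array([3, 2, 1, 2, 0])  # relevance scores
>>> group = np.array([3, 2])       # query sizes: 3 docs, then 2 docs
>>> ranker = PerpetualRanker()
>>> ranker.fit(X, y, group=group)
>>> scores = ranker.predict(X)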