Fairness-Aware Credit Scoring
Machine learning models used for lending, hiring, and insurance must comply with anti-discrimination regulations. Simply removing protected attributes (“fairness through unawareness”) is insufficient because other features can serve as proxies.
This tutorial demonstrates how to build a fair credit-risk model using Perpetual’s built-in constraints and compliance tooling:
Baseline model — fit without any fairness considerations.
Monotonicity constraints — enforce logically consistent feature effects.
Interaction constraints — prevent the model from learning proxy interactions involving protected attributes.
Adverse-action reason codes — generate compliant explanations with PerpetualRiskEngine.
Fairness auditing — measure Demographic Parity and Equalized Odds across groups.
Dataset: The Adult Census Income dataset from the UCI repository, available on OpenML. The task is to predict whether an individual earns more than $50K/year. We treat this as a credit-scoring analogue (high income ≈ low risk).
[ ]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual import PerpetualBooster, PerpetualRiskEngine
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
1. Load and Explore the Data
[ ]:
data = fetch_openml(data_id=1590, as_frame=True, parser="auto")
df = data.frame
# Binary target: >50K = 1, <=50K = 0
df["target"] = (df["class"] == ">50K").astype(int)
# Protected attribute
protected_col = "sex"
print(f"Samples: {len(df):,}")
print(f"\nTarget distribution:\n{df['target'].value_counts(normalize=True)}")
print(f"\nProtected attribute distribution:\n{df[protected_col].value_counts()}")
df.head()
[ ]:
# Features
feature_cols = [
"age",
"workclass",
"education",
"education-num",
"marital-status",
"occupation",
"relationship",
"race",
"sex",
"capital-gain",
"capital-loss",
"hours-per-week",
"native-country",
]
X = df[feature_cols].copy()
# Mark categoricals
cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
for c in cat_cols:
X[c] = X[c].astype("category")
y = df["target"].values
S = df[protected_col].values # protected group labels
X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
X, y, S, test_size=0.2, random_state=42, stratify=y
)
print(f"Train: {len(X_train):,} | Test: {len(X_test):,}")
2. Baseline Model (Unconstrained)
[ ]:
baseline = PerpetualBooster(objective="LogLoss", budget=0.5)
baseline.fit(X_train, y_train)
probs_bl = baseline.predict_proba(X_test)
preds_bl = (probs_bl > 0.5).astype(int)
print(f"Baseline AUC: {roc_auc_score(y_test, probs_bl):.4f}")
print(f"Baseline Accuracy: {accuracy_score(y_test, preds_bl):.4f}")
2.1 Fairness Audit — Baseline
We compute two standard fairness metrics:
Demographic Parity (DP): The difference in positive prediction rates between groups. DP = 0 means equal rates.
Equalized Odds (EO): The maximum difference in True Positive Rate or False Positive Rate between groups.
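Written out (matching the report function below, which takes a max over groups so it also handles more than two):

$$\text{DP gap} = \max_{g} \Pr(\hat{Y}=1 \mid A=g) - \min_{g} \Pr(\hat{Y}=1 \mid A=g)$$

$$\text{EO gap} = \max\Big(\max_{g}\text{TPR}_g - \min_{g}\text{TPR}_g,\;\; \max_{g}\text{FPR}_g - \min_{g}\text{FPR}_g\Big)$$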
[ ]:
def fairness_report(y_true, y_pred, groups, group_name="Group"):
"""Print Demographic Parity and Equalized Odds metrics."""
unique_groups = np.unique(groups)
print(f"{'Group':<15} {'Pos Rate':>10} {'TPR':>10} {'FPR':>10} {'n':>8}")
print("-" * 55)
rates = {}
for g in unique_groups:
mask = groups == g
pos_rate = y_pred[mask].mean()
tp = ((y_pred[mask] == 1) & (y_true[mask] == 1)).sum()
fn = ((y_pred[mask] == 0) & (y_true[mask] == 1)).sum()
fp = ((y_pred[mask] == 1) & (y_true[mask] == 0)).sum()
tn = ((y_pred[mask] == 0) & (y_true[mask] == 0)).sum()
tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
rates[g] = {"pos_rate": pos_rate, "tpr": tpr, "fpr": fpr}
print(
f"{str(g):<15} {pos_rate:>10.4f} {tpr:>10.4f} {fpr:>10.4f} {mask.sum():>8}"
)
pos_rates = [r["pos_rate"] for r in rates.values()]
tprs = [r["tpr"] for r in rates.values()]
fprs = [r["fpr"] for r in rates.values()]
dp_gap = max(pos_rates) - min(pos_rates)
eo_gap = max(max(tprs) - min(tprs), max(fprs) - min(fprs))
print(f"\nDemographic Parity Gap: {dp_gap:.4f}")
print(f"Equalized Odds Gap: {eo_gap:.4f}")
return dp_gap, eo_gap
print("=== Baseline Fairness ===")
dp_bl, eo_bl = fairness_report(y_test, preds_bl, S_test)
3. Constrained Model
We apply two types of constraints to improve fairness while retaining accuracy:
3.1 Monotonicity Constraints
We enforce that logically relevant features have the expected direction of effect:
education-num ↑ → income ↑ (positive)
hours-per-week ↑ → income ↑ (positive)
age ↑ → income ↑ (positive, on average)
capital-gain ↑ → income ↑ (positive)
3.2 Interaction Constraints
We prevent the model from creating splits that combine the protected attribute (sex, feature index 8) with any other feature. This limits proxy discrimination.
[ ]:
monotone = {
"education-num": 1, # More education → higher income
"hours-per-week": 1, # More hours → higher income
"age": 1, # Older → higher income (on average)
"capital-gain": 1, # More gains → higher income
}
# 'sex' is at feature index 8. We isolate it so the model
# cannot form interactions between sex and other features.
sex_idx = feature_cols.index("sex")
other_idx = [i for i in range(len(feature_cols)) if i != sex_idx]
interaction_constraints = [other_idx, [sex_idx]]
constrained = PerpetualBooster(
objective="LogLoss",
budget=0.5,
monotone_constraints=monotone,
interaction_constraints=interaction_constraints,
)
constrained.fit(X_train, y_train)
probs_cs = constrained.predict_proba(X_test)
preds_cs = (probs_cs > 0.5).astype(int)
print(f"Constrained AUC: {roc_auc_score(y_test, probs_cs):.4f}")
print(f"Constrained Accuracy: {accuracy_score(y_test, preds_cs):.4f}")
[ ]:
print("=== Constrained Fairness ===")
dp_cs, eo_cs = fairness_report(y_test, preds_cs, S_test)
3.3 Comparison: Baseline vs. Constrained
[ ]:
comparison = pd.DataFrame(
{
"Model": ["Baseline", "Constrained"],
"AUC": [
roc_auc_score(y_test, probs_bl),
roc_auc_score(y_test, probs_cs),
],
"Accuracy": [
accuracy_score(y_test, preds_bl),
accuracy_score(y_test, preds_cs),
],
"DP Gap": [dp_bl, dp_cs],
"EO Gap": [eo_bl, eo_cs],
}
)
print(comparison.to_string(index=False))
4. Adverse-Action Reason Codes
Regulations like ECOA (US) and GDPR (EU) require lenders to provide specific reasons when an application is denied. Perpetual’s PerpetualRiskEngine generates per-applicant explanations directly from tree structure — no post-hoc approximation needed.
[ ]:
engine = PerpetualRiskEngine(constrained)
# Find "denied" applicants (low predicted probability of >50K, i.e. high risk)
denied_idx = np.where(probs_cs < 0.3)[0][:3]  # first 3 denied
for idx in denied_idx:
applicant = X_test.iloc[[idx]]
print(f"\n--- Applicant {idx} (P(>50K) = {probs_cs[idx]:.3f}) ---")
reasons = engine.generate_reason_codes(applicant, threshold=0.5)
for reason_list in reasons:
for i, r in enumerate(reason_list, 1):
print(f" Reason {i}: {r}")
5. Visualizing the Accuracy–Fairness Trade-off
By sweeping the classification threshold, we can trace out the frontier of accuracy vs. demographic parity gap.
[ ]:
thresholds = np.linspace(0.2, 0.8, 20)
curves = {"Baseline": probs_bl, "Constrained": probs_cs}
fig, ax = plt.subplots(figsize=(8, 5))
for label, probs in curves.items():
accs, dps = [], []
for t in thresholds:
preds = (probs > t).astype(int)
accs.append(accuracy_score(y_test, preds))
groups_u = np.unique(S_test)
pos_rates = [preds[S_test == g].mean() for g in groups_u]
dps.append(max(pos_rates) - min(pos_rates))
ax.plot(dps, accs, "o-", label=label, markersize=4)
ax.set_xlabel("Demographic Parity Gap (lower is fairer)")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs. Fairness Trade-off")
ax.legend()
plt.tight_layout()
plt.show()
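One way to use this frontier (a sketch reusing the variables from the sweep above; the 0.10 tolerance is purely illustrative, not a regulatory value) is to keep only thresholds whose DP gap stays within a tolerance and pick the most accurate one for the constrained model.
[ ]:
# Sketch: select the most accurate threshold whose DP gap stays within a tolerance.
tol = 0.10  # illustrative tolerance
best_t, best_acc = None, -1.0
for t in thresholds:
    preds_t = (probs_cs > t).astype(int)
    acc_t = accuracy_score(y_test, preds_t)
    rates_t = [preds_t[S_test == g].mean() for g in np.unique(S_test)]
    if (max(rates_t) - min(rates_t)) <= tol and acc_t > best_acc:
        best_t, best_acc = t, acc_t
print(f"Best constrained-model threshold with DP gap <= {tol}: {best_t} (accuracy {best_acc:.4f})")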
Key Takeaways
| Technique | Purpose |
|---|---|
| Monotonicity constraints | Ensure feature effects are logically consistent (e.g., more education → better outcome). |
| Interaction constraints | Prevent the model from using proxy interactions with protected attributes. |
| PerpetualRiskEngine | Generate compliant adverse-action reason codes directly from tree structure. |
| Fairness auditing | Measure Demographic Parity and Equalized Odds to quantify disparate impact. |
| Threshold tuning | Trade off accuracy against fairness by adjusting the classification threshold. |