Fairness-Aware Credit Scoring

Machine learning models used for lending, hiring, and insurance must comply with anti-discrimination regulations. Simply removing protected attributes (“fairness through unawareness”) is insufficient because other features can serve as proxies.

This tutorial demonstrates how to build a fair credit-risk model using Perpetual’s built-in constraints:

  1. Baseline model — fit without any fairness considerations.

  2. Monotonicity constraints — enforce logically consistent feature effects.

  3. Interaction constraints — prevent the model from learning proxy interactions involving protected attributes.

  4. Adverse-action reason codes — generate compliant explanations with PerpetualRiskEngine.

  5. Fairness auditing — measure Demographic Parity and Equalized Odds across groups.

Dataset: The Adult Census Income dataset from the UCI repository, available on OpenML. The task is to predict whether an individual earns more than $50K/year. We treat this as a credit-scoring analogue (high income ≈ low risk).

[ ]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual import PerpetualBooster, PerpetualRiskEngine
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

1. Load and Explore the Data

[ ]:
data = fetch_openml(data_id=1590, as_frame=True, parser="auto")
df = data.frame

# Binary target: >50K = 1, <=50K = 0
df["target"] = (df["class"] == ">50K").astype(int)

# Protected attribute
protected_col = "sex"

print(f"Samples: {len(df):,}")
print(f"\nTarget distribution:\n{df['target'].value_counts(normalize=True)}")
print(f"\nProtected attribute distribution:\n{df[protected_col].value_counts()}")
df.head()
[ ]:
# Features
feature_cols = [
    "age",
    "workclass",
    "education",
    "education-num",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
    "native-country",
]
X = df[feature_cols].copy()

# Mark categoricals
cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
for c in cat_cols:
    X[c] = X[c].astype("category")

y = df["target"].values
S = df[protected_col].values  # protected group labels

X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
    X, y, S, test_size=0.2, random_state=42, stratify=y
)
print(f"Train: {len(X_train):,}  |  Test: {len(X_test):,}")

2. Baseline Model (Unconstrained)

[ ]:
baseline = PerpetualBooster(objective="LogLoss", budget=0.5)
baseline.fit(X_train, y_train)

probs_bl = baseline.predict_proba(X_test)
preds_bl = (probs_bl > 0.5).astype(int)

print(f"Baseline AUC:      {roc_auc_score(y_test, probs_bl):.4f}")
print(f"Baseline Accuracy: {accuracy_score(y_test, preds_bl):.4f}")

2.1 Fairness Audit — Baseline

We compute two standard fairness metrics:

  • Demographic Parity (DP): The difference in positive prediction rates between groups. DP = 0 means equal rates.

  • Equalized Odds (EO): The maximum difference in True Positive Rate or False Positive Rate between groups.
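
Concretely, writing Ŷ for the thresholded prediction, the helper below computes each group's positive prediction rate P(Ŷ = 1 | group), TPR, and FPR, and then reports

  DP gap = max over groups of P(Ŷ = 1 | group) − min over groups of P(Ŷ = 1 | group)

  EO gap = max(TPR spread across groups, FPR spread across groups)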

[ ]:
def fairness_report(y_true, y_pred, groups, group_name="Group"):
    """Print Demographic Parity and Equalized Odds metrics."""
    unique_groups = np.unique(groups)
    print(f"{'Group':<15} {'Pos Rate':>10} {'TPR':>10} {'FPR':>10} {'n':>8}")
    print("-" * 55)
    rates = {}
    for g in unique_groups:
        mask = groups == g
        pos_rate = y_pred[mask].mean()
        tp = ((y_pred[mask] == 1) & (y_true[mask] == 1)).sum()
        fn = ((y_pred[mask] == 0) & (y_true[mask] == 1)).sum()
        fp = ((y_pred[mask] == 1) & (y_true[mask] == 0)).sum()
        tn = ((y_pred[mask] == 0) & (y_true[mask] == 0)).sum()
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
        rates[g] = {"pos_rate": pos_rate, "tpr": tpr, "fpr": fpr}
        print(
            f"{str(g):<15} {pos_rate:>10.4f} {tpr:>10.4f} {fpr:>10.4f} {mask.sum():>8}"
        )

    pos_rates = [r["pos_rate"] for r in rates.values()]
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]

    dp_gap = max(pos_rates) - min(pos_rates)
    eo_gap = max(max(tprs) - min(tprs), max(fprs) - min(fprs))

    print(f"\nDemographic Parity Gap: {dp_gap:.4f}")
    print(f"Equalized Odds Gap:    {eo_gap:.4f}")
    return dp_gap, eo_gap


print("=== Baseline Fairness ===")
dp_bl, eo_bl = fairness_report(y_test, preds_bl, S_test)

3. Constrained Model

We apply two types of constraints to improve fairness while retaining accuracy:

3.1 Monotonicity Constraints

We enforce the expected direction of effect for features whose relationship with income is logically one-directional:

  • education-num ↑ → income ↑ (positive)

  • hours-per-week ↑ → income ↑ (positive)

  • age ↑ → income ↑ (positive, on average)

3.2 Interaction Constraints

We prevent the model from forming interactions between the protected attribute (sex, feature index 8) and other features, which limits proxy discrimination.
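
To make the proxy concern from the introduction concrete, here is a quick illustrative check (not needed for the rest of the pipeline) of how unevenly one feature, relationship, is distributed across the protected groups; rows such as Husband or Wife are expected to be heavily skewed, which is exactly how a model can recover the sex signal without ever splitting on sex.

[ ]:
# Illustrative proxy check: how strongly does 'relationship' track the protected attribute?
print(
    pd.crosstab(
        X_train["relationship"].to_numpy(),
        S_train,
        rownames=["relationship"],
        colnames=["sex"],
        normalize="index",
    )
)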

[ ]:
monotone = {
    "education-num": 1,  # More education → higher income
    "hours-per-week": 1,  # More hours → higher income
    "age": 1,  # Older → higher income (on average)
    "capital-gain": 1,  # More gains → higher income
}

# 'sex' is at feature index 8. We isolate it so the model
# cannot form interactions between sex and other features.
sex_idx = feature_cols.index("sex")
other_idx = [i for i in range(len(feature_cols)) if i != sex_idx]
interaction_constraints = [other_idx, [sex_idx]]

constrained = PerpetualBooster(
    objective="LogLoss",
    budget=0.5,
    monotone_constraints=monotone,
    interaction_constraints=interaction_constraints,
)
constrained.fit(X_train, y_train)

probs_cs = constrained.predict_proba(X_test)
preds_cs = (probs_cs > 0.5).astype(int)

print(f"Constrained AUC:      {roc_auc_score(y_test, probs_cs):.4f}")
print(f"Constrained Accuracy: {accuracy_score(y_test, preds_cs):.4f}")
[ ]:
print("=== Constrained Fairness ===")
dp_cs, eo_cs = fairness_report(y_test, preds_cs, S_test)

3.3 Comparison: Baseline vs. Constrained

[ ]:
comparison = pd.DataFrame(
    {
        "Model": ["Baseline", "Constrained"],
        "AUC": [
            roc_auc_score(y_test, probs_bl),
            roc_auc_score(y_test, probs_cs),
        ],
        "Accuracy": [
            accuracy_score(y_test, preds_bl),
            accuracy_score(y_test, preds_cs),
        ],
        "DP Gap": [dp_bl, dp_cs],
        "EO Gap": [eo_bl, eo_cs],
    }
)
print(comparison.to_string(index=False))

4. Adverse-Action Reason Codes

Regulations like ECOA (US) and GDPR (EU) require lenders to provide specific reasons when an application is denied. Perpetual’s PerpetualRiskEngine generates per-applicant explanations directly from tree structure — no post-hoc approximation needed.

[ ]:
engine = PerpetualRiskEngine(constrained)

# Find a "denied" applicant (high risk / low income prediction)
denied_idx = np.where(probs_cs < 0.3)[0][:3]  # first 3 denied

for idx in denied_idx:
    applicant = X_test.iloc[[idx]]
    print(f"\n--- Applicant {idx} (P(>50K) = {probs_cs[idx]:.3f}) ---")
    reasons = engine.generate_reason_codes(applicant, threshold=0.5)
    for reason_list in reasons:
        for i, r in enumerate(reason_list, 1):
            print(f"  Reason {i}: {r}")

5. Visualizing the Accuracy–Fairness Trade-off

By sweeping the classification threshold, we can trace out the frontier of accuracy vs. demographic parity gap.

[ ]:
thresholds = np.linspace(0.2, 0.8, 20)
curves = {"Baseline": probs_bl, "Constrained": probs_cs}

fig, ax = plt.subplots(figsize=(8, 5))
for label, probs in curves.items():
    accs, dps = [], []
    for t in thresholds:
        preds = (probs > t).astype(int)
        accs.append(accuracy_score(y_test, preds))
        groups_u = np.unique(S_test)
        pos_rates = [preds[S_test == g].mean() for g in groups_u]
        dps.append(max(pos_rates) - min(pos_rates))
    ax.plot(dps, accs, "o-", label=label, markersize=4)

ax.set_xlabel("Demographic Parity Gap (lower is fairer)")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs. Fairness Trade-off")
ax.legend()
plt.tight_layout()
plt.show()
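
One way to act on this frontier (an illustrative selection rule, not the only option) is to pick the shared threshold for the constrained model that minimizes the Demographic Parity gap while keeping accuracy within, say, one percentage point of the best accuracy seen in the sweep:

[ ]:
# Illustrative threshold selection: among thresholds whose accuracy is within 0.01 of
# the best accuracy over the sweep, pick the one with the smallest DP gap.
accs = {t: accuracy_score(y_test, (probs_cs > t).astype(int)) for t in thresholds}
best_acc = max(accs.values())
candidates = []
for t in thresholds:
    if accs[t] >= best_acc - 0.01:
        preds = (probs_cs > t).astype(int)
        pos_rates = [preds[S_test == g].mean() for g in np.unique(S_test)]
        candidates.append((max(pos_rates) - min(pos_rates), t))
dp_best, t_best = min(candidates)
print(f"Chosen threshold: {t_best:.2f}  (accuracy {accs[t_best]:.4f}, DP gap {dp_best:.4f})")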

Key Takeaways

  • Monotonicity constraints: Ensure feature effects are logically consistent (e.g., more education → better outcome).

  • Interaction constraints: Prevent the model from using proxy interactions with protected attributes.

  • PerpetualRiskEngine: Generate compliant adverse-action reason codes directly from tree structure.

  • Fairness auditing: Measure Demographic Parity and Equalized Odds to quantify disparate impact.

  • Threshold tuning: Trade off accuracy against fairness by adjusting the classification threshold.