Fairness-Aware Credit Scoring
Machine learning models used for lending, hiring, and insurance must comply with anti-discrimination regulations. Simply removing protected attributes (“fairness through unawareness”) is insufficient because other features can serve as proxies.
This tutorial demonstrates how to build a fair credit-risk model using Perpetual’s built-in constraints and compliance tooling:
Baseline model — fit without any fairness considerations.
Monotonicity constraints — enforce logically consistent feature effects.
Interaction constraints — prevent the model from learning proxy interactions involving protected attributes.
Adverse-action reason codes — generate compliant explanations with PerpetualRiskEngine.
Fairness auditing — measure Demographic Parity and Equalized Odds across groups.
Dataset: The Adult Census Income dataset from the UCI repository, available on OpenML. The task is to predict whether an individual earns more than $50K/year. We treat this as a credit-scoring analogue (high income ≈ low risk).
[ ]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perpetual import PerpetualBooster, PerpetualRiskEngine
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
1. Load and Explore the Data
[ ]:
data = fetch_openml(data_id=1590, as_frame=True, parser="auto")
df = data.frame
# Binary target: >50K = 1, <=50K = 0
df["target"] = (df["class"] == ">50K").astype(int)
# Protected attribute
protected_col = "sex"
print(f"Samples: {len(df):,}")
print(f"\nTarget distribution:\n{df['target'].value_counts(normalize=True)}")
print(f"\nProtected attribute distribution:\n{df[protected_col].value_counts()}")
df.head()
[ ]:
# Features
feature_cols = [
"age",
"workclass",
"education",
"education-num",
"marital-status",
"occupation",
"relationship",
"race",
"sex",
"capital-gain",
"capital-loss",
"hours-per-week",
"native-country",
]
X = df[feature_cols].copy()
# Mark categoricals
cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
for c in cat_cols:
X[c] = X[c].astype("category")
y = df["target"].values
S = df[protected_col].values # protected group labels
X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
X, y, S, test_size=0.2, random_state=42, stratify=y
)
print(f"Train: {len(X_train):,} | Test: {len(X_test):,}")
2. Baseline Model (Unconstrained)
[ ]:
baseline = PerpetualBooster(objective="LogLoss", budget=0.5)
baseline.fit(X_train, y_train)
probs_bl = baseline.predict_proba(X_test)
preds_bl = (probs_bl > 0.5).astype(int)
print(f"Baseline AUC: {roc_auc_score(y_test, probs_bl):.4f}")
print(f"Baseline Accuracy: {accuracy_score(y_test, preds_bl):.4f}")
2.1 Fairness Audit — Baseline
We compute two standard fairness metrics:
Demographic Parity (DP): The difference in positive prediction rates between groups. DP = 0 means equal rates.
Equalized Odds (EO): The maximum difference in True Positive Rate or False Positive Rate between groups.
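Written out (matching the report function below, which takes a max over groups so it also handles more than two):

$$\text{DP gap} = \max_{g} \Pr(\hat{Y}=1 \mid A=g) - \min_{g} \Pr(\hat{Y}=1 \mid A=g)$$

$$\text{EO gap} = \max\Big(\max_{g}\text{TPR}_g - \min_{g}\text{TPR}_g,\;\; \max_{g}\text{FPR}_g - \min_{g}\text{FPR}_g\Big)$$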
[ ]:
def fairness_report(y_true, y_pred, groups, group_name="Group"):
"""Print Demographic Parity and Equalized Odds metrics."""
unique_groups = np.unique(groups)
print(f"{'Group':<15} {'Pos Rate':>10} {'TPR':>10} {'FPR':>10} {'n':>8}")
print("-" * 55)
rates = {}
for g in unique_groups:
mask = groups == g
pos_rate = y_pred[mask].mean()
tp = ((y_pred[mask] == 1) & (y_true[mask] == 1)).sum()
fn = ((y_pred[mask] == 0) & (y_true[mask] == 1)).sum()
fp = ((y_pred[mask] == 1) & (y_true[mask] == 0)).sum()
tn = ((y_pred[mask] == 0) & (y_true[mask] == 0)).sum()
tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
rates[g] = {"pos_rate": pos_rate, "tpr": tpr, "fpr": fpr}
print(
f"{str(g):<15} {pos_rate:>10.4f} {tpr:>10.4f} {fpr:>10.4f} {mask.sum():>8}"
)
pos_rates = [r["pos_rate"] for r in rates.values()]
tprs = [r["tpr"] for r in rates.values()]
fprs = [r["fpr"] for r in rates.values()]
dp_gap = max(pos_rates) - min(pos_rates)
eo_gap = max(max(tprs) - min(tprs), max(fprs) - min(fprs))
print(f"\nDemographic Parity Gap: {dp_gap:.4f}")
print(f"Equalized Odds Gap: {eo_gap:.4f}")
return dp_gap, eo_gap
print("=== Baseline Fairness ===")
dp_bl, eo_bl = fairness_report(y_test, preds_bl, S_test)
3. Constrained Model
We apply two types of constraints to improve fairness while retaining accuracy:
3.1 Monotonicity Constraints
We enforce that logically relevant features have the expected direction of effect:
education-num ↑ → income ↑ (positive)
hours-per-week ↑ → income ↑ (positive)
age ↑ → income ↑ (positive, on average)
capital-gain ↑ → income ↑ (positive)
3.2 Interaction Constraints
We prevent the model from creating splits that combine the protected attribute (sex, feature index 8) with any other feature. This limits proxy discrimination.
[ ]:
monotone = {
"education-num": 1, # More education → higher income
"hours-per-week": 1, # More hours → higher income
"age": 1, # Older → higher income (on average)
"capital-gain": 1, # More gains → higher income
}
# 'sex' is at feature index 8. We isolate it so the model
# cannot form interactions between sex and other features.
sex_idx = feature_cols.index("sex")
other_idx = [i for i in range(len(feature_cols)) if i != sex_idx]
interaction_constraints = [other_idx, [sex_idx]]
constrained = PerpetualBooster(
objective="LogLoss",
budget=0.5,
monotone_constraints=monotone,
interaction_constraints=interaction_constraints,
)
constrained.fit(X_train, y_train)
probs_cs = constrained.predict_proba(X_test)
preds_cs = (probs_cs > 0.5).astype(int)
print(f"Constrained AUC: {roc_auc_score(y_test, probs_cs):.4f}")
print(f"Constrained Accuracy: {accuracy_score(y_test, preds_cs):.4f}")
[ ]:
print("=== Constrained Fairness ===")
dp_cs, eo_cs = fairness_report(y_test, preds_cs, S_test)
3.3 Comparison: Baseline vs. Constrained
[ ]:
comparison = pd.DataFrame(
{
"Model": ["Baseline", "Constrained"],
"AUC": [
roc_auc_score(y_test, probs_bl),
roc_auc_score(y_test, probs_cs),
],
"Accuracy": [
accuracy_score(y_test, preds_bl),
accuracy_score(y_test, preds_cs),
],
"DP Gap": [dp_bl, dp_cs],
"EO Gap": [eo_bl, eo_cs],
}
)
print(comparison.to_string(index=False))
4. Adverse-Action Reason Codes
Regulations like ECOA (US) and GDPR (EU) require lenders to provide specific reasons when an application is denied. Perpetual’s PerpetualRiskEngine generates per-applicant explanations directly from tree structure — no post-hoc approximation needed.
[ ]:
engine = PerpetualRiskEngine(constrained)
# Find "denied" applicants (low predicted probability of >50K, i.e. high risk)
denied_idx = np.where(probs_cs < 0.3)[0][:3]  # first 3 denied
for idx in denied_idx:
applicant = X_test.iloc[[idx]]
print(f"\n--- Applicant {idx} (P(>50K) = {probs_cs[idx]:.3f}) ---")
reasons = engine.generate_reason_codes(applicant, threshold=0.5)
for reason_list in reasons:
for i, r in enumerate(reason_list, 1):
print(f" Reason {i}: {r}")
5. Visualizing the Accuracy–Fairness Trade-off
By sweeping the classification threshold, we can trace out the frontier of accuracy vs. demographic parity gap.
[ ]:
thresholds = np.linspace(0.2, 0.8, 20)
curves = {"Baseline": probs_bl, "Constrained": probs_cs}
fig, ax = plt.subplots(figsize=(8, 5))
for label, probs in curves.items():
accs, dps = [], []
for t in thresholds:
preds = (probs > t).astype(int)
accs.append(accuracy_score(y_test, preds))
groups_u = np.unique(S_test)
pos_rates = [preds[S_test == g].mean() for g in groups_u]
dps.append(max(pos_rates) - min(pos_rates))
ax.plot(dps, accs, "o-", label=label, markersize=4)
ax.set_xlabel("Demographic Parity Gap (lower is fairer)")
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy vs. Fairness Trade-off")
ax.legend()
plt.tight_layout()
plt.show()
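One way to use this frontier (a sketch reusing the variables from the sweep above; the 0.10 tolerance is purely illustrative, not a regulatory value) is to keep only thresholds whose DP gap stays within a tolerance and pick the most accurate one for the constrained model.
[ ]:
# Sketch: select the most accurate threshold whose DP gap stays within a tolerance.
tol = 0.10  # illustrative tolerance
best_t, best_acc = None, -1.0
for t in thresholds:
    preds_t = (probs_cs > t).astype(int)
    acc_t = accuracy_score(y_test, preds_t)
    rates_t = [preds_t[S_test == g].mean() for g in np.unique(S_test)]
    if (max(rates_t) - min(rates_t)) <= tol and acc_t > best_acc:
        best_t, best_acc = t, acc_t
print(f"Best constrained-model threshold with DP gap <= {tol}: {best_t} (accuracy {best_acc:.4f})")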
Key Takeaways
| Technique | Purpose |
|---|---|
| Monotonicity constraints | Ensure feature effects are logically consistent (e.g., more education → better outcome). |
| Interaction constraints | Prevent the model from using proxy interactions with protected attributes. |
| PerpetualRiskEngine | Generate compliant adverse-action reason codes directly from tree structure. |
| Fairness auditing | Measure Demographic Parity and Equalized Odds to quantify disparate impact. |
| Threshold tuning | Trade off accuracy against fairness by adjusting the classification threshold. |