Instrumental Variables (BoostIV)

Instrumental variables (IV) are used in causal inference to estimate the effect of a treatment \(W\) on an outcome \(Y\) when unobserved confounders influence both.

Unobserved confounding makes standard estimators (such as Ordinary Least Squares or standard Gradient Boosting) inconsistent, because they attribute part of the confounder's influence to the treatment. An instrument \(Z\) is a variable that is correlated with the treatment (relevance), is independent of the unobserved confounders, and affects the outcome only through the treatment (exclusion restriction).
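The bias described above is easy to reproduce. The sketch below simulates a hypothetical linear data-generating process in which an unobserved confounder \(U\) drives both \(W\) and \(Y\), then compares the OLS slope with the simple IV (Wald) ratio \(\operatorname{Cov}(Z, Y) / \operatorname{Cov}(Z, W)\). The function names and coefficients are illustrative assumptions and are not part of perpetual.

```python
import numpy as np


def simulate_confounded(n=100_000, beta=2.0, seed=0):
    """Hypothetical linear IV setting with true treatment effect `beta`.

    U is an unobserved confounder that affects both W and Y.
    Z shifts W, is independent of U, and has no direct path to Y.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                        # unobserved confounder
    z = rng.normal(size=n)                        # instrument
    w = 1.0 * z + 1.0 * u + rng.normal(size=n)    # treatment
    y = beta * w + 2.0 * u + rng.normal(size=n)   # outcome
    return z, w, y


def ols_slope(w, y):
    # Slope of y on w (with intercept): Cov(w, y) / Var(w)
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)


def wald_iv_slope(z, w, y):
    # Simple IV (Wald) estimator: Cov(z, y) / Cov(z, w)
    return np.cov(z, y)[0, 1] / np.cov(z, w)[0, 1]


z, w, y = simulate_confounded()
print(f"OLS estimate: {ols_slope(w, y):.2f}")
print(f"IV estimate:  {wald_iv_slope(z, w, y):.2f}")
```

With these coefficients, OLS converges to \(\operatorname{Cov}(W, Y) / \operatorname{Var}(W) = 8/3 \approx 2.67\), while the IV ratio converges to the true effect of 2.0.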

Boosted Instrumental Variables

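The iv.BraidedBooster class implements a Control Function approach using Gradient Boosting. Rather than simply substituting first-stage predictions for the treatment (the “Forbidden Regression”, which is biased when the second stage is nonlinear), it passes the first-stage residuals to the outcome model as an explicit control for endogeneity.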

  1. Stage 1 (Treatment Model): Model the treatment \(W\) as a function of covariates \(X\) and instruments \(Z\): \(\hat{W} = f(X, Z)\). Then compute residuals \(V = W - \hat{W}\).

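  2. Stage 2 (Outcome Model): Model the outcome \(Y\) as a function of covariates \(X\), predicted treatment \(\hat{W}\), and the residuals \(V\): \(\hat{Y} = g(X, \hat{W}, V)\). Because \(V\) captures the part of \(W\) driven by the unobserved confounders, conditioning on it lets \(g\) separate the causal effect of the treatment from the confounding.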
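The two stages above can be sketched with linear least-squares models standing in for the boosted models; this substitution, the helper names, and the simulated data are assumptions for illustration only. With linear stages, the coefficient on \(\hat{W}\) coincides with two-stage least squares, so the sketch shows the mechanics of the control function rather than BraidedBooster itself.

```python
import numpy as np


def design(*cols):
    """Column-stack 1-D or 2-D arrays behind a leading intercept column."""
    n = len(cols[0])
    parts = [np.ones((n, 1))] + [np.asarray(c, dtype=float).reshape(n, -1) for c in cols]
    return np.hstack(parts)


def least_squares(a, b):
    """OLS coefficients for b ~ a."""
    coef, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coef


def linear_control_function(x, z, w, y):
    # Stage 1 (treatment model): W_hat = f(X, Z), here a linear regression
    d1 = design(x, z)
    w_hat = d1 @ least_squares(d1, w)
    v = w - w_hat  # residuals: the part of W not explained by X and Z
    # Stage 2 (outcome model): Y_hat = g(X, W_hat, V), here also linear
    d2 = design(x, w_hat, v)
    return least_squares(d2, y)  # [intercept, coef_x, coef_w_hat, coef_v]


# Hypothetical data: U confounds W and Y; the true effect of W on Y is 2.0
rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)
w = 0.5 * x + z + u + rng.normal(size=n)
y = 2.0 * w + 1.0 * x + 2.0 * u + rng.normal(size=n)

coef = linear_control_function(x, z, w, y)
naive = least_squares(design(x, w), y)  # ignores confounding: [intercept, coef_x, coef_w]
print(f"control function effect of W: {coef[2]:.2f}")
print(f"naive OLS effect of W:        {naive[2]:.2f}")
```

On this simulated data the control-function coefficient on \(\hat{W}\) lands near the true value 2.0, while the naive regression of \(Y\) on \(X\) and \(W\) converges to about 8/3 because it absorbs part of the confounder's effect.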

Example:

import numpy as np
from perpetual.iv import BraidedBooster

# X: covariates, Z: instruments, y: outcome, w: treatment
model = BraidedBooster(
    treatment_objective="SquaredLoss",  # loss for the Stage 1 treatment model
    outcome_objective="SquaredLoss",    # loss for the Stage 2 outcome model
    stage1_budget=0.5,                  # budget for the Stage 1 booster
    stage2_budget=0.5,                  # budget for the Stage 2 booster
)

model.fit(X, Z, y, w)

# Predict the outcome for each row of X_test if the treatment were set to 1
y_pred = model.predict(X_test, w_counterfactual=np.ones(len(X_test)))

Tutorials

For a detailed walkthrough using the Card (1995) education dataset, see the Instrumental Variables (Boosted IV) tutorial.