Double Machine Learning (DML)
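Double Machine Learning (DML) is a method for estimating causal effects in the presence of many confounding variables. It fits two machine learning models, one predicting the outcome from the covariates and one predicting the treatment from the covariates, and combines their residuals through a Neyman-orthogonal score. Because this score is insensitive to small errors in either model, and the models are cross-fitted on held-out folds, the resulting treatment effect estimate is approximately unbiased even when the individual models are regularized.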
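The residual-on-residual idea can be illustrated with a minimal NumPy sketch. This is not how perpetual implements DML: it assumes a partially linear model with a single constant treatment effect, and it swaps in ordinary least squares for the gradient boosting models to stay self-contained. The helper names fit_linear and dml_partial_linear are hypothetical and exist only for this illustration.

```python
import numpy as np


def fit_linear(X, target):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def dml_partial_linear(X, w, y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of a constant treatment effect."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    w_res = np.empty(len(y))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisance models are fit on the other folds only (cross-fitting).
        predict_y = fit_linear(X[train_idx], y[train_idx])
        predict_w = fit_linear(X[train_idx], w[train_idx])
        # Residuals are computed on the held-out fold.
        y_res[test_idx] = y[test_idx] - predict_y(X[test_idx])
        w_res[test_idx] = w[test_idx] - predict_w(X[test_idx])
    # Solve the orthogonal score: regress outcome residuals on treatment residuals.
    return float(w_res @ y_res / (w_res @ w_res))


# Synthetic data with a true effect of 2.0, confounded through X[:, 0].
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
w = X[:, 0] + rng.normal(size=n)
y = 2.0 * w + 3.0 * X[:, 0] + rng.normal(size=n)

theta_hat = dml_partial_linear(X, w, y)
naive = float(w @ y / (w @ w))  # regression of y on w that ignores X
print(f"DML estimate: {theta_hat:.3f}, naive estimate: {naive:.3f}")
```

On this synthetic data the cross-fitted estimate should land close to the true effect of 2.0, while the naive regression that ignores the confounder is pulled well above it.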
DMLEstimator
dml.DMLEstimator estimates the Conditional Average Treatment Effect (CATE) for both discrete and continuous treatments, using gradient boosting models.
Example:
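import numpy as np
from perpetual.dml import DMLEstimator

# Synthetic data for illustration only.
# X: covariates, w: treatment, y: outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w = X[:, 0] + rng.normal(size=1000)
y = 2.0 * w + X[:, 1] + rng.normal(size=1000)
X_test = rng.normal(size=(100, 5))

# DMLEstimator uses separate cross-fitted models for the outcome (y ~ X) and the treatment (w ~ X)
model = DMLEstimator(
    budget=0.5,
    n_folds=2,
    objective="SquaredLoss",
)
model.fit(X, w, y)

# Predict the Conditional Average Treatment Effect (CATE)
cate_pred = model.predict(X_test)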
Tutorials
For a detailed walkthrough, see the tutorial Double Machine Learning: Estimating the Gender Wage Gap.