[Paper Review] Adapting Neural Networks for the Estimation of Treatment Effects
Introduces Dragonnet, a three-headed neural network architecture that leverages the sufficiency of the propensity score for causal adjustment, and a targeted regularization method to improve downstream treatment-effect estimation from observational data; demonstrates strong results on IHDP and ACIC 2018 benchmarks.
This paper addresses the use of neural networks for the estimation of treatment effects from observational data. Generally, estimation proceeds in two stages. First, we fit models for the expected outcome and the probability of treatment (propensity score) for each unit. Second, we plug these fitted models into a downstream estimator of the effect. Neural networks are a natural choice for the models in the first step. The question we address is: how can we adapt the design and training of the neural networks used in the first step in order to improve the quality of the final estimate of the treatment effect? We propose two adaptations based on insights from the statistical literature on the estimation of treatment effects. The first is a new architecture, the Dragonnet, that exploits the sufficiency of the propensity score for estimation adjustment. The second is a regularization procedure, targeted regularization, that induces a bias towards models that have non-parametrically optimal asymptotic properties "out-of-the-box". Studies on benchmark datasets for causal inference show these adaptations outperform existing methods. Code is available at github.com/claudiashi57/dragonnet.
Motivation & Objective
- Motivate causal effect estimation from observational data under no hidden confounding.
- Develop neural-network-based models for both the conditional outcome Q(t, x) and the propensity score g(x).
- Propose architectures and regularization techniques that improve downstream ATE estimation while managing finite-sample behavior.
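As a rough illustration of the two-stage recipe, the sketch below assumes the first stage has already produced per-unit predictions Q(0, x), Q(1, x) and g(x), and shows two common downstream estimators of the ATE. The function names and the choice of estimators (a simple plug-in and an augmented inverse-probability-weighted form) are assumptions made for illustration; the review does not say which downstream estimator was used for which result.

```python
import numpy as np

def plugin_ate(q0, q1):
    """Plug-in estimate: average of Q(1, x_i) - Q(0, x_i) over units."""
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    return float(np.mean(q1 - q0))

def aipw_ate(y, t, q0, q1, g):
    """Augmented IPW estimate: plug-in term plus a propensity-weighted
    residual correction. Assumes 0 < g(x_i) < 1 for every unit."""
    y, t, q0, q1, g = (np.asarray(a, dtype=float) for a in (y, t, q0, q1, g))
    correction = t * (y - q1) / g - (1.0 - t) * (y - q0) / (1.0 - g)
    return float(np.mean(q1 - q0 + correction))

# Hypothetical toy inputs, not data from the paper.
y = np.array([1.0, 0.0, 1.0, 0.0])
t = np.array([1.0, 0.0, 1.0, 0.0])
q0 = np.zeros(4)
q1 = np.ones(4)
g = np.full(4, 0.5)
print(plugin_ate(q0, q1), aipw_ate(y, t, q0, q1, g))
```

When the outcome model fits the observed outcomes exactly, as in this toy input, the correction term vanishes and both estimators agree.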
Proposed method
- Propose Dragonnet, a three-headed neural network with a shared representation Z(X): one head predicts g(x) (propensity) and two heads predict Q(0, x) and Q(1, x).
- Train with an end-to-end objective that combines outcome prediction loss and propensity score prediction loss: R(θ) = (1/n) Σi [ (Qnn(ti, xi; θ) − yi)^2 + α CrossEntropy(gnn(xi; θ), ti) ].
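- Introduce targeted regularization: add a scalar parameter ε and a perturbed outcome model Q̃(t, x; θ, ε) = Qnn(t, x; θ) + ε [ t / gnn(x; θ) − (1 − t) / (1 − gnn(x; θ)) ], add the squared-error term γ = (y − Q̃(t, x; θ, ε))^2 to the loss, and minimize jointly over θ and ε so that the fitted models satisfy a non-parametric estimating equation.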
- Ground the approach in non-parametric estimation theory (TMLE-inspired) to achieve robustness and efficiency for the ATE ψ.
- Compare end-to-end Dragonnet and Dragonnet with targeted regularization to multi-stage baselines (NEDnet) and TMLE-based methods.
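The training objective described in this list can be sketched numerically as follows. This is a minimal NumPy illustration that takes the three head outputs as plain arrays, not the authors' implementation: the function names, the clipping constant, and the default weights `alpha` and `beta` are assumptions. The regularizer perturbs the outcome prediction by ε [ t / g − (1 − t) / (1 − g) ] and penalizes the squared error of the perturbed prediction.

```python
import numpy as np

def _prepare(y, t, q0, q1, g):
    """Cast inputs to float arrays and keep g away from 0 and 1."""
    y, t, q0, q1, g = (np.asarray(a, dtype=float) for a in (y, t, q0, q1, g))
    g = np.clip(g, 1e-6, 1.0 - 1e-6)  # assumed safeguard against log(0) and division by 0
    q_t = np.where(t == 1, q1, q0)    # outcome head matching the observed treatment
    return y, t, q_t, g

def dragonnet_loss(y, t, q0, q1, g, alpha=1.0):
    """Mean of squared outcome error plus alpha * cross-entropy of g against t."""
    y, t, q_t, g = _prepare(y, t, q0, q1, g)
    squared_error = (q_t - y) ** 2
    cross_entropy = -(t * np.log(g) + (1.0 - t) * np.log(1.0 - g))
    return float(np.mean(squared_error + alpha * cross_entropy))

def targeted_regularizer(y, t, q0, q1, g, eps):
    """Mean squared error of the eps-perturbed outcome prediction."""
    y, t, q_t, g = _prepare(y, t, q0, q1, g)
    q_tilde = q_t + eps * (t / g - (1.0 - t) / (1.0 - g))
    return float(np.mean((y - q_tilde) ** 2))

def total_loss(y, t, q0, q1, g, eps, alpha=1.0, beta=1.0):
    """Base objective plus beta times the targeted regularizer."""
    return dragonnet_loss(y, t, q0, q1, g, alpha) + beta * targeted_regularizer(y, t, q0, q1, g, eps)

# Hypothetical toy inputs, not data from the paper.
y = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])
q0 = np.zeros(2)
q1 = np.ones(2)
g = np.full(2, 0.5)
print(total_loss(y, t, q0, q1, g, eps=0.1))
```

In an actual network, `q0`, `q1` and `g` would be outputs of the three heads and `eps` a trainable scalar, with all of them updated together by gradient descent on `total_loss`.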
Experimental results
Research questions
- RQ1: Can end-to-end neural-network training for Q and g improve downstream ATE estimation compared to traditional multi-stage approaches?
- RQ2: Does enforcing propensity-score sufficiency within a neural network (Dragonnet) yield better causal adjustment and ATE estimation, especially when many covariates are irrelevant to treatment assignment?
- RQ3: Does targeted regularization provide finite-sample stability and asymptotic efficiency for ATE estimation in neural-network settings?
- RQ4: How do these methods perform on established causal-benchmark datasets (IHDP, ACIC 2018) relative to existing neural-network baselines?
Key findings
- Dragonnet with targeted regularization achieves state-of-the-art estimation error on IHDP among neural-network methods.
- On ACIC 2018, Dragonnet and especially Dragonnet with targeted regularization outperform the baseline and TMLE in many settings.
- Dragonnet often wins when many covariates influence Y but not T, consistent with leveraging propensity score sufficiency to focus on treatment-relevant information.
- End-to-end Dragonnet performs better than a multi-stage NEDnet in estimating effects.
- TMLE can degrade performance in finite samples, whereas targeted regularization maintains or improves estimation under broader conditions.
- Adjusting only for the information relevant to treatment assignment (via the shared representation) can improve effect estimation even if outcome prediction accuracy worsens slightly.
This review was created by AI and reviewed by human editors.