[Paper Review] Causal Discovery from a Mixture of Experimental and Observational Data
This paper proposes a Bayesian method for learning causal Bayesian networks from a mixture of experimental (interventional) and observational data. The method is evaluated on data generated from the ALARM network, with the absolute and relative quantities of each data type varied systematically; including experimental data improves both structure recovery and parameter estimation.
This paper describes a Bayesian method for combining an arbitrary mixture of observational and experimental data in order to learn causal Bayesian networks. Observational data are passively observed. Experimental data, such as that produced by randomized controlled trials, result from the experimenter manipulating one or more variables (typically randomly) and observing the states of other variables. The paper presents a Bayesian method for learning the causal structure and parameters of the underlying causal process that is generating the data, given that (1) the data contains a mixture of observational and experimental case records, and (2) the causal process is modeled as a causal Bayesian network. This learning method was applied using as input various mixtures of experimental and observational data that were generated from the ALARM causal Bayesian network. In these experiments, the absolute and relative quantities of experimental and observational data were varied systematically. For each of these training datasets, the learning method was applied to predict the causal structure and to estimate the causal parameters that exist among randomly selected pairs of nodes in ALARM that are not confounded. The paper reports how these structure predictions and parameter estimates compare with the true causal structures and parameters as given by the ALARM network.
Motivation & Objective
- To develop a method that combines experimental and observational data for improved causal discovery in Bayesian networks.
- To address the challenge of learning causal structures when data sources are heterogeneous, including both intervention-based and passive-observation records.
- To evaluate how varying proportions of experimental versus observational data affect the accuracy of causal structure and parameter learning.
- To demonstrate the method's effectiveness on a benchmark causal network (ALARM) under controlled data mixtures.
Proposed method
- The method employs a Bayesian framework to jointly learn causal structure and parameters from a mixture of observational and experimental data.
- It models the data-generating process as a causal Bayesian network, incorporating both passive observations and intervention-driven data.
- The approach uses conditional probability distributions to represent causal relationships, with interventions explicitly modeled as do-operations.
- The learning algorithm computes posterior distributions over possible causal structures and parameters using Bayes' theorem, integrating evidence from both data types.
- The evaluation avoids confounding by design: structure predictions and parameter estimates are assessed on randomly selected pairs of ALARM nodes that are not confounded.
- Systematic experiments vary the relative quantities of experimental and observational data to assess performance across data mixtures.
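The posterior computation described above can be illustrated with a minimal sketch for a single pair of binary variables. This is not the paper's implementation; it rests on several assumptions that the summary does not confirm: a uniform Dirichlet prior on every conditional distribution, equal prior probability for the three candidate structures, no hidden confounder, and perfect interventions, so that a case in which a variable was manipulated is left out of that variable's own counts while still informing the other variable. The names `STRUCTURES`, `log_family_score`, and `structure_posterior` are hypothetical.

```python
from collections import Counter
from math import exp, lgamma

# Candidate structures over two binary variables, given as node -> parents.
# Assumption: no hidden confounder, so these three hypotheses are exhaustive.
STRUCTURES = {
    "X->Y": {"X": (), "Y": ("X",)},
    "Y->X": {"X": ("Y",), "Y": ()},
    "no edge": {"X": (), "Y": ()},
}


def log_family_score(node, parents, cases):
    """Log marginal likelihood of one node given its parents.

    Each case is (values, manipulated): a dict of variable states and the
    set of variables the experimenter set by hand. Cases in which `node`
    itself was manipulated are skipped, because a forced value carries no
    information about the mechanism that normally produces that node.
    """
    counts = Counter()
    for values, manipulated in cases:
        if node in manipulated:
            continue
        config = tuple(values[p] for p in parents)
        counts[(config, values[node])] += 1
    configs = {config for config, _ in counts}
    score = 0.0
    for config in configs:
        n0, n1 = counts[(config, 0)], counts[(config, 1)]
        # Dirichlet(1, 1) prior on a binary variable: n0! * n1! / (n0 + n1 + 1)!
        score += lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)
    return score


def structure_posterior(cases):
    """Posterior over STRUCTURES under equal structure priors."""
    log_scores = {
        name: sum(log_family_score(n, ps, cases) for n, ps in families.items())
        for name, families in STRUCTURES.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {name: exp(s - top) for name, s in log_scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy data (invented for illustration): X and Y always agree.
observational = [({"X": v, "Y": v}, frozenset()) for v in (0, 1) for _ in range(10)]
# Experimental cases: X is set by the experimenter and Y follows X.
experimental = [({"X": v, "Y": v}, frozenset({"X"})) for v in (0, 1) for _ in range(10)]

print(structure_posterior(observational))
print(structure_posterior(observational + experimental))
```

On the purely observational toy data the two edge directions score identically, since the data are symmetric in X and Y. Adding cases in which X is manipulated breaks the tie: under Y->X, the manipulated cases add nothing to X's counts, and Y must be explained by its marginal distribution alone, which fits the balanced Y values poorly compared with the near-deterministic fit of Y given X.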
Experimental results
Research questions
- RQ1: How does combining experimental and observational data improve causal structure discovery compared to using either data type alone?
- RQ2: What is the impact of varying the proportion of experimental data on the accuracy of causal parameter estimation?
- RQ3: Can the method reliably recover the true causal structure of a known network (ALARM) when given mixed data?
- RQ4: How does the presence of interventions affect the posterior probability of correct causal structures in the Bayesian learning process?
Key findings
- The method significantly improves causal structure recovery when experimental data are included, even in small proportions.
- Parameter estimates were more accurate when experimental data were incorporated, especially for direct causal effects.
- The method achieved high accuracy in identifying true causal relationships among non-confounded node pairs in the ALARM network.
- Performance improved monotonically with increasing proportions of experimental data, demonstrating the value of interventions.
This review was created by AI and reviewed by human editors.