QUICK REVIEW

[Paper Review] Bayesian structure learning using dynamic programming and MCMC

Daniel Eaton, Kevin J. Murphy | arXiv (Cornell University) | Jun 20, 2012
Bayesian Modeling and Causal Inference | 30 references | 84 citations
TL;DR

This paper proposes a hybrid Bayesian structure learning method. Dynamic programming (DP) is used to build a proposal distribution for Markov chain Monte Carlo (MCMC) sampling over DAGs. DP computes exact marginal posterior edge probabilities under modular priors, so proposals informed by it help MCMC escape the poor mixing caused by purely local moves. The MCMC framework in turn keeps support for non-modular priors, non-modular features, and predictive density estimation. The result is faster convergence and higher predictive likelihood on test data.

ABSTRACT

MCMC methods for sampling from the space of DAGs can mix poorly due to the local nature of the proposals that are commonly used. It has been shown that sampling from the space of node orders yields better results [FK03, EW06]. Recently, Koivisto and Sood showed how one can analytically marginalize over orders using dynamic programming (DP) [KS04, Koi06]. Their method computes the exact marginal posterior edge probabilities, thus avoiding the need for MCMC. Unfortunately, there are four drawbacks to the DP technique: it can only use modular priors, it can only compute posteriors over modular features, it is difficult to compute a predictive density, and it takes exponential time and space. We show how to overcome the first three of these problems by using the DP algorithm as a proposal distribution for MCMC in DAG space. We show that this hybrid technique converges to the posterior faster than other methods, resulting in more accurate structure learning and higher predictive likelihoods on test data.
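To make the DP idea in the abstract concrete, here is a minimal Python sketch of marginalizing over node orders with a dynamic program over subsets. It assumes each node i has a positive local score beta_i(G_i), meaning prior times family marginal likelihood, for every candidate parent set. The names `make_toy_scores`, `alpha`, `order_marginal` and `brute_force` are hypothetical. The scores are random toy numbers, not values derived from data. For clarity, the sums alpha_i are computed naively, and an edge marginal is obtained by rerunning the forward pass once per edge. The scheme of [Koi06], which computes all edge marginals at once, is more efficient.

```python
import itertools
import math
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def make_toy_scores(d, seed=0):
    """Hypothetical local scores beta[i][parent_set] > 0, standing in for
    prior(parent_set) * marginal likelihood of node i's family."""
    rng = random.Random(seed)
    return [{pa: rng.uniform(0.1, 1.0)
             for pa in subsets(j for j in range(d) if j != i)}
            for i in range(d)]


def alpha(beta_i, allowed, required=None):
    """Sum of beta_i over parent sets inside `allowed` (optionally containing `required`)."""
    return sum(w for pa, w in beta_i.items()
               if pa <= allowed and (required is None or required in pa))


def order_marginal(beta, d, edge=None):
    """Forward DP over subsets: g(S) = sum_{i in S} alpha_i(S - {i}) * g(S - {i}).

    g(V) equals the sum over all d! orders of prod_i alpha_i(predecessors of i),
    using one step per (subset, last node) pair instead of enumerating orders.
    If edge=(u, v), node v is restricted to parent sets that contain u."""
    g = {frozenset(): 1.0}
    for size in range(1, d + 1):
        for S in map(frozenset, itertools.combinations(range(d), size)):
            g[S] = sum(
                alpha(beta[i], S - {i},
                      edge[0] if edge is not None and edge[1] == i else None)
                * g[S - {i}]
                for i in S)
    return g[frozenset(range(d))]


def brute_force(beta, d, edge=None):
    """Same quantity by explicit enumeration of all d! orders (for checking)."""
    total = 0.0
    for order in itertools.permutations(range(d)):
        prod = 1.0
        for pos, i in enumerate(order):
            req = edge[0] if edge is not None and edge[1] == i else None
            prod *= alpha(beta[i], frozenset(order[:pos]), req)
        total += prod
    return total


d = 4
beta = make_toy_scores(d)
Z = order_marginal(beta, d)
p_01 = order_marginal(beta, d, edge=(0, 1)) / Z  # posterior probability of edge 0 -> 1
print(f"p(0 -> 1 | D) = {p_01:.3f}")
```

Summing over orders in this way implicitly places a modular prior on (order, graph) pairs. This is the modular-prior restriction the abstract lists as the first drawback of the DP technique.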

Motivation & Objective

  • To address the poor mixing of standard MCMC methods in DAG space, which stems from proposals that make only local changes (adding, deleting, or reversing a single edge).
  • To overcome the limitations of dynamic programming (DP) in Bayesian structure learning: its restriction to modular priors and modular features, and the difficulty of computing predictive densities with it.
  • To combine the accuracy of DP with the flexibility of MCMC for non-modular priors and feature types.
  • To improve convergence speed and predictive performance in Bayesian network structure learning.
  • To estimate edge posterior probabilities accurately under general (non-modular) priors while also supporting predictive inference.

Proposed method

  • Uses dynamic programming (DP) over node orders to compute exact marginal posterior edge probabilities under a modular prior, and uses them to build a proposal distribution for MCMC.
  • Integrates DP as a proposal mechanism within a Metropolis-Hastings MCMC sampler over the space of DAGs.
  • Leverages the DP algorithm's ability to marginalize over node orders analytically, avoiding the need for MCMC over orders.
  • Maintains full flexibility in prior specification (non-modular) by using DP only as a proposal, not for full posterior computation.
  • Enables computation of predictive densities by retaining the MCMC framework, which is difficult with pure DP.
  • Employs a hybrid sampling strategy where DP proposals guide MCMC moves, improving mixing and convergence.
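The hybrid sampler described above can be sketched as a Metropolis-Hastings chain that mixes two moves. One is a uniform single-edge toggle. The other resamples one edge from a precomputed edge marginal. This is a minimal sketch under stated assumptions, not the paper's exact proposal. `edge_prob` stands in for DP-computed marginals and holds hypothetical numbers. The target uses hypothetical per-edge scores plus a non-modular prior penalty, -lam * |E|^2, to show that the sampler does not require modularity. With d = 3 the estimates can be checked against brute-force enumeration of all DAGs.

```python
import itertools
import math
import random


def is_acyclic(edges, d):
    """Kahn's algorithm: True if the directed graph on nodes 0..d-1 has no cycle."""
    indeg = [0] * d
    for _, j in edges:
        indeg[j] += 1
    stack = [v for v in range(d) if indeg[v] == 0]
    visited = 0
    while stack:
        u = stack.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return visited == d


def log_target(edges, edge_score, lam=0.5):
    """Hypothetical unnormalized log posterior: per-edge scores plus a
    NON-modular prior term -lam * |E|^2 (not a product of per-family factors)."""
    return sum(edge_score[e] for e in edges) - lam * len(edges) ** 2


def hybrid_mh(d, edge_score, edge_prob, n_iter, beta=0.5, seed=0):
    """Metropolis-Hastings over DAGs mixing two single-edge proposals.

    With probability beta: toggle a uniformly chosen edge (symmetric move).
    Otherwise: resample that edge's presence from edge_prob[(i, j)], an assumed
    stand-in for DP-computed edge marginals. Returns, for each directed edge,
    the fraction of iterations in which it was present."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    G = frozenset()  # start from the empty DAG; no burn-in, for brevity
    cur = log_target(G, edge_score)
    counts = dict.fromkeys(pairs, 0)
    for _ in range(n_iter):
        i, j = rng.choice(pairs)
        has = (i, j) in G
        if rng.random() < beta:
            new_has, log_q_ratio = not has, 0.0
        else:
            p = edge_prob[(i, j)]
            new_has = rng.random() < p
            # Hastings correction: log q(G | G') - log q(G' | G).
            log_q_ratio = (math.log(p if has else 1 - p)
                           - math.log(p if new_has else 1 - p))
        if new_has != has:
            G_new = G | {(i, j)} if new_has else G - {(i, j)}
            if is_acyclic(G_new, d):  # cyclic graphs have zero posterior: reject
                new = log_target(G_new, edge_score)
                log_a = new - cur + log_q_ratio
                if log_a >= 0 or rng.random() < math.exp(log_a):
                    G, cur = G_new, new
        for e in G:
            counts[e] += 1
    return {e: c / n_iter for e, c in counts.items()}


def exact_edge_marginals(d, edge_score):
    """Brute-force edge posteriors by enumerating all DAGs (tiny d only)."""
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    weights = {}
    for mask in itertools.product([False, True], repeat=len(pairs)):
        G = frozenset(e for e, on in zip(pairs, mask) if on)
        if is_acyclic(G, d):
            weights[G] = math.exp(log_target(G, edge_score))
    Z = sum(weights.values())
    return {e: sum(w for G, w in weights.items() if e in G) / Z for e in pairs}


d = 3
rng = random.Random(1)
edge_score = {(i, j): rng.uniform(-1.0, 2.0) for i in range(d) for j in range(d) if i != j}
edge_prob = {e: rng.uniform(0.2, 0.8) for e in edge_score}  # hypothetical DP output
est = hybrid_mh(d, edge_score, edge_prob, n_iter=30000)
exact = exact_edge_marginals(d, edge_score)
for e in sorted(exact):
    print(e, round(est[e], 3), round(exact[e], 3))
```

Each move's Hastings ratio is computed exactly, so in this sketch the chain targets the same posterior whatever values `edge_prob` holds. Better marginals only change how quickly it mixes.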

Experimental results

Research questions

  • RQ1: Can dynamic programming be used as an effective proposal mechanism in MCMC for Bayesian structure learning in DAGs?
  • RQ2: Does combining DP with MCMC improve convergence speed and mixing compared to standard MCMC over DAGs?
  • RQ3: Can the hybrid method support non-modular priors while retaining the accuracy benefits of DP for edge posterior estimation?
  • RQ4: Does the hybrid approach yield higher predictive likelihoods on test data than pure DP or standard MCMC?
  • RQ5: Is it feasible to compute predictive densities in a Bayesian structure learning framework that uses DP as a proposal distribution?

Key findings

  • The hybrid DP-MCMC method converges to the posterior faster than the other methods compared, including standard MCMC over DAGs.
  • The method achieves higher predictive likelihoods on test data than the competing methods.
  • By using DP only as a proposal, the method overcomes DP's restriction to modular priors, enabling the use of general priors.
  • The approach supports the computation of predictive densities, which is difficult with pure DP because it analytically marginalizes over orders.
  • The method uses the exact edge marginals that DP computes under a modular prior to guide its proposals, while the Metropolis-Hastings correction keeps the samples targeted at the posterior under the chosen, possibly non-modular, prior.
  • Empirical results show improved structure learning accuracy due to better exploration of the DAG space via informed proposals.
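Once an MCMC run has produced sampled DAGs G_1, ..., G_S, both edge posteriors and the predictive density are plain Monte Carlo averages over those samples: p(x | D) is approximated by (1/S) sum_s p(x | G_s, D). The sketch below makes several assumptions. Variables are binary, and each conditional probability has a Beta(alpha, alpha) prior, so p(x | G, D) factorizes over families. The data, the samples, and the function names are hypothetical, and the paper's own likelihood model may differ.

```python
import math
from collections import Counter


def log_predictive_given_dag(x, data, parents_of, alpha=1.0):
    """log p(x | G, D) for binary variables with a Beta(alpha, alpha) prior on
    every CPT entry; parents_of maps each node to a tuple of its parents."""
    total = 0.0
    for child, parents in parents_of.items():
        counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
        cfg = tuple(x[p] for p in parents)
        n_match = counts[(cfg, x[child])]
        n_cfg = counts[(cfg, 0)] + counts[(cfg, 1)]
        total += math.log((n_match + alpha) / (n_cfg + 2 * alpha))
    return total


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def bma_log_predictive(x, data, sampled_dags, alpha=1.0):
    """log p(x | D) ~= log (1/S) sum_s p(x | G_s, D) over MCMC samples G_s."""
    return log_mean_exp([log_predictive_given_dag(x, data, G, alpha)
                         for G in sampled_dags])


def edge_frequency(sampled_dags, u, v):
    """Monte Carlo estimate of p(u -> v | D): fraction of samples with the edge."""
    return sum(u in G[v] for G in sampled_dags) / len(sampled_dags)


# Hypothetical binary training data over nodes 0 and 1, and hypothetical samples.
data = [(0, 0), (1, 1), (1, 1), (0, 0), (1, 0)]
samples = [{0: (), 1: (0,)}, {0: (), 1: (0,)}, {0: (), 1: ()}]
print(edge_frequency(samples, 0, 1))
print(bma_log_predictive((1, 1), data, samples))
```

Averaging happens in probability space, not log space, which is why `log_mean_exp` is used instead of a mean of log-likelihoods.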

This review was created by AI and reviewed by human editors.