QUICK REVIEW

[Paper Review] Expectation Maximization and Complex Duration Distributions for Continuous Time Bayesian Networks

Uri Nodelman, Christian R. Shelton, Daphne Koller | arXiv (Cornell University) | Jul 4, 2012
Bayesian Modeling and Causal Inference | 17 references | 66 citations
TL;DR

This paper extends Continuous Time Bayesian Networks (CTBNs) by adapting Expectation Maximization (EM) and Structural EM (SEM) to learn parameters and structure from partially observed data. EM in turn makes it possible to use phase-type distributions, a highly expressive semi-parametric family that can approximate any duration distribution arbitrarily closely. On a real data set of people's life spans, the learned models are reasonable and, with phase-type durations, outperform Dynamic Bayesian Networks (DBNs).

ABSTRACT

Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning the parameters and structure of a CTBN from partially observed data. We show how to apply expectation maximization (EM) and structural expectation maximization (SEM) to CTBNs. The availability of the EM algorithm allows us to extend the representation of CTBNs to allow a much richer class of transition durations distributions, known as phase distributions. This class is a highly expressive semi-parametric representation, which can approximate any duration distribution arbitrarily closely. This extension to the CTBN framework addresses one of the main limitations of both CTBNs and DBNs - the restriction to exponentially / geometrically distributed duration. We present experimental results on a real data set of people's life spans, showing that our algorithm learns reasonable models - structure and parameters - from partially observed data, and, with the use of phase distributions, achieves better performance than DBNs.
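
To make the phrase "transition model is a function of its parents" concrete, here is a minimal sketch that is not taken from the paper. It assumes two variables X and Y that are each other's parent (a cyclic graph, which CTBNs allow) and combines their conditional intensity matrices into one joint intensity matrix. The function name `joint_intensity` and the nested-list layout are illustrative assumptions.

```python
from itertools import product


def joint_intensity(q_x_given_y, q_y_given_x):
    """Combine two conditional intensity matrices into a joint one.

    q_x_given_y[y][x][x2] is the rate at which X moves x -> x2 while Y = y.
    q_y_given_x[x][y][y2] is the rate at which Y moves y -> y2 while X = x.
    Only off-diagonal entries of the inputs are read (hypothetical layout).
    """
    n_y = len(q_x_given_y)
    n_x = len(q_y_given_x)
    states = list(product(range(n_x), range(n_y)))
    index = {s: i for i, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for (x, y) in states:
        i = index[(x, y)]
        # X changes while Y stays fixed: rate depends on the current y.
        for x2 in range(n_x):
            if x2 != x:
                Q[i][index[(x2, y)]] = q_x_given_y[y][x][x2]
        # Y changes while X stays fixed: rate depends on the current x.
        for y2 in range(n_y):
            if y2 != y:
                Q[i][index[(x, y2)]] = q_y_given_x[x][y][y2]
        # Rows of an intensity matrix sum to zero.
        Q[i][i] = -sum(Q[i])
    return states, Q
```

Two variables never change at the same instant in this construction, so entries such as (0, 0) -> (1, 1) stay at zero.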

Motivation & Objective

  • Address the limitation of CTBNs and DBNs in modeling non-exponential duration distributions.
  • Enable learning of CTBN structure and parameters from partially observed temporal data.
  • Introduce a flexible, semi-parametric representation of duration distributions using phase-type distributions.
  • Improve modeling accuracy and predictive performance on real-world continuous-time stochastic processes.
  • Demonstrate the effectiveness of EM and SEM algorithms in learning complex CTBNs with rich duration dynamics.

Proposed method

  • Adapt the Expectation Maximization (EM) algorithm to estimate CTBN parameters from partially observed data.
  • Extend the Structural EM (SEM) algorithm to learn the structure of CTBNs from incomplete temporal data.
  • Introduce phase-type distributions as a flexible, semi-parametric representation for modeling arbitrary duration distributions.
  • Model each variable's transition intensity as a function of its parent variables, using phase-type distributions to capture durations that are not memoryless as well as the usual exponential ones.
  • Use hidden semi-Markov processes as a foundation for phase-type modeling within the CTBN framework.
  • Apply the EM algorithm to iteratively improve parameter estimates by computing expected sufficient statistics over latent state trajectories.
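
As a rough illustration of the sufficient statistics mentioned above, the sketch below handles the simplest case: a single variable with no parents and a fully observed trajectory. The statistics are the time spent in each state and the number of transitions between states. In EM these counts would be replaced by their expectations under the current model, which this sketch does not compute. The names `sufficient_statistics` and `m_step` and the `(state, duration)` trajectory format are assumptions made for the example.

```python
from collections import defaultdict


def sufficient_statistics(trajectory):
    """trajectory: ordered list of (state, duration) segments."""
    time_in = defaultdict(float)       # total time spent in each state
    transitions = defaultdict(float)   # number of x -> y transitions
    for idx, (state, duration) in enumerate(trajectory):
        time_in[state] += duration
        # The last segment is censored: time counts, but no transition.
        if idx + 1 < len(trajectory):
            transitions[(state, trajectory[idx + 1][0])] += 1.0
    return time_in, transitions


def m_step(time_in, transitions):
    """Maximum-likelihood exit rates and transition probabilities."""
    leave_counts = defaultdict(float)
    for (x, _), m in transitions.items():
        leave_counts[x] += m
    # Exit rate of x = transitions out of x / time spent in x.
    rates = {x: leave_counts[x] / t for x, t in time_in.items() if t > 0}
    # Probability that a transition out of x lands in y.
    probs = {(x, y): m / leave_counts[x] for (x, y), m in transitions.items()}
    return rates, probs
```

Keeping the statistics separate from the update makes it straightforward to swap observed counts for expected ones later.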

Experimental results

Research questions

  • RQ1: Can EM and SEM be effectively adapted to learn CTBNs from partially observed continuous-time data?
  • RQ2: Can phase-type distributions significantly improve the modeling of duration distributions in CTBNs compared to exponential or geometric distributions?
  • RQ3: Does the extended CTBN framework with phase-type durations outperform standard DBNs and CTBNs in terms of predictive accuracy on real data?
  • RQ4: How well can the learning algorithm recover the true underlying structure and parameters of a CTBN from incomplete observations?
  • RQ5: To what extent do phase-type distributions enable better approximation of complex, non-memoryless duration patterns in real-world processes?

Key findings

  • The EM and SEM algorithms learn reasonable models, both structure and parameters, of CTBNs from partially observed data.
  • Phase-type distributions allow CTBNs to approximate any duration distribution arbitrarily closely, overcoming the exponential/constant hazard rate limitation.
  • With phase-type durations, the extended CTBN model achieves better performance than DBNs on a real data set of people's life spans.
  • The model learns meaningful and interpretable structure from incomplete temporal data, reflecting realistic dependencies in survival processes.
  • The use of phase-type distributions enables more accurate representation of the complex, non-exponential duration patterns observed in the life-span data.
  • Empirical results demonstrate that the proposed method improves log-likelihood and predictive accuracy compared to baseline models using exponential durations.
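
The point about phase-type distributions and non-exponential durations can be checked numerically with a small sketch. It assumes the standard representation of a phase-type distribution as an initial phase vector `alpha` and a sub-intensity matrix `T`, with survival function S(t) = alpha · exp(Tt) · 1. The matrix exponential is evaluated by uniformization, which is a choice made for this example and not something the review attributes to the paper. The function name `phase_type_survival` is hypothetical.

```python
import math


def phase_type_survival(alpha, T, t, tol=1e-12):
    """P(duration > t) for a phase-type distribution (alpha, T)."""
    n = len(alpha)
    q = max(-T[i][i] for i in range(n))  # uniformization rate
    # Substochastic jump matrix P = I + T / q.
    P = [[(1.0 if i == j else 0.0) + T[i][j] / q for j in range(n)]
         for i in range(n)]
    v = list(alpha)                # phase distribution after k jumps
    weight = math.exp(-q * t)      # Poisson(q * t) probability of 0 jumps
    total = weight * sum(v)
    cum = weight
    k = 0
    # Add Poisson-weighted terms until nearly all probability mass is used.
    while 1.0 - cum > tol and k < 10_000:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        cum += weight
        total += weight * sum(v)
    return total


# Two phases in series, each with rate 2 (an Erlang-2 duration, mean 1).
erlang2 = ([1.0, 0.0], [[-2.0, 2.0], [0.0, -2.0]])
# One phase with rate 1 (a plain exponential duration, mean 1).
exponential = ([1.0], [[-1.0]])
```

Both example distributions have mean 1, but their survival curves differ: with a single phase the curve is always exponential, while adding phases changes its shape. This sketch is suited to moderate values of `q * t`; very large values would underflow the leading `exp` term.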


This review was created by AI and reviewed by human editors.