QUICK REVIEW

[Paper Digest] Fast Gibbs Sampling on Bayesian Hidden Markov Model with Missing Observations

Dongrong Li, Tianwei Yu | arXiv (Cornell University) | Jan 4, 2026
Bayesian Methods and Mixture Models | Cited by 0
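One-Sentence Summary

The paper introduces a collapsed Gibbs sampler for HMMs with missing data. By analytically marginalizing out the missing observations and their latent states, it converges faster and costs less per iteration, with the largest gains at high missing rates.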

ABSTRACT

The Hidden Markov Model (HMM) is a widely used statistical model for handling sequential data. However, the presence of missing observations in real-world datasets often complicates the application of the model. The EM algorithm and Gibbs samplers can be used to estimate the model, yet they suffer from various problems, including non-convexity, high computational complexity and slow mixing. In this paper, we propose a collapsed Gibbs sampler that efficiently samples from the HMM posterior by integrating out both the missing observations and the corresponding latent states. The proposed sampler is fast due to three advantages. First, it achieves an estimation accuracy comparable to that of existing methods. Second, it produces a larger Effective Sample Size (ESS) per iteration, which can be justified theoretically and numerically. Third, when the number of missing entries is large, the sampler has a significantly smaller computational complexity per iteration than other methods and is thus faster computationally. In summary, the proposed sampling algorithm is fast both computationally and theoretically, and is particularly advantageous when there are many missing entries. Finally, empirical evaluations based on numerical simulations and real data analysis demonstrate that the proposed algorithm consistently outperforms existing algorithms in terms of time complexity and sampling efficiency (measured in ESS).
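
The marginalization described in the abstract can be written out for a single gap. The identity below is a sketch under assumptions not stated in this digest: a time-homogeneous transition matrix A, discrete emissions with row-stochastic matrix B, ignorable missingness, and observations at times t and t+g with the g-1 time points in between missing (g is notation introduced here for illustration). Each missing observation sums out to 1, and summing over the intermediate latent states leaves a matrix power:

```latex
\[
\sum_{z_{t+1},\dots,z_{t+g-1}}
  \;\prod_{s=t}^{t+g-1} A_{z_s z_{s+1}}
  \;\prod_{s=t+1}^{t+g-1} \underbrace{\sum_{y_s} B_{z_s y_s}}_{=\,1}
\;=\; \left(A^{g}\right)_{z_t\, z_{t+g}}
\]
```

Under these assumptions the latent states at observed time points again form a Markov chain, with the g-step transition matrix A^g linking two observed time points that are g steps apart.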

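Motivation and Objectives

  • Describe the challenges of estimating HMMs with missing observations in real-world sequential data.
  • Develop a collapsed Gibbs sampler that marginalizes out the missing data and the associated latent states to improve efficiency.
  • Analyze convergence and computational complexity, and demonstrate the advantages at high missing rates.
  • Provide empirical evidence from simulations and real data that the method outperforms existing methods in speed and effective sample size (ESS).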

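Proposed Method

  • Perform Bayesian estimation of HMMs for incomplete sequences under the ignorable missingness assumption.
  • Derive the collapsed joint distribution by analytically integrating out the missing observations and their latent states.
  • Propose a collapsed Gibbs sampler targeting p(theta, y_o, z_o), with forward-backward sampling of the latent state sequence at the observed time points.
  • Use a forward-backward procedure on the collapsed model to sample z_o efficiently, with time complexity O((1-p)NT).
  • Where conjugate priors are available, apply a Metropolis-within-Gibbs scheme to update A and pi, and sample the emission matrix B from a Dirichlet distribution.
  • Present Algorithm 1 (forward-backward sampling of z_o) and Algorithm 2 (collapsed Gibbs sampling for HMMs with incomplete observations).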
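
The forward-backward sampling step for z_o can be illustrated with a minimal sketch. This is not the paper's Algorithm 1. It assumes a single sequence, discrete emissions, and a time-homogeneous transition matrix, so that two observed time points separated by a gap of g steps are linked by the matrix power A^g. The function name `sample_observed_states` and its arguments are hypothetical.

```python
import numpy as np

def sample_observed_states(pi, A, B, obs_times, obs_symbols, rng):
    """Forward-filter, backward-sample latent states at observed times only.

    Illustrative sketch: missing time points are never visited, because the
    transition between consecutive observed times is the matrix power A^gap.
    """
    obs_times = np.asarray(obs_times)
    n, K = len(obs_times), len(pi)
    gaps = np.diff(obs_times).tolist()
    # Cache one matrix power per distinct gap length.
    powers = {g: np.linalg.matrix_power(A, g) for g in set(gaps)}

    # Forward pass: normalized filtering distributions at observed times.
    alpha = np.empty((n, K))
    start = pi @ np.linalg.matrix_power(A, int(obs_times[0]))
    a = start * B[:, obs_symbols[0]]
    alpha[0] = a / a.sum()
    for k in range(1, n):
        a = (alpha[k - 1] @ powers[gaps[k - 1]]) * B[:, obs_symbols[k]]
        alpha[k] = a / a.sum()

    # Backward pass: sample the last state, then condition on the next one.
    z = np.empty(n, dtype=int)
    z[-1] = rng.choice(K, p=alpha[-1])
    for k in range(n - 2, -1, -1):
        w = alpha[k] * powers[gaps[k]][:, z[k + 1]]
        z[k] = rng.choice(K, p=w / w.sum())
    return z

# Sanity check with deterministic emissions (B is the identity), so the
# sampled states must equal the observed symbols.
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.eye(2)
z_o = sample_observed_states(pi, A, B, [0, 3, 4, 9], [0, 1, 1, 0], rng)
```

The loops run over observed time points only, which is the source of the saving when many entries are missing. The cost of forming the matrix powers is not analyzed here and may be handled differently in the paper.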

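Experimental Results

Research Questions

  • RQ1: How can missing observations in an HMM be integrated out analytically to improve sampling efficiency?
  • RQ2: How does missing data affect the convergence rate of Gibbs samplers for HMMs?
  • RQ3: At high missing rates, can a sampler on the collapsed posterior achieve lower per-iteration complexity and higher ESS?
  • RQ4: How well does the collapsed Gibbs method perform at prediction and imputation in incomplete HMMs?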

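Key Findings

  • The collapsed sampler converges faster than conventional Gibbs sampling because the latent state space is reduced.
  • The forward-backward computation runs only over the observed time indices, with time complexity O((1-p)NT).
  • ESS per iteration increases while estimation accuracy remains comparable.
  • Theoretical analysis shows that Gap(F_c) ≥ Gap(F_m) ≥ Gap(F_g), indicating faster convergence for the collapsed method.
  • Empirical evaluations on simulated and real data show improved time efficiency and sampling performance at high missing rates.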
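
Since sampling efficiency is reported in ESS, a minimal sketch of one common ESS estimator may help make the metric concrete. This is not necessarily the estimator used in the paper. It assumes a scalar chain and truncates the autocorrelation sum at the first non-positive lag. The function name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(x):
    """Estimate ESS = n / (1 + 2 * sum of positive-lag autocorrelations).

    Illustrative estimator: the sum stops at the first non-positive
    autocorrelation, a simple truncation rule assumed for this sketch.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    var = xc @ xc / n
    if var == 0.0:
        return float("nan")  # a constant chain carries no information
    tau = 1.0
    for lag in range(1, n):
        rho = (xc[:-lag] @ xc[lag:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

# Compare an independent chain with a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(0)
iid = rng.normal(size=2000)
ar = np.empty(2000)
ar[0] = 0.0
for t in range(1, 2000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
```

A slowly mixing chain yields a much smaller ESS for the same number of iterations, which is why ESS per iteration is a natural way to compare samplers.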


This digest was generated by AI and reviewed by a human editor.