QUICK REVIEW

[Paper Review] Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors

Yuling Yao, Aki Vehtari|arXiv (Cornell University)|Jun 22, 2020

Gaussian Processes and Bayesian Inference | References: 95 | Cited by: 34

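One-sentence summary

This paper proposes Bayesian stacking to combine parallel, non-mixing inference runs (MCMC, variational, or mode-based) so that multimodal posteriors are better represented and predictive performance improves, particularly under model misspecification.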

ABSTRACT

When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach using parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible and then combine these using Bayesian stacking, a scalable method for constructing a weighted average of distributions. The result from stacking efficiently samples from multimodal posterior distribution, minimizes cross validation prediction error, and represents the posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present theoretical consistency with an example where the stacked inference approximates the true data generating process from the misspecified model and a non-mixing sampler, from which the predictive performance is better than full Bayesian inference, hence the multimodality can be considered a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.

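Research motivation and objectives

Motivate the inference difficulties posed by multimodal or metastable posteriors in Bayesian computation.
Propose a stacking method that combines non-mixing chains in a parallel, scalable way to improve predictive performance.
Extend stacking to combine multiple chains fitted to the same model, and provide practical implementation details.
Analyze the asymptotic behavior and show that stacking can outperform full Bayesian inference under model misspecification.
Demonstrate the method on a diverse set of models to illustrate its practical effectiveness.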

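Proposed method

Run inference many times in parallel from dispersed starting points to explore multiple modes without relying on between-chain mixing.
Cluster the runs, or treat each run as a separate density p_k(θ|y), to be combined.
Use Pareto smoothed importance sampling (PSIS) to estimate each run's leave-one-out predictive density p_k(y_i|y_-i).
Solve a simplex-constrained optimization for the weights w that maximize the leave-one-out log predictive density (LOO lpd) of the weighted mixture.
Place a Dirichlet-type regularization on the weights to stabilize the estimates and partially pool weights across chains.
Plug the optimal weights into a weighted Monte Carlo form to approximate the target multimodal posterior.
Provide practical implementation steps, including monitoring convergence through the lpd and optional clustering of chains.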
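
The leave-one-out and weight-optimization steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `is_loo_lpd` and `stacking_weights` are hypothetical, the leave-one-out estimate uses plain importance sampling without the Pareto smoothing that PSIS adds, the optimizer is an EM-style fixed-point update swapped in for simplicity, and the Dirichlet-type regularization is omitted.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def is_loo_lpd(log_lik):
    """Plain importance-sampling LOO estimate for one run.

    log_lik[s][i] = log p(y_i | theta_s) for posterior draw s.
    Uses weights proportional to 1 / p(y_i | theta_s), which gives
    p(y_i | y_-i) ~= 1 / mean_s(1 / p(y_i | theta_s)).
    NOTE: no Pareto smoothing of the weights is applied here.
    """
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    return [
        math.log(n_draws) - log_sum_exp([-log_lik[s][i] for s in range(n_draws)])
        for i in range(n_obs)
    ]


def stacking_weights(lpd, n_iter=5000, tol=1e-12):
    """Maximize sum_i log sum_k w_k * exp(lpd[k][i]) over the simplex.

    lpd[k][i] = estimated log p_k(y_i | y_-i) for run k.
    ASSUMPTION: EM-style fixed-point updates, chosen for illustration.
    """
    n_runs, n_obs = len(lpd), len(lpd[0])
    w = [1.0 / n_runs] * n_runs
    for _ in range(n_iter):
        new_w = [0.0] * n_runs
        for i in range(n_obs):
            # log(w_k * p_k(y_i | y_-i)); a zero weight contributes nothing
            terms = [
                math.log(w[k]) + lpd[k][i] if w[k] > 0.0 else -math.inf
                for k in range(n_runs)
            ]
            norm = log_sum_exp(terms)
            for k in range(n_runs):
                new_w[k] += math.exp(terms[k] - norm) / n_obs
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# Hypothetical input: two runs, three observations each.
lpd = [[-1.0, -1.2, -3.0], [-3.0, -2.9, -0.8]]
weights = stacking_weights(lpd)
```

Because the objective is concave in w, each fixed-point update cannot decrease it, and the weights stay on the simplex by construction. A run that predicts every held-out point worse than another run is driven toward zero weight, while runs that cover different observations well share the weight.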

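Experimental results

Research questions

RQ1: Can stacking non-mixing, parallel inference runs outperform any single chain or naive averaging in predictive performance?
RQ2: How should multiple non-mixing chains be weighted to best represent a multimodal posterior for prediction?
RQ3: Under model misspecification, can stacked predictions outperform the exact Bayesian posterior?
RQ4: How can leave-one-out predictive densities be estimated efficiently when chains do not mix?
RQ5: What practical guidelines apply when implementing stacking across different model families and computational settings?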

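Key findings

Stacking yields a weighted combination of non-mixing chains that can outperform uniform weighting or single-chain estimates in predictive performance.
An efficient importance-sampling method (PSIS) approximates leave-one-out predictive densities from each chain's full-data fit.
Stacking weights maximize cross-validated predictive accuracy; the resulting posterior representation remains multimodal overall but is better calibrated for prediction.
Under model misspecification, in certain theoretical settings, stacked-chain inference can predict better than the exact posterior.
The method is demonstrated on several model families, showing practical applicability to latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.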
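
For reference, the quantity that the stacking weights maximize can be written out explicitly. The notation is an assumption made for illustration (K runs, n observations, and S_K for the probability simplex) and may differ from the symbols used in the paper.

```latex
\hat{w} \;=\; \arg\max_{w \in \mathcal{S}_K} \;
  \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, p_k(y_i \mid y_{-i}),
\qquad
\mathcal{S}_K = \Big\{ w : w_k \ge 0,\ \sum_{k=1}^{K} w_k = 1 \Big\}

% Stacked predictive distribution for a new observation \tilde{y}
p_{\mathrm{stack}}(\tilde{y} \mid y) \;=\; \sum_{k=1}^{K} \hat{w}_k \, p_k(\tilde{y} \mid y)
```

The weights are chosen for out-of-sample prediction rather than to match posterior mass in each mode, which is why the stacked result need not agree with full Bayesian inference even asymptotically.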


This review was generated by AI and reviewed by a human editor.