QUICK REVIEW

[Paper Summary] Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data

Aki Vehtari, Andrew Gelman|arXiv (Cornell University)|Dec 16, 2014

Gaussian Processes and Bayesian Inference | References: 63 | Cited by: 63

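One-Sentence Summary

This paper proposes a distributed Bayesian inference framework based on expectation propagation (EP). It combines posterior approximations from partitioned data to enable parallel computation, while cavity distributions and tilted distributions preserve the regularizing effect of the prior. Experiments show substantial speedups: with 30 compute nodes, computation time falls by up to 96%, with accuracy comparable to or better than the existing consensus Monte Carlo method.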

ABSTRACT

A common divide-and-conquer approach for Bayesian computation with big data is to partition the data, perform local inference for each piece separately, and combine the results to obtain a global posterior approximation. While being conceptually and computationally appealing, this method involves the problematic need to also split the prior for the local inferences; these weakened priors may not provide enough regularization for each separate computation, thus eliminating one of the key advantages of Bayesian methods. To resolve this dilemma while still retaining the generalizability of the underlying local inference method, we apply the idea of expectation propagation (EP) as a framework for distributed Bayesian inference. The central idea is to iteratively update approximations to the local likelihoods given the state of the other approximations and the prior. The present paper has two roles: we review the steps that are needed to keep EP algorithms numerically stable, and we suggest a general approach, inspired by EP, for approaching data partitioning problems in a way that achieves the computational benefits of parallelism while allowing each local update to make use of relevant information from the other sites. In addition, we demonstrate how the method can be applied in a hierarchical context to make use of partitioning of both data and parameters. The paper describes a general algorithmic framework, rather than a specific algorithm, and presents an example implementation for it.
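
The central idea stated in the abstract, updating each local likelihood approximation given the other approximations and the prior, can be written compactly in standard EP notation. The symbols below (g, g_k, g_{-k}, g_{\backslash k}) are chosen here for illustration and are an assumption about notation, not a quotation from the paper.

```latex
\begin{align*}
\text{target:}\quad & p(\theta \mid y) \propto p(\theta) \prod_{k=1}^{K} p(y_k \mid \theta) \\
\text{global approximation:}\quad & g(\theta) \propto p(\theta) \prod_{k=1}^{K} g_k(\theta) \\
\text{cavity for site } k\text{:}\quad & g_{-k}(\theta) \propto \frac{g(\theta)}{g_k(\theta)} \\
\text{tilted for site } k\text{:}\quad & g_{\backslash k}(\theta) \propto p(y_k \mid \theta)\, g_{-k}(\theta) \\
\text{site update:}\quad & g_k^{\text{new}}(\theta) \propto \frac{g^{\text{new}}(\theta)}{g_{-k}(\theta)},
\quad g^{\text{new}} \text{ matching the moments of } g_{\backslash k}
\end{align*}
```

Because the cavity contains the full prior and every other site, each local computation sees the same regularization as the global problem, which is the point the abstract makes about weakened priors.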

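Motivation and Objectives

Perform Bayesian inference on large or distributed datasets by partitioning the data and combining local inferences.
Resolve the prior-regularization dilemma in divide-and-conquer Bayesian inference, where splitting the prior weakens its influence.
Develop a general, numerically stable framework for distributed Bayesian inference, inspired by the message-passing mechanism of expectation propagation.
Enable efficient inference in hierarchical models and privacy-preserving settings, where data or model components are spread across multiple sources.
Show that EP-based distributed inference outperforms the existing consensus Monte Carlo method in computational efficiency and approximation accuracy.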

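Proposed Method

Uses the message-passing framework of expectation propagation, iteratively updating each local posterior approximation through the cavity distribution formed from the other partitions.
Applies tilted distributions to refine the local likelihood approximations based on the current global posterior estimate, so that information is shared across sites.
Uses a global posterior server that maintains and updates the global approximation and distributes updates to the local sites for iterative refinement.
Supports several local inference methods, including moment matching and SNEP, for flexible and efficient local computation.
Employs numerical stabilization techniques to keep the EP algorithm robust with high-dimensional parameters or complex likelihoods.
Supports hierarchical modeling by partitioning both data and parameters, enabling meta-analysis and distributed learning across multiple sources.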
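
The steps listed above can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the paper's implementation: a one-dimensional Gaussian-mean model with known noise, Gaussian site approximations stored as natural parameters, tilted moments computed by simple grid quadrature (standing in for whatever local inference method a real site would run), and a damping factor of 0.5 picked arbitrarily. The function names `tilted_moments` and `distributed_ep` are hypothetical.

```python
import numpy as np

def tilted_moments(loglik, cav_mean, cav_var, n_grid=4001, width=8.0):
    """Mean and variance of the tilted distribution (cavity times local
    likelihood), computed by quadrature on a uniform grid (an assumption;
    a real site might use MCMC or another approximation instead)."""
    sd = np.sqrt(cav_var)
    theta = np.linspace(cav_mean - width * sd, cav_mean + width * sd, n_grid)
    log_t = loglik(theta) - 0.5 * (theta - cav_mean) ** 2 / cav_var
    w = np.exp(log_t - log_t.max())   # subtract the max for numerical stability
    w /= w.sum()
    mean = np.sum(w * theta)
    var = np.sum(w * (theta - mean) ** 2)
    return mean, var

def distributed_ep(logliks, prior_mean, prior_var, n_iter=30, damping=0.5):
    """Parallel-style EP: every site is updated from the same global state,
    then the 'server' re-sums the site parameters."""
    n_sites = len(logliks)
    q0, r0 = 1.0 / prior_var, prior_mean / prior_var  # prior precision, precision*mean
    q_sites = np.zeros(n_sites)  # site precisions, initialised flat
    r_sites = np.zeros(n_sites)  # site precision*mean terms
    for _ in range(n_iter):
        q_glob, r_glob = q0 + q_sites.sum(), r0 + r_sites.sum()
        new_q, new_r = q_sites.copy(), r_sites.copy()
        for k in range(n_sites):  # independent across k; could run on separate workers
            q_cav, r_cav = q_glob - q_sites[k], r_glob - r_sites[k]  # cavity
            if q_cav <= 0:
                continue  # skip the update if the cavity is not a proper Gaussian
            m, v = tilted_moments(logliks[k], r_cav / q_cav, 1.0 / q_cav)
            # moment-matched Gaussian divided by the cavity gives the site target
            q_target, r_target = 1.0 / v - q_cav, m / v - r_cav
            new_q[k] = (1 - damping) * q_sites[k] + damping * q_target
            new_r[k] = (1 - damping) * r_sites[k] + damping * r_target
        q_sites, r_sites = new_q, new_r
    q_glob, r_glob = q0 + q_sites.sum(), r0 + r_sites.sum()
    return r_glob / q_glob, 1.0 / q_glob  # global mean and variance

# Toy data (assumed): y_i ~ Normal(theta, sigma^2), split into 4 partitions.
rng = np.random.default_rng(0)
sigma = 1.0
y = rng.normal(1.0, sigma, size=200)
parts = np.array_split(y, 4)

def make_loglik(y_k):
    # log-likelihood of one partition, evaluated on a vector of theta values
    return lambda theta: -0.5 * np.sum(
        (y_k[None, :] - theta[:, None]) ** 2, axis=1) / sigma ** 2

ep_mean, ep_var = distributed_ep([make_loglik(p) for p in parts],
                                 prior_mean=0.0, prior_var=4.0)

# Closed-form conjugate posterior, used only to check the sketch.
post_var = 1.0 / (1.0 / 4.0 + len(y) / sigma ** 2)
post_mean = post_var * y.sum() / sigma ** 2
print(ep_mean, post_mean, ep_var, post_var)
```

In this conjugate toy case the tilted distributions are exactly Gaussian, so the EP result should agree with the closed-form posterior up to quadrature and damping error. With a non-Gaussian likelihood the same loop applies, but the result is only an approximation.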

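Experimental Results

Research Questions

RQ1: How can Bayesian inference be scaled efficiently to big data through data partitioning without losing the regularizing effect of the prior?
RQ2: Can the message-passing mechanism of expectation propagation be generalized to distributed inference over multiple data partitions while maintaining accuracy?
RQ3: What are the key numerical and algorithmic challenges in applying EP to distributed Bayesian inference, and how can they be addressed?
RQ4: How does EP-based distributed inference compare with consensus Monte Carlo and other divide-and-conquer methods in speed and approximation error?
RQ5: In which settings, such as hierarchical models or privacy-preserving computation, can the framework be applied most effectively?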

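Key Findings

EP with 30 distributed nodes reduces computation time by up to 96% compared with a single-node serial implementation.
The method outperforms the consensus Monte Carlo algorithm of Scott et al. (2016) in both computation time and approximation error.
EP with 30 nodes converged to one mixture component at 82°, whereas EP with 10 nodes converged to a different component at 194°, indicating sensitivity to the partitioning and to convergence behavior.
Cavity and tilted distributions effectively promote information sharing among the local inferences and improve the quality of the posterior approximation.
Moment matching and SNEP both work well for local inference and can be mixed to improve stability and convergence.
The framework supports hierarchical models and suits privacy-preserving and meta-analysis settings in which data or models are distributed across multiple sources.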
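
The 96% figure can be restated as a speedup factor. This is a small worked calculation for the best reported case, assuming the reduction is measured in wall-clock time relative to the single-node run; T_1 and T_30 are labels introduced here.

```latex
\frac{T_{30}}{T_{1}} = 1 - 0.96 = 0.04
\quad\Longrightarrow\quad
\text{speedup} = \frac{T_{1}}{T_{30}} = \frac{1}{0.04} = 25,
\qquad
\text{parallel efficiency} = \frac{25}{30} \approx 0.83
```

Under that assumption, the 30-node run is at best about 25 times faster than the serial run, somewhat short of the ideal 30-fold speedup.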

This summary was generated by AI and reviewed by a human editor.