QUICK REVIEW

[Paper Review] Parallel Streaming Wasserstein Barycenters

Matthew Staib, Sebastian Claici|arXiv (Cornell University)|May 21, 2017

Markov Chains and Monte Carlo Methods|References: 12|Cited by: 26

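One-Sentence Summary

This paper proposes a communication-efficient, parallel streaming algorithm that computes the Wasserstein barycenter of arbitrary probability distributions by running stochastic gradient descent on a semi-discrete barycenter problem. The method enables scalable, real-time barycenter estimation, performs well even with continuous, nonstationary input measures, comes with theoretical convergence guarantees, and empirically outperforms prior methods on fine grids (e.g., $n \approx 10^6$) and in large-scale Bayesian inference tasks.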

ABSTRACT

Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. One principled way to fuse probability distributions is via the lens of optimal transport: the Wasserstein barycenter is a single distribution that summarizes a collection of input measures while respecting their geometry. However, computing the barycenter scales poorly and requires discretization of all input distributions and the barycenter itself. Improving on this situation, we present a scalable, communication-efficient, parallel algorithm for computing the Wasserstein barycenter of arbitrary distributions. Our algorithm can operate directly on continuous input distributions and is optimized for streaming data. Our method is even robust to nonstationary input distributions and produces a barycenter estimate that tracks the input measures over time. The algorithm is semi-discrete, needing to discretize only the barycenter estimate. To the best of our knowledge, we also provide the first bounds on the quality of the approximate barycenter as the discretization becomes finer. Finally, we demonstrate the practical effectiveness of our method, both in tracking moving distributions on a sphere, as well as in a large-scale Bayesian inference task.

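Motivation and Objectives

Provide a scalable, communication-efficient way to aggregate non-identically distributed, possibly continuous probability measures from multiple sources.
Enable real-time, streaming computation of Wasserstein barycenters that adapts over time to nonstationary input distributions.
Establish theoretical convergence bounds on the quality of the barycenter approximation as the number of barycenter support points grows.
Overcome the scalability limits of existing approaches, such as large-scale linear programming or regularized optimal transport, particularly under fine discretizations.
Support practical applications where high-fidelity barycenter estimates matter, such as large-scale Bayesian inference and sensor fusion.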

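Proposed Method

The algorithm uses stochastic gradient descent (SGD) to iteratively update a discrete barycenter with $n$ support points; only the barycenter is discretized (a semi-discrete approach).
Each worker node processes a subset of the input measures in parallel and computes stochastic gradients from samples of its input distributions.
Each iteration requires passing only a single integer between worker nodes, which makes the method communication-efficient in distributed settings.
The core optimization problem is formulated as a concave maximization over the barycenter's dual potentials, using the dual formulation of optimal transport.
The algorithm is robust to nonstationary distributions because it continually updates the barycenter estimate as the input measures change.
Theoretical convergence is established by analyzing the approximation error as the number of barycenter support points $n$ increases.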
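
The steps above can be illustrated with a minimal single-process sketch. It is not the authors' implementation; it assumes one common reading of the semi-discrete dual, in which the objective is $F(v) = \min_i \sum_j v^j_i + \sum_j \mathbb{E}_{x \sim \mu_j}[\min_i (c(x, y_i) - v^j_i)]$ with squared Euclidean cost $c$, fixed support points $y_i$, and one potential vector $v^j$ per input measure $\mu_j$. Under that assumption, each stochastic supergradient is determined by a single index, which is consistent with the single-integer communication described above. The function name `sgd_barycenter`, the step-size schedule, and the choice to estimate the weights as the running average of the selected indices are all illustrative assumptions.

```python
import numpy as np

def sgd_barycenter(samplers, support, steps=2000, step_size=0.5, seed=0):
    """Hypothetical sketch: stochastic supergradient ascent on the
    semi-discrete barycenter dual (assumed form, squared Euclidean cost).

    samplers: list of callables rng -> sample of shape (d,), one per input measure
    support:  array of shape (n, d) holding the fixed barycenter support points
    Returns estimated barycenter weights of shape (n,).
    """
    rng = np.random.default_rng(seed)
    num_measures, n = len(samplers), len(support)
    v = np.zeros((num_measures, n))  # one dual potential vector per measure
    counts = np.zeros(n)             # how often each support index is selected

    for t in range(1, steps + 1):
        lr = step_size / np.sqrt(t)  # assumed decaying step size
        # "Master" step: index minimizing the summed potentials (one integer).
        i_star = int(np.argmin(v.sum(axis=0)))
        counts[i_star] += 1
        for j, draw in enumerate(samplers):
            # "Worker" step: draw one sample and find its assigned support point.
            x = draw(rng)
            cost = np.sum((support - x) ** 2, axis=1)
            i_j = int(np.argmin(cost - v[j]))  # one integer per worker
            # Supergradient of F with respect to v[j] is e_{i_star} - e_{i_j}.
            v[j, i_star] += lr
            v[j, i_j] -= lr

    # Weight estimate: running average of the selected indices (assumption).
    return counts / steps

# Usage: two 1-D Gaussians centered at -1 and +1, barycenter on a 21-point grid.
grid = np.linspace(-2.0, 2.0, 21)[:, None]
inputs = [
    lambda rng: rng.normal(-1.0, 0.3, size=1),
    lambda rng: rng.normal(1.0, 0.3, size=1),
]
weights = sgd_barycenter(inputs, grid)
print(weights.round(3))
```

In a distributed version of this sketch, the inner loop over `samplers` would run on separate workers; because the samples are drawn fresh at every step, the same loop would keep adjusting the estimate if the input distributions drifted over time.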

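Experimental Results

Research Questions

RQ1: Can a scalable, communication-efficient, parallel algorithm be designed to compute the Wasserstein barycenter of continuous probability measures in a streaming setting?
RQ2: How does the quality of the barycenter approximation depend on the number of barycenter support points $n$, and can theoretical convergence bounds be established?
RQ3: Can the algorithm keep tracking a dynamic barycenter estimate under nonstationary input distributions without re-solving a large optimization problem each time?
RQ4: On fine grids, how does the method compare in accuracy and scalability with existing approaches such as linear programming or regularized optimal transport?
RQ5: What practical impact does the method have in large-scale Bayesian inference tasks involving subset posterior distributions?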

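Key Findings

With $n \approx 10^4$ barycenter support points, the algorithm reaches a Wasserstein distance of about 26 after 317 seconds, outperforming the linear programming approach on a comparable grid.
For $n \approx 10^6$, the method needs less than 2 GB of memory on each 16-thread node, whereas the linear programming approach already fails at $n=480$ because of memory limits.
Across a wide range of step sizes, the method approximates the barycenter better than the linear programming approach at $n \approx 10^4$, and early stopping can yield even better results.
The theoretical bounds show that the approximation error decreases as $n$ increases; to the best of the authors' knowledge, these are the first such bounds on the quality of the approximate barycenter as the discretization becomes finer.
The algorithm tracks moving distributions on a sphere in real time and markedly improves the accuracy of the Wasserstein-averaged subset posterior (WASP) in Bayesian inference.
The method scales to $n \approx 10^6$ support points with very low memory use and high parallel efficiency, demonstrating its practical feasibility for large-scale applications.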

This review was generated by AI and checked by human editors.