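[Paper walkthrough] Stochastic Optimization for Large-scale Optimal Transport
The paper introduces stochastic optimization schemes for computing large-scale optimal transport distances in discrete, semi-discrete and continuous settings. It relies on the dual formulation and entropic regularization to obtain provably convergent methods without discretization error.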
Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by only requiring the ability to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
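Ideas (a) and (b) can be written out explicitly. The block below is a sketch of the standard entropic formulation, not a quotation of the paper: the notation (ε for the regularization strength, u and v for the dual potentials, h_ε for the semi-dual integrand) and the convention of a KL divergence that includes a −1 term are assumptions made here, so constants may differ from the paper's exact statement.

```latex
\begin{aligned}
W_\varepsilon(\mu,\nu)
  &= \min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,d\pi(x,y) + \varepsilon\,\mathrm{KL}(\pi \,|\, \mu\otimes\nu),
  \qquad \mathrm{KL}(\pi \,|\, \xi) = \int \Big(\log \frac{d\pi}{d\xi} - 1\Big)\,d\pi \\
  &= \max_{u,v}\; \mathbb{E}_{(X,Y)\sim\mu\otimes\nu}\Big[\,u(X) + v(Y)
     - \varepsilon \exp\Big(\frac{u(X) + v(Y) - c(X,Y)}{\varepsilon}\Big)\Big] \\
  &= \max_{v}\; \mathbb{E}_{X\sim\mu}\big[\,h_\varepsilon(X,v)\,\big],
  \qquad h_\varepsilon(x,v) = \int v\,d\nu
     - \varepsilon \log \int \exp\Big(\frac{v(y) - c(x,y)}{\varepsilon}\Big)\,d\nu(y) - \varepsilon .
\end{aligned}
```

The last equality comes from maximizing out u pointwise, which gives u(x) = −ε log ∫ exp((v(y) − c(x,y))/ε) dν(y). For a discrete ν = Σ_j b_j δ_{y_j} (b is illustrative notation), v is a vector in R^J and ∇_v h_ε(x, v) = b − χ(x, v), where χ is the softmax of (v_j − c(x, y_j))/ε + log b_j. A single sample X ~ μ therefore already gives an unbiased gradient of the objective.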
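Research motivation and goals
- Make the computation of optimal transport distances between large-scale distributions efficient enough for machine learning applications.
- Develop stochastic optimization methods that operate only by sampling from the distributions, avoiding discretization.
- Provide algorithms with provable convergence for the discrete, semi-discrete and continuous OT settings.
- Present empirical comparisons in which the stochastic methods outperform traditional Sinkhorn-style solvers in some scenarios.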
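Proposed method
- Recast the dual OT problem as the maximization of an expectation (together with a semi-dual formulation), which makes stochastic optimization applicable.
- Use entropic regularization to obtain a smooth dual, which speeds up convergence (using Sinkhorn-based techniques where appropriate).
- Propose SAG (stochastic average gradient) for the discrete OT setting, where it outperforms Sinkhorn on large-scale problems.
- Apply averaged SGD to semi-discrete OT, handling a discrete measure against a continuous one without discretizing the continuous density.
- For continuous-continuous OT, expand the dual variables in an RKHS and apply kernel SGD, which converges to the dual solution in the RKHS.
- Provide algorithms with convergence guarantees and discuss practical issues such as mini-batches, step sizes and RKHS projections.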
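The discrete case can be sketched in Python/NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it assumes squared Euclidean costs, runs plain SAG ascent on the entropic semi-dual H_ε(v) = Σ_i a_i h_ε(x_i, v), and checks the result against a log-domain Sinkhorn reference on the same objective. The function names (`h_eps`, `semi_dual`, `sag_discrete_ot`, `sinkhorn_log`), the step-size heuristic ε / (I · max_i a_i) and the iteration counts are hypothetical choices for this example.

```python
import numpy as np


def logsumexp(z, axis=-1):
    """Numerically stable log(sum(exp(z))) along `axis`."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def h_eps(c_row, v, b, eps):
    """Semi-dual term h_eps(x_i, v) and its gradient in v, for one row c_row = C[i, :]."""
    z = (v - c_row) / eps + np.log(b)
    lse = logsumexp(z)
    chi = np.exp(z - lse)                 # softmax weights over the J atoms of nu
    value = v @ b - eps * lse - eps
    grad = b - chi                        # entries sum to zero
    return value, grad


def semi_dual(C, a, b, v, eps):
    """Full objective H_eps(v) = sum_i a_i h_eps(x_i, v) (assumes sum(a) == 1)."""
    z = (v[None, :] - C) / eps + np.log(b)[None, :]
    return float(v @ b - eps * (a @ logsumexp(z, axis=1)) - eps)


def sag_discrete_ot(C, a, b, eps, n_iter=30_000, step=None, seed=0):
    """Stochastic average gradient ascent on H_eps, one source point per step."""
    rng = np.random.default_rng(seed)
    n_src, n_tgt = C.shape
    if step is None:
        step = eps / (n_src * a.max())    # heuristic ~1/L step size (assumption)
    v = np.zeros(n_tgt)
    memory = np.zeros((n_src, n_tgt))     # last stored a_i * grad h_eps(x_i, .)
    d = np.zeros(n_tgt)                   # sum of the memory, a surrogate for grad H_eps
    for _ in range(n_iter):
        i = rng.integers(n_src)
        _, grad = h_eps(C[i], v, b, eps)
        new = a[i] * grad
        d += new - memory[i]
        memory[i] = new
        v = v + step * d
    return v


def sinkhorn_log(C, a, b, eps, n_iter=2000):
    """Log-domain Sinkhorn on the same entropic dual, used only as a reference."""
    u, v = np.zeros(C.shape[0]), np.zeros(C.shape[1])
    for _ in range(n_iter):
        u = -eps * logsumexp((v[None, :] - C) / eps + np.log(b)[None, :], axis=1)
        v = -eps * logsumexp((u[:, None] - C) / eps + np.log(a)[:, None], axis=0)
    return u, v


# Toy problem: two point clouds in the unit square with uniform weights.
rng = np.random.default_rng(1)
X, Y = rng.random((20, 2)), rng.random((15, 2))
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean cost
a, b = np.full(20, 1 / 20), np.full(15, 1 / 15)
eps = 0.2

v_sag = sag_discrete_ot(C, a, b, eps)
_, v_sk = sinkhorn_log(C, a, b, eps)
H_sag, H_sk = semi_dual(C, a, b, v_sag, eps), semi_dual(C, a, b, v_sk, eps)
print(f"SAG: {H_sag:.6f}  Sinkhorn: {H_sk:.6f}")
```

Every stochastic gradient b − χ_i sums to zero, so the SAG iterate stays in the zero-sum subspace and never drifts along the direction in which H_ε is constant. On a toy problem like this the two printed values should agree closely; the regime where SAG pays off is large I, since each SAG step costs O(J) instead of the O(IJ) of a full Sinkhorn sweep.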
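Experimental results
Research questions
- RQ1: Can stochastic optimization methods efficiently compute OT distances between large-scale discrete distributions and overcome the bottleneck of Sinkhorn's algorithm?
- RQ2: How can the dual formulation and entropic regularization be used to solve semi-discrete OT without incurring discretization error?
- RQ3: Is it feasible to compute the OT distance between two continuous densities with stochastic optimization in an RKHS framework?
- RQ4: What are the convergence properties and practical performance (in speed and accuracy) of SAG, SGD and kernel SGD in OT settings?
- RQ5: How do these stochastic methods compare empirically with state-of-the-art discrete OT solvers on discrete, semi-discrete and continuous benchmarks?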
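Key findings
- Incremental stochastic optimization (SAG) can outperform Sinkhorn on large-scale discrete OT problems.
- Averaged SGD for semi-discrete OT comes with convergence rates for problems where one distribution is continuous and the other is discrete.
- Kernel SGD in an RKHS gives a convergent method for OT between two continuous densities; apart from computing OT on finite samples, it is the first practical method for this problem.
- Entropic regularization makes the dual smooth, which enables stochastic optimization with convergence guarantees.
- Empirical tests on word embeddings and the word mover's distance show faster convergence than Sinkhorn in large-scale discrete settings, and the methods scale well on GPU hardware.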
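The semi-discrete finding can be illustrated in the same way: the continuous measure μ is touched only through samples, and the running average of the SGD iterates is what gets returned. In this sketch μ is assumed to be uniform on the unit square, ν is a hypothetical five-atom measure, the cost is squared Euclidean, and the schedule step0 / √k with step0 = 1 is an untuned assumption. `grad_h`, `averaged_sgd_semidiscrete`, `marginal_error` and `sample_uniform_square` are illustrative names, not an API from the paper.

```python
import numpy as np


def logsumexp(z, axis=-1):
    """Numerically stable log(sum(exp(z))) along `axis`."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def grad_h(x, v, Y, b, eps):
    """Gradient in v of h_eps(x, v) for one sample x, with nu = sum_j b_j delta_{y_j}."""
    c = np.sum((Y - x) ** 2, axis=1)       # squared Euclidean cost c(x, y_j)
    z = (v - c) / eps + np.log(b)
    chi = np.exp(z - logsumexp(z))         # softmax weights over the atoms of nu
    return b - chi


def averaged_sgd_semidiscrete(sample_mu, Y, b, eps, n_iter=20_000, step0=1.0, seed=0):
    """SGD ascent with step step0 / sqrt(k) on the semi-dual, returning the running average."""
    rng = np.random.default_rng(seed)
    v_tilde = np.zeros(len(b))             # raw SGD iterate
    v_bar = np.zeros(len(b))               # running average of the iterates
    for k in range(1, n_iter + 1):
        x = sample_mu(rng)                 # mu is accessed only through samples
        v_tilde = v_tilde + step0 / np.sqrt(k) * grad_h(x, v_tilde, Y, b, eps)
        v_bar = v_bar + (v_tilde - v_bar) / k
    return v_bar


def marginal_error(v, X, Y, b, eps):
    """l1 gap between b and a Monte Carlo estimate of E_x[chi(x, v)]; zero at the optimum."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    z = (v[None, :] - C) / eps + np.log(b)[None, :]
    chi = np.exp(z - logsumexp(z, axis=1)[:, None])
    return float(np.abs(b - chi.mean(axis=0)).sum())


def sample_uniform_square(rng):
    """Hypothetical continuous mu: uniform on [0, 1]^2."""
    return rng.random(2)


Y = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.5, 0.5]])
b = np.array([0.6, 0.1, 0.1, 0.1, 0.1])
eps = 0.1

v_bar = averaged_sgd_semidiscrete(sample_uniform_square, Y, b, eps)
X_test = np.random.default_rng(1).random((20_000, 2))   # fresh samples for evaluation
err0 = marginal_error(np.zeros(len(b)), X_test, Y, b, eps)
err = marginal_error(v_bar, X_test, Y, b, eps)
print(f"marginal gap at v = 0: {err0:.3f}, after averaged SGD: {err:.3f}")
```

At the optimum E_X[χ(X, v)] = b, so the ℓ1 gap returned by `marginal_error` is a convenient convergence diagnostic. Because it is estimated on a finite batch of fresh samples, it carries Monte Carlo noise of order 1/√N.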
This walkthrough was generated by AI and reviewed by human editors.