QUICK REVIEW

[Paper Review] Stochastic Optimization for Large-scale Optimal Transport

Aude Genevay, Marco Cuturi|arXiv (Cornell University)|May 27, 2016

Markov Chains and Monte Carlo Methods|References: 20|Citations: 153

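One-line summary

This paper introduces stochastic optimization schemes for computing large-scale optimal transport distances in the discrete, semi-discrete, and continuous settings. Using the dual formulation and entropic regularization, the schemes achieve provable convergence without discretization error.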

ABSTRACT

Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring to be able to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
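
Ideas (a) and (b) can be written out explicitly. The block below is a sketch in common entropic-OT notation, not a quotation from the paper. It assumes a cost function c, a regularization strength ε > 0, and the convention KL(π|ξ) = ∫ (log(dπ/dξ) − 1) dπ. The symbols u, v, and f_ε are notation chosen here for illustration, and the additive constants change under other KL conventions.

```latex
% Entropy-regularized primal problem
W_\varepsilon(\mu,\nu)
  = \min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,\mathrm{d}\pi(x,y)
    + \varepsilon\,\mathrm{KL}(\pi \mid \mu \otimes \nu)

% Smooth dual problem, written as the maximization of an expectation
W_\varepsilon(\mu,\nu)
  = \max_{u,\,v}\; \mathbb{E}_{X \sim \mu,\; Y \sim \nu}
    \bigl[ f_\varepsilon(X, Y, u, v) \bigr],
\qquad
f_\varepsilon(x,y,u,v)
  = u(x) + v(y)
    - \varepsilon \exp\!\Bigl( \tfrac{u(x) + v(y) - c(x,y)}{\varepsilon} \Bigr)
```

Because the dual objective is an expectation over pairs (X, Y), an unbiased gradient estimate needs only one sample from each distribution, which is what makes stochastic gradient methods applicable.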

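Motivation and objectives

Efficient computation of optimal transport distances between large-scale distributions in machine learning is the motivating problem.
Develop stochastic optimization methods that operate on samples drawn from the distributions and avoid discretization.
Provide algorithms with provable convergence for the discrete, semi-discrete, and continuous OT settings.
Present empirical comparisons showing that the stochastic methods outperform conventional Sinkhorn-type solvers.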

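Proposed method

Recast the dual OT problem as the maximization of an expectation, which makes stochastic optimization possible (including in semi-dual form).
Use entropic regularization to obtain a smooth dual, which allows faster convergence (with Sinkhorn-based methods where appropriate).
For the discrete OT setting, propose SAG (stochastic average gradient) in order to outperform Sinkhorn on large-scale problems.
For semi-discrete OT, which handles one discrete and one continuous measure without discretization, apply averaged SGD.
For continuous-to-continuous OT, expand the dual variables in an RKHS and apply kernel SGD, which converges to the dual solution within the RKHS.
Provide algorithms with convergence guarantees and discuss practical aspects such as mini-batching, step sizes (learning rates), and RKHS projections.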
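
The semi-discrete case can be illustrated with a minimal sketch of averaged SGD on the entropic semi-dual. This is not the authors' code. It assumes a 1-D source distribution that is uniform on [0, 1], three target atoms, a squared-distance cost, ε = 0.1, and a step size of 1/√k; the function names `soft_assignment` and `averaged_sgd_semidual` are hypothetical. The per-sample gradient with respect to the dual vector v is ν − χ_ε(x), where χ_ε(x) is the soft assignment of the sample x to the atoms.

```python
import numpy as np

def soft_assignment(x, y, v, nu, eps):
    """Gibbs weights chi_eps(x) over the atoms y_j, for one sample or a batch."""
    cost = (np.asarray(x)[..., None] - y) ** 2   # squared distance (assumed cost)
    z = (v - cost) / eps + np.log(nu)
    z = z - z.max(axis=-1, keepdims=True)        # stabilize the softmax
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def averaged_sgd_semidual(sample_mu, y, nu, eps, n_iter, step, rng):
    """Stochastic gradient ascent on the semi-dual, returning the averaged iterate."""
    v = np.zeros(len(y))       # current iterate
    v_bar = np.zeros(len(y))   # running average of the iterates
    for k in range(1, n_iter + 1):
        x = sample_mu(rng)                                # one draw from mu
        grad = nu - soft_assignment(x, y, v, nu, eps)     # unbiased gradient
        v = v + (step / np.sqrt(k)) * grad
        v_bar += (v - v_bar) / k
    return v_bar

# Illustrative problem (all values are assumptions made for this sketch).
rng = np.random.default_rng(0)
y = np.array([0.0, 0.5, 1.0])      # atoms of the discrete measure
nu = np.array([0.2, 0.5, 0.3])     # their weights
eps = 0.1

v_hat = averaged_sgd_semidual(
    lambda r: r.uniform(0.0, 1.0), y, nu, eps, n_iter=50_000, step=1.0, rng=rng
)

# At a maximizer, the expected soft assignment matches nu.
x_check = rng.uniform(0.0, 1.0, size=20_000)
marginal = soft_assignment(x_check, y, v_hat, nu, eps).mean(axis=0)
print("estimated marginal:", marginal, "target:", nu)
```

The source distribution is only ever accessed through `sample_mu`, so no discretization of the continuous measure is needed. Comparing `marginal` with `nu` is a quick sanity check that the averaged iterate is close to a stationary point.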

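Experimental results

Research questions

RQ1: Can stochastic optimization methods efficiently compute OT distances between large-scale discrete distributions and overcome the bottleneck of Sinkhorn?
RQ2: How can the dual formulation and entropic regularization be used to handle semi-discrete OT without discretization error?
RQ3: Is it feasible to solve for the OT distance between two continuous densities using stochastic optimization in an RKHS framework?
RQ4: What are the convergence properties and practical performance (speed and accuracy) of SAG, SGD, and kernel SGD in OT settings?
RQ5: Across the discrete, semi-discrete, and continuous benchmarks, how do these stochastic methods compare empirically with state-of-the-art discrete OT solvers?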

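Key findings

Incremental stochastic optimization (SAG) can outperform Sinkhorn on large-scale discrete OT problems.
Averaged SGD for semi-discrete OT yields convergence rates suited to problems where one measure is continuous and the other is discrete.
Kernel SGD in an RKHS provides a convergent approach to OT between two continuous densities. Apart from discretization on finite samples, it is the first practical method for this problem.
Entropic regularization yields a smooth dual that enables stochastic optimization and facilitates provable convergence.
Experiments on word embeddings and the Word Mover's Distance show faster convergence than Sinkhorn in large-scale discrete settings, and the approach also scales well on GPUs.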
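
For context, the Sinkhorn baseline that the stochastic methods are compared against can be sketched as follows. This is the standard matrix-scaling iteration for entropic OT between two histograms, not code from the paper. The function name `sinkhorn`, the random point clouds, the normalized squared Euclidean cost, and ε = 0.5 are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn(mu, nu, cost, eps, n_iter=500):
    """Alternately rescale the Gibbs kernel so the plan matches both marginals."""
    K = np.exp(-cost / eps)        # Gibbs kernel
    a = np.ones_like(mu)
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)           # enforce the row marginals
        b = nu / (K.T @ a)         # enforce the column marginals
    return a[:, None] * K * b[None, :]

# Illustrative data: two small point clouds with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(4, 2))
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()           # normalize so that eps has a comparable scale
mu = np.full(5, 1.0 / 5)
nu = np.full(4, 1.0 / 4)

plan = sinkhorn(mu, nu, cost, eps=0.5)
print("row sums:", plan.sum(axis=1))
print("column sums:", plan.sum(axis=0))
```

Each iteration uses the full kernel matrix, so its cost grows with the product of the two support sizes. Incremental schemes such as SAG instead update from a sampled part of the problem at each step.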


This review was generated by AI and checked by a human editor.