QUICK REVIEW

[Paper Review] Learning Generative Models with Sinkhorn Divergences

Aude Genevay, Gabriel Peyré|arXiv (Cornell University)|Jun 1, 2017

Generative Adversarial Networks and Image Synthesis|References: 8|Citations: 73

One-Line Summary

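This paper introduces the Sinkhorn loss, an objective based on entropy-regularized optimal transport (OT) for training generative models. Computed with Sinkhorn iterations and automatic differentiation, it interpolates between OT and MMD losses and enables stable, scalable training.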

ABSTRACT

The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, (iii) the difficulty to estimate robustly these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing to find a sweet spot leveraging the geometry of OT and the favorable high-dimensional sample complexity of MMD which comes with unbiased gradient estimates. The resulting computational architecture complements nicely standard deep network generative models by a stack of extra layers implementing the loss function.
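The entropic smoothing and Sinkhorn fixed-point iterations mentioned in the abstract can be written out for two minibatches. This is a sketch of the standard discrete setting, not the paper's own equations: the notation (C, K, u, v, a, b) is ours, and using KL with respect to the product of the weights as the regularizer is an assumption about the exact form used.

```latex
% Minibatches x_1..x_m (model) and y_1..y_n (data), uniform weights (assumed).
\[
C_{ij} = c(x_i, y_j), \qquad a = \tfrac{1}{m}\mathbf{1}_m, \qquad b = \tfrac{1}{n}\mathbf{1}_n,
\qquad U(a,b) = \{ P \ge 0 : P\mathbf{1}_n = a,\; P^\top \mathbf{1}_m = b \}
\]
\[
P_\varepsilon = \operatorname*{arg\,min}_{P \in U(a,b)} \; \langle P, C \rangle
  + \varepsilon \, \mathrm{KL}\!\left(P \,\middle\|\, a b^\top\right)
\]
% The minimizer has the scaling form below with the Gibbs kernel K.
\[
P_\varepsilon = \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K = e^{-C/\varepsilon}
\]
% Sinkhorn iterations (elementwise division), starting from v^{(0)} = 1_n.
\[
u^{(\ell+1)} = \frac{a}{K v^{(\ell)}}, \qquad
v^{(\ell+1)} = \frac{b}{K^\top u^{(\ell+1)}}, \qquad \ell = 0, \dots, L-1
\]
```

Each iteration costs two matrix-vector products, i.e. O(mn), so L iterations cost O(Lmn). Every step is a differentiable array operation, so the unrolled loop can be backpropagated through.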

Motivation and Goals

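Motivates using optimal transport geometry to fit generative models when the target distribution is singular or supported on a low-dimensional manifold.
Introduces a tractable OT-based loss (the Sinkhorn loss) that is differentiable and robust for high-dimensional generative modeling.
Provides a practical, SGD-compatible algorithm that combines minibatch estimators with differentiable Sinkhorn iterations to enable scalable training.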
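A standard toy example illustrates why information divergences break down for degenerate distributions while OT does not. It comes from the Wasserstein GAN literature, not from this review. Let Z be uniform on [0,1] and compare two parallel segments in the plane:

```latex
\[
\mu = \operatorname{law}(0, Z), \qquad \nu_\theta = \operatorname{law}(\theta, Z), \qquad Z \sim \mathcal{U}[0,1]
\]
\[
\mathrm{KL}(\mu \,\|\, \nu_\theta) =
\begin{cases} 0 & \theta = 0 \\ +\infty & \theta \neq 0 \end{cases}
\qquad \text{whereas} \qquad
W_1(\mu, \nu_\theta) = |\theta|
\]
```

The KL divergence gives no usable signal in θ. The OT distance decreases continuously as the segments approach each other, which is the property a generator trained on low-dimensional supports needs.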

Proposed Method

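Defines the Sinkhorn loss as an entropy-regularized OT distance and shows its limiting behavior: it tends to OT as epsilon -> 0 and to MMD as epsilon -> infinity.
Formulates density fitting as minimizing the Sinkhorn loss between the pushforward of the model and the data distribution.
Uses entropic smoothing with a Gibbs kernel so that the loss can be optimized with differentiable, GPU-friendly Sinkhorn iterations.
Approximates the loss with minibatches and L Sinkhorn iterations, yielding a differentiable surrogate suitable for automatic differentiation.
Learns a parametric cost c_phi through a feature map f_phi to better measure distances between generated and real samples (a minimax problem over theta and phi).
Provides an AutoDiff-compatible algorithm that integrates the Sinkhorn steps into standard SGD at a cost of O(L m n), where m and n denote the sizes of the two sample batches.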
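The steps in the list above can be combined into a minimal NumPy sketch. The review does not confirm these assumptions, which are ours: uniform weights on each minibatch, a squared Euclidean cost, W taken as the transport cost <P, C> of the entropic plan, and the debiased form 2W(x,y) - W(x,x) - W(y,y). All function names (cost_matrix, sinkhorn_cost, sinkhorn_loss, feature_map) and the linear generator in the usage lines are hypothetical. The updates use the log domain, a numerical-stability choice that may differ from the paper's implementation.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def cost_matrix(x, y, feature_map=None):
    """Squared Euclidean cost C_ij = |f(x_i) - f(y_j)|^2.

    feature_map is a hypothetical stand-in for a learned f_phi;
    None means the identity (a fixed, non-learned cost).
    """
    if feature_map is not None:
        x, y = feature_map(x), feature_map(y)
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sinkhorn_cost(x, y, eps, n_iters=100, feature_map=None):
    """Transport cost <P, C> of the entropic plan after n_iters Sinkhorn steps.

    x: (m, d) and y: (n, d) minibatches with uniform weights.
    Each iteration is O(m n), so the loop costs O(L m n) with L = n_iters.
    """
    C = cost_matrix(x, y, feature_map)
    m, n = C.shape
    log_a = np.full(m, -np.log(m))  # log of uniform weights 1/m
    log_b = np.full(n, -np.log(n))  # log of uniform weights 1/n
    f, g = np.zeros(m), np.zeros(n)  # dual potentials (log-domain scalings)
    for _ in range(n_iters):
        # Log-domain form of u = a / (K v) and v = b / (K^T u), K = exp(-C / eps).
        f = -eps * logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    # Entropic plan P_ij = a_i b_j exp((f_i + g_j - C_ij) / eps).
    P = np.exp(log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps)
    return float(np.sum(P * C))


def sinkhorn_loss(x, y, eps, n_iters=100, feature_map=None):
    """Debiased Sinkhorn loss 2 W(x, y) - W(x, x) - W(y, y)."""
    w_xy = sinkhorn_cost(x, y, eps, n_iters, feature_map)
    w_xx = sinkhorn_cost(x, x, eps, n_iters, feature_map)
    w_yy = sinkhorn_cost(y, y, eps, n_iters, feature_map)
    return 2 * w_xy - w_xx - w_yy


# Usage: a hypothetical linear generator g_theta pushes latent noise forward.
rng = np.random.default_rng(0)
z = rng.normal(size=(64, 2))                  # latent minibatch
theta = np.array([[1.0, 0.0], [0.0, 0.5]])    # hypothetical generator weights
x_fake = z @ theta                            # model samples g_theta(z)
x_real = rng.normal(size=(64, 2)) * np.array([2.0, 0.5])  # stand-in "data"
print(sinkhorn_loss(x_fake, x_real, eps=0.1, n_iters=200))
```

NumPy has no automatic differentiation. Running the same unrolled loop in an autodiff framework such as JAX or PyTorch would let gradients reach the generator parameters θ and, with a learned feature map, the cost parameters φ in the min over θ, max over φ problem. The log-domain form avoids the underflow that the plain kernel exp(-C/eps) suffers when C/eps is large.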

Experimental Results

Research Questions

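RQ1: Can entropic regularization yield a tractable, differentiable OT-based loss for training on high-dimensional data?
RQ2: How does the Sinkhorn loss interpolate between OT and MMD, and what are the practical consequences for sample complexity and gradient stability?
RQ3: Can a data-driven ground cost be learned to better align the generated and real data distributions?
RQ4: Is Sinkhorn-based training with minibatches and automatic differentiation feasible to implement on standard hardware?
RQ5: How do the hyperparameters epsilon, batch size, and the number of Sinkhorn iterations affect convergence and generation quality?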

Key Findings

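The Sinkhorn loss interpolates smoothly between OT (epsilon -> 0) and MMD (epsilon -> infinity), offering a trade-off between OT geometry and sample efficiency.
Entropic smoothing reduces the bias of gradient estimates, improves performance in high dimensions, and makes training with Sinkhorn iterations stable.
A practical AutoDiff-based algorithm with minibatches and L Sinkhorn iterations gives differentiable, GPU-friendly training for differentiable generators.
Learning a parametric cost through a feature map f_phi further improves distance measurement and leads to a min_theta max_phi formulation.
Experiments on fitting data on ellipses and on image generation (MNIST, CIFAR-10) show sensitivity to epsilon, the batch size, and L; a larger epsilon often yields faster convergence.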
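The OT/MMD interpolation can be made explicit. As we read the paper, the Sinkhorn loss is the debiased quantity below, where π_ε is the optimal entropic coupling and W_ε its transport cost. The derivation of the limits is a sketch under that reading, not a restatement of the paper's proof.

```latex
\[
W_\varepsilon(\mu,\nu) = \int c \, d\pi_\varepsilon, \qquad
\bar W_\varepsilon(\mu,\nu) = 2\,W_\varepsilon(\mu,\nu) - W_\varepsilon(\mu,\mu) - W_\varepsilon(\nu,\nu)
\]
% epsilon -> infinity: the entropic term dominates, so pi_eps -> mu (x) nu.
\[
\bar W_\varepsilon(\mu,\nu) \;\longrightarrow\;
2\,\mathbb{E}_{\mu\otimes\nu}[c(X,Y)] - \mathbb{E}_{\mu\otimes\mu}[c(X,X')] - \mathbb{E}_{\nu\otimes\nu}[c(Y,Y')]
= \mathrm{MMD}^2_{-c}(\mu,\nu)
\]
% epsilon -> 0: pi_eps tends to an optimal transport plan, and W(mu,mu) = 0 when c(x,x) = 0.
\[
\bar W_\varepsilon(\mu,\nu) \;\longrightarrow\; 2\,W(\mu,\nu)
\]
```

The large-ε limit is the MMD-type quantity with kernel k = -c. For c(x,y) = |x - y| it is the energy distance. For the squared Euclidean cost on empirical samples it reduces to twice the squared distance between the sample means, which explains why very large ε discards most of the OT geometry.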

This review was generated by AI and checked by a human editor.