QUICK REVIEW

[Paper Review] The Cramer Distance as a Solution to Biased Wasserstein Gradients

Marc G. Bellemare, Ivo Danihelka|arXiv (Cornell University)|May 30, 2017

Geometric Analysis and Curvature Flows|References: 30|Citations: 252

One-Line Summary

The paper shows that SGD on a sample-based Wasserstein loss yields biased gradients and can converge to the wrong minimum, introduces the Cramér distance as an unbiased, geometry-aware alternative, and proposes the Cramér GAN.

ABSTRACT

The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this paper we describe three natural properties of probability divergences that reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein metric possesses the first two properties but, unlike the Kullback-Leibler divergence, does not possess the third. We provide empirical evidence suggesting that this is a serious issue in practice. Leveraging insights from probabilistic forecasting we propose an alternative to the Wasserstein metric, the Cramér distance. We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences. To illustrate the relevance of the Cramér distance in practice we design a new algorithm, the Cramér Generative Adversarial Network (GAN), and show that it performs significantly better than the related Wasserstein GAN.

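Motivation and Objectives

Motivate the need for a divergence that respects the geometry of outcomes while still allowing reliable optimization.
Diagnose why Wasserstein gradients are biased when estimated from samples.
Introduce the Cramér distance as an ideal divergence whose sample gradients are unbiased.
Demonstrate the practical benefits of the Cramér distance through ordinal regression and GAN experiments.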
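To make "respecting geometry" concrete, the following minimal sketch compares the KL divergence, the 1-Wasserstein metric, and the Cramér distance on a toy ordered support. It assumes outcomes 0 to 9 with unit spacing, so both distances reduce to sums over differences of cumulative distribution functions. The helper `peaked` and the specific distributions are illustrative assumptions, not taken from the paper.

```python
import math

def cdf(pmf):
    """Cumulative sums of a probability mass function on 0, 1, ..., n-1."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out

def kl(p, q):
    """KL(P || Q) for strictly positive pmfs on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def wasserstein1(p, q):
    """1-Wasserstein distance on a unit-spaced ordered support: sum |F_P - F_Q|."""
    return sum(abs(a - b) for a, b in zip(cdf(p), cdf(q)))

def cramer(p, q):
    """Cramér distance (l_2 distance between CDFs): sqrt(sum (F_P - F_Q)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cdf(p), cdf(q))))

def peaked(n, k, eps=0.01):
    """Hypothetical helper: pmf with mass eps on every outcome except k."""
    pmf = [eps] * n
    pmf[k] = 1.0 - eps * (n - 1)
    return pmf

p = peaked(10, 0)       # mass concentrated on outcome 0
q_near = peaked(10, 1)  # mass concentrated on the neighbouring outcome
q_far = peaked(10, 9)   # mass concentrated on the farthest outcome

kl_near, kl_far = kl(p, q_near), kl(p, q_far)
w_near, w_far = wasserstein1(p, q_near), wasserstein1(p, q_far)
c_near, c_far = cramer(p, q_near), cramer(p, q_far)
```

In this toy setting the KL divergence is the same for the near and the far target, because it only compares probabilities outcome by outcome. The Wasserstein and Cramér values both grow as the mass moves farther away (about 0.9 versus 8.1, and about 0.9 versus 2.7).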

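Proposed Method

Define and compare the KL divergence, the Wasserstein metric, and the Cramér distance in terms of scale sensitivity (S), sum invariance (I), and unbiased sample gradients (U).
Prove that the KL divergence has unbiased sample gradients but is not scale sensitive, and show that the Wasserstein metric is ideal in the sense of satisfying (S) and (I) but lacks (U).
Use a theoretical result for Bernoulli distributions (Theorem 1) to show that sample Wasserstein gradients are biased.
Introduce the Cramér distance and prove that it satisfies scale sensitivity (S), sum invariance (I), and unbiased sample gradients (U) (Theorem 2).
Propose the Cramér GAN, which transforms samples with a learned function h, uses an energy-distance-style loss, and trains a critic with a gradient penalty.
Compare the Cramér distance with Wasserstein and KL losses on ordinal regression, and the Cramér GAN with Wasserstein GANs on image generation.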
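The link between the "energy-distance-style loss" and the Cramér distance can be checked numerically in one dimension. There the energy distance 2E|X - Y| - E|X - X'| - E|Y - Y'| equals twice the squared Cramér distance. The sketch below is not the paper's GAN code: it assumes two hand-picked discrete distributions on the unit-spaced support 0 to 3, and it leaves out the learned transformation h and the gradient penalty.

```python
def cdf(pmf):
    """Cumulative sums of a probability mass function on 0, 1, ..., n-1."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out

def expected_abs_diff(p, q):
    """E|X - Y| for independent X ~ p and Y ~ q on the support 0, 1, ..., n-1."""
    return sum(pi * qj * abs(i - j)
               for i, pi in enumerate(p)
               for j, qj in enumerate(q))

def energy_distance(p, q):
    """Energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2.0 * expected_abs_diff(p, q)
            - expected_abs_diff(p, p)
            - expected_abs_diff(q, q))

def squared_cramer(p, q):
    """Squared Cramér distance on a unit-spaced support: sum (F_P - F_Q)^2."""
    return sum((a - b) ** 2 for a, b in zip(cdf(p), cdf(q)))

# Illustrative distributions (an assumption, not data from the paper).
p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]

energy = energy_distance(p, q)
cramer_sq = squared_cramer(p, q)
# In one dimension, energy == 2 * cramer_sq up to floating-point error.
```

The energy form needs only pairwise distances between samples, so it also extends to multivariate outputs such as images.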

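Experimental Results

Research Questions

RQ1: Does the Wasserstein metric yield unbiased sample gradients when optimized with SGD?
RQ2: Does the Cramér distance provide unbiased gradients while preserving sensitivity to geometry?
RQ3: How does the Cramér distance perform against Wasserstein- and KL-based methods on practical learning tasks such as ordinal regression and GANs?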
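RQ1 and RQ2 can be explored exactly in the Bernoulli setting. For a target Bernoulli(p) and a model Bernoulli(theta), the Wasserstein distance is |p - theta| and the squared Cramér distance is (p - theta)^2, so both true losses are minimized at theta = p. The sketch below computes the expected sample loss, where p is replaced by the empirical frequency of m draws, and finds its minimizer on a grid. The values p = 0.6 and m = 1 and the grid resolution are illustrative assumptions.

```python
from math import comb

def binom_pmf(m, k, p):
    """Probability of k successes in m Bernoulli(p) draws."""
    return comb(m, k) * p ** k * (1.0 - p) ** (m - k)

def expected_sample_loss(theta, p, m, loss):
    """Exact expectation of loss(p_hat, theta) over the empirical frequency p_hat = k / m."""
    return sum(binom_pmf(m, k, p) * loss(k / m, theta) for k in range(m + 1))

def wasserstein_loss(p_hat, theta):
    """Wasserstein distance between Bernoulli(p_hat) and Bernoulli(theta)."""
    return abs(p_hat - theta)

def cramer_sq_loss(p_hat, theta):
    """Squared Cramér distance between Bernoulli(p_hat) and Bernoulli(theta)."""
    return (p_hat - theta) ** 2

def argmin_theta(p, m, loss, grid_size=101):
    """Grid search for the theta in [0, 1] that minimizes the expected sample loss."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda t: expected_sample_loss(t, p, m, loss))

p_true, m = 0.6, 1  # illustrative choices
theta_w = argmin_theta(p_true, m, wasserstein_loss)  # minimizer of the expected sample Wasserstein loss
theta_c = argmin_theta(p_true, m, cramer_sq_loss)    # minimizer of the expected sample Cramér loss
```

With a single sample per step, the expected sample Wasserstein loss is minimized at theta = 1 rather than at 0.6, while the expected sample Cramér loss is still minimized at 0.6. Larger m shrinks the Wasserstein gap but does not remove it in general.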

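Key Findings

The sample Wasserstein loss produces biased gradient estimates and can converge to the wrong minimum (Theorem 1).
The Cramér distance has unbiased sample gradients and retains geometric information (Theorem 2).
In ordinal regression, minimizing the Cramér distance improves RMSE over the Wasserstein and KL baselines and also yields a lower Wasserstein loss.
The Cramér GAN produces more diverse image completions than WGAN-GP, trains more stably, and scores better on an independent critic distance.
Overall, the framework demonstrates practical advantages of using the Cramér distance over the Wasserstein metric in ML applications.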
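The first two findings can be illustrated with a small SGD simulation. This is a toy sketch, not a reproduction of the paper's experiments: it assumes a Bernoulli(0.6) target, one sample per step, a fixed learning rate, clipping of theta to [0, 1], and averaging over the final iterates. The function `sgd_bernoulli` and all of its defaults are hypothetical choices made for illustration.

```python
import random

def sgd_bernoulli(grad, p_true=0.6, theta0=0.5, lr=0.01,
                  steps=6000, tail=2000, seed=0):
    """Run single-sample SGD on theta and return the mean of the last `tail` iterates."""
    rng = random.Random(seed)
    theta, tail_sum = theta0, 0.0
    for t in range(steps):
        x = 1.0 if rng.random() < p_true else 0.0  # one draw from the target
        theta -= lr * grad(x, theta)
        theta = min(1.0, max(0.0, theta))          # keep theta a valid probability
        if t >= steps - tail:
            tail_sum += theta
    return tail_sum / tail

def wasserstein_grad(x, theta):
    """Subgradient of the sample Wasserstein loss |x - theta| with respect to theta."""
    return (theta > x) - (theta < x)

def cramer_grad(x, theta):
    """Gradient of the sample squared Cramér loss (x - theta)^2 with respect to theta."""
    return 2.0 * (theta - x)

theta_wasserstein = sgd_bernoulli(wasserstein_grad)
theta_cramer = sgd_bernoulli(cramer_grad)
```

Under these assumptions the Wasserstein iterates drift toward theta = 1, because each step moves theta toward the sampled outcome by a fixed amount and outcome 1 is more frequent. The Cramér iterates hover around the true parameter 0.6.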

This review was generated by AI and checked by a human editor.