QUICK REVIEW

[Paper Review] Generative Modeling by Estimating Gradients of the Data Distribution

Yang Song, Stefano Ermon|arXiv (Cornell University)|Jul 12, 2019

Generative Adversarial Networks and Image Synthesis | References: 68 | Citations: 985

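One-Sentence Summary

The paper uses score matching to learn the score function of noise-perturbed data and generates samples with annealed Langevin dynamics, achieving competitive image generation results without adversarial training.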

ABSTRACT

We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields of gradients of the perturbed data distribution for all noise levels. For sampling, we propose an annealed Langevin dynamics where we use gradients corresponding to gradually decreasing noise levels as the sampling process gets closer to the data manifold. Our framework allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons. Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments.

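Motivation and Objectives

Motivates a new generative modeling approach that avoids adversarial training and the constraints of likelihood-based models.
Addresses the difficulties caused by data lying on low-dimensional manifolds and by low-density regions, by perturbing the data with multiple levels of Gaussian noise.
Trains a Noise Conditional Score Network (NCSN) to jointly estimate the scores at all noise levels.
Uses annealed Langevin dynamics to sample from a sequence of distributions with gradually decreasing noise levels, moving toward the data manifold.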

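Proposed Method

Estimates the score of the perturbed data distribution via score matching, without requiring a normalized likelihood.
Trains a single conditional score network s_theta(x, sigma) to approximate ∇x log q_sigma(x) for a set of Gaussian noise levels {sigma_i}.
Combines denoising score matching across the noise levels into one weighted objective, with λ(sigma_i) = sigma_i^2 balancing the contribution of each level.
Uses annealed Langevin dynamics to lower the noise level step by step from large sigma to small sigma, improving mixing and sample quality.
Handles image data with a U-Net-style network architecture that incorporates dilated convolutions and conditional instance normalization.
Avoids adversarial training and provides a training objective that allows different models to be compared quantitatively.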
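The weighted objective described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical stand-in for s_theta(x, sigma), and the 2-D standard normal toy data and the three noise levels are assumptions chosen so that the exact perturbed score, -x / (1 + sigma^2), is available as a sanity check. For each sigma, the regression target is the score of the Gaussian perturbation kernel, -(x_noisy - x) / sigma^2, and the squared error is scaled by λ(sigma) = sigma^2.

```python
import numpy as np


def ncsn_dsm_loss(score_fn, x, sigmas, rng):
    """Weighted denoising score matching loss, averaged over noise levels.

    score_fn(x_noisy, sigma) is a hypothetical stand-in for s_theta(x, sigma);
    it must return an array with the same shape as x_noisy.
    """
    per_level = []
    for sigma in sigmas:
        noise = rng.standard_normal(x.shape)
        x_noisy = x + sigma * noise
        # Score of the Gaussian perturbation kernel q_sigma(x_noisy | x).
        target = -(x_noisy - x) / sigma**2
        sq_err = np.sum((score_fn(x_noisy, sigma) - target) ** 2, axis=-1)
        # lambda(sigma) = sigma^2 keeps every level's term on a similar scale.
        per_level.append(sigma**2 * 0.5 * np.mean(sq_err))
    return float(np.mean(per_level))


# Toy check (assumption): data ~ N(0, I) in 2-D, so the perturbed
# distribution is N(0, (1 + sigma^2) I) and its score is known exactly.
data_rng = np.random.default_rng(0)
x = data_rng.standard_normal((5000, 2))
sigmas = [1.0, 0.5, 0.1]


def true_score(x_noisy, sigma):
    return -x_noisy / (1.0 + sigma**2)


def zero_score(x_noisy, sigma):
    return np.zeros_like(x_noisy)


# Same seed for both calls so both models see identical noise draws.
loss_true = ncsn_dsm_loss(true_score, x, sigmas, np.random.default_rng(1))
loss_zero = ncsn_dsm_loss(zero_score, x, sigmas, np.random.default_rng(1))
print(f"exact score: {loss_true:.3f}, zero score: {loss_zero:.3f}")
```

The exact score should give the lower loss, which is the property that makes the objective usable for comparing models. In a real setup `score_fn` would be a trained network and the loss would be minimized with a gradient-based optimizer.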

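Experimental Results

Research Questions

RQ1: Can score-based generative modeling learn the data distribution without adversarial training or a likelihood-based objective?
RQ2: Does perturbing the data with multiple levels of Gaussian noise enable consistent score estimation and efficient sampling?
RQ3: Can annealed Langevin sampling effectively generate high-quality samples from score estimates at multiple noise levels?
RQ4: Do Noise Conditional Score Networks (NCSN) produce competitive image samples and useful representations (e.g., for inpainting) on standard datasets?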

Key Findings

Model	Inception Score	FID
Unconditional PixelCNN	4.60	65.93
PixelIQN	5.29	49.46
EBM (Uncond)	6.02	40.58
WGAN-GP	7.86±.07	36.40
MoLM	7.90±.10	18.90
SNGAN	8.22±.05	21.70
ProgressiveGAN	8.80±.05	-
NCSN (Ours)	8.87±.12	25.32
EBM (class-conditional)	8.30	37.90
SNGAN (class-conditional)	8.60±.08	25.50
BigGAN (class-conditional)	9.22	14.73

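Achieves an Inception score of 8.87 for unconditional generation on CIFAR-10 (state of the art among unconditional models at the time).
Achieves an FID of 25.32 on CIFAR-10, competitive with top models (e.g., SNGAN).
Shows high-quality samples on MNIST, CelebA, and CIFAR-10, comparable to those of likelihood-based models and GANs.
Successful image inpainting indicates that the model learns meaningful representations.
Annealed Langevin dynamics with a multi-noise-level score network improves mixing between modes compared with standard Langevin sampling.
Provides a principled objective for model comparison without adversarial training.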
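The mode-mixing finding can be illustrated on a toy problem. The sketch below is not the paper's experiment code. It assumes a 1-D mixture of two unit-variance Gaussians at -5 and +5 with weights 0.2 and 0.8, and it uses the exact score of the noise-perturbed mixture in place of a trained network. The step-size rule alpha = eps * sigma^2 / sigma_L^2, the geometric noise schedule from 20 to 1, and eps = 0.1 are assumed settings for this illustration.

```python
import numpy as np


def mixture_score(x, sigma, means, weights, base_std=1.0):
    """Exact score of a 1-D Gaussian mixture after adding N(0, sigma^2) noise."""
    var = base_std**2 + sigma**2
    # Log of each weighted component density, up to a shared constant.
    log_w = np.log(weights) - 0.5 * (x[:, None] - means) ** 2 / var
    resp = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # Responsibility-weighted average of the per-component scores.
    return np.sum(resp * (means - x[:, None]) / var, axis=1)


def annealed_langevin(score_fn, x0, sigmas, eps, n_steps, rng):
    """Run Langevin dynamics at each noise level, largest sigma first."""
    x = x0.copy()
    for sigma in sigmas:
        # Step size shrinks with the noise level (assumed schedule).
        alpha = eps * sigma**2 / sigmas[-1] ** 2
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


means = np.array([-5.0, 5.0])
weights = np.array([0.2, 0.8])


def score_fn(x, sigma):
    return mixture_score(x, sigma, means, weights)


rng = np.random.default_rng(0)
x0 = rng.uniform(-8.0, 8.0, size=4000)

# Annealed: 10 noise levels from 20 down to 1, 100 steps per level.
annealed = annealed_langevin(score_fn, x0, np.geomspace(20.0, 1.0, 10), 0.1, 100, rng)
# Baseline: the same total number of steps at the smallest noise level only.
standard = annealed_langevin(score_fn, x0, [1.0], 0.1, 1000, rng)

frac_annealed = float(np.mean(annealed > 0))
frac_standard = float(np.mean(standard > 0))
print(f"mass near +5: annealed {frac_annealed:.2f}, standard {frac_standard:.2f}")
```

With a uniform initialization, the single-level sampler tends to leave each particle in whichever mode it started closest to, so roughly half the samples end up near +5. The annealed sampler mixes freely at large sigma, where the two modes overlap, and should end much closer to the true weight of 0.8.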


This review was generated by AI and checked by a human editor.