QUICK REVIEW

[Paper Review] Dual Discriminator Generative Adversarial Nets

Tu Dinh Nguyen, Trung Le|arXiv (Cornell University)|Sep 12, 2017

Generative Adversarial Networks and Image Synthesis|References: 21|Citations: 122

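One-line summary

D2GAN adds a second discriminator to the GAN framework and jointly minimizes the KL and reverse KL divergences, which mitigates mode collapse and lets generation scale to large datasets such as ImageNet.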

ABSTRACT

We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database.

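Motivation and objectives

Address mode collapse in GANs by exploiting the complementary properties of the KL and reverse KL divergences.
Propose a three-player GAN framework (two discriminators plus one generator) that diversifies the generated data.
Provide a theoretical analysis showing that, with optimal discriminators, training the generator minimizes both the KL and reverse KL divergences.
Demonstrate scalability to large datasets, including ImageNet, with competitive sample quality and diversity.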
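To make the "complementary properties" concrete, the following is a minimal sketch on a toy discrete distribution. The three distributions are hypothetical values chosen for illustration and do not come from the paper. It shows the usual asymmetry: KL(data || G) penalizes a generator that drops a mode far more than one that spreads mass, while KL(G || data) does the opposite.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy data: two equally likely modes (bins 0 and 3), little mass between them.
p_data = [0.495, 0.005, 0.005, 0.495]
# A generator that has collapsed onto the first mode.
p_collapsed = [0.985, 0.005, 0.005, 0.005]
# A generator that covers both modes but leaks mass into the low-density bins.
p_spread = [0.30, 0.20, 0.20, 0.30]

for name, p_g in [("collapsed", p_collapsed), ("spread", p_spread)]:
    forward = kl(p_data, p_g)   # KL(data || G): large when G misses a data mode
    reverse = kl(p_g, p_data)   # KL(G || data): large when G puts mass where data has little
    print(f"{name}: KL(data||G)={forward:.3f}  KL(G||data)={reverse:.3f}")
```

Under these toy numbers the collapsed generator is the worse of the two by forward KL and the better of the two by reverse KL, which is why an objective that combines both terms can trade coverage against sample quality.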

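Proposed method

Two discriminators, D1 and D2, are given different objectives and combined with a generator G in a three-player minimax game.
Discriminator outputs are positive real numbers, and the objective includes hyperparameters α and β that balance the effects of the KL and reverse KL terms.
The optimal discriminators for a given G are derived, showing that D1* and D2* depend on p_data and p_G.
At the Nash equilibrium, with optimal discriminators, the generator minimizes both the KL and reverse KL divergences and attains p_G = p_data.
Training is stabilized through the choice of α and β and uses alternating updates similar to those of the standard GAN.
The method is evaluated on synthetic data and on large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet) using standard architectures.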
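The steps above can be sketched as minibatch loss computations. This is a sketch under stated assumptions, not the authors' implementation: the objective is written in the form J = α E[log D1(x)] − E[D1(G(z))] − E[D2(x)] + β E[log D2(G(z))], which is the form consistent with the optimal discriminators quoted in the findings. The softplus mapping to positive scores and the names `d2gan_losses`, `softplus`, and `mean` are illustrative choices.

```python
import math

def softplus(t):
    """Map a real-valued logit to a positive score (assumed choice of positive activation)."""
    # Numerically stable form of log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def mean(values):
    return sum(values) / len(values)

def d2gan_losses(d1_real, d1_fake, d2_real, d2_fake, alpha, beta):
    """Return (J, discriminator loss, generator loss) for one minibatch.

    d1_real, d2_real: positive scores D1(x), D2(x) on real samples.
    d1_fake, d2_fake: positive scores D1(G(z)), D2(G(z)) on generated samples.
    """
    log_d1_real = mean([math.log(s) for s in d1_real])
    log_d2_fake = mean([math.log(s) for s in d2_fake])
    j = alpha * log_d1_real - mean(d1_fake) - mean(d2_real) + beta * log_d2_fake
    disc_loss = -j                                   # both discriminators maximize J
    gen_loss = -mean(d1_fake) + beta * log_d2_fake   # the generator minimizes the G-dependent terms
    return j, disc_loss, gen_loss

# Hypothetical raw discriminator outputs (logits) for a batch of two samples.
d1_real = [softplus(t) for t in (2.0, 1.5)]
d1_fake = [softplus(t) for t in (-1.0, 0.0)]
d2_real = [softplus(t) for t in (-2.0, -1.0)]
d2_fake = [softplus(t) for t in (1.0, 0.5)]
print(d2gan_losses(d1_real, d1_fake, d2_real, d2_fake, alpha=0.2, beta=0.1))
```

In an alternating scheme, one step would lower `disc_loss` with respect to the parameters of D1 and D2, and the next would lower `gen_loss` with respect to the parameters of G.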

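Experimental results

Research questions

RQ1: Can a dual-discriminator GAN prevent mode collapse while scaling to large datasets?
RQ2: How does jointly optimizing the KL and reverse KL divergences affect the diversity and quality of generated samples?
RQ3: Under what theoretical conditions can D2GAN recover the data distribution?
RQ4: How does D2GAN compare with state-of-the-art GAN variants across diverse benchmarks and metrics?
RQ5: Can D2GAN scale to ImageNet and generate diverse, high-quality images?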

Key findings

Model	# modes covered	D_KL(model || data)
GAN [18]	628.0 ± 140.9	2.58 ± 0.75
UnrolledGAN [18]	817.4 ± 37.9	1.43 ± 0.12
DCGAN [5]	849.6 ± 62.7	0.73 ± 0.09
Reg-GAN [5]	955.5 ± 18.7	0.64 ± 0.05
D2GAN	1000.0 ± 0.00	0.08 ± 0.01

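For a fixed G, the optimal discriminators are D1*(x) = α p_data(x) / p_G(x) and D2*(x) = β p_G(x) / p_data(x).
At the Nash equilibrium with optimal discriminators, J(G*, D1*, D2*) = α(log α − 1) + β(log β − 1) and p_G = p_data.
The generator's objective contains an α-weighted KL term and a β-weighted reverse KL term, which allows a trade-off between mode coverage and sample quality.
Empirical results on MNIST, CIFAR-10, STL-10, and ImageNet show improved diversity with competitive quality and demonstrate scalability up to ImageNet.
On the synthetic 2D multimodal dataset, D2GAN covers more modes than GAN and UnrolledGAN and attains smaller symmetric KL divergence and Wasserstein distance.
Table 1 reports mode coverage and KL divergence: D2GAN covers 1000 modes with D_KL(model || data) = 0.08 ± 0.01, outperforming the baselines.
Table 2 reports Inception scores on CIFAR-10: D2GAN achieves 7.15 ± 0.07, ranking competitively among the unsupervised baselines.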
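The theoretical findings above (the optimal discriminators and the equilibrium value of J) can be checked numerically on a small finite sample space. The sketch below assumes the objective J = α E[log D1(x)] − E[D1(G(z))] − E[D2(x)] + β E[log D2(G(z))]. The distributions and the values of α and β are hypothetical, and all probabilities must be strictly positive for the ratios to be defined.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective_at_optimal_discriminators(p_data, p_g, alpha, beta):
    """Evaluate J(G, D1*, D2*) exactly by summing over the sample space."""
    j = 0.0
    for pd, pg in zip(p_data, p_g):
        d1 = alpha * pd / pg   # D1*(x) = alpha * p_data(x) / p_G(x)
        d2 = beta * pg / pd    # D2*(x) = beta * p_G(x) / p_data(x)
        j += alpha * pd * math.log(d1) - pg * d1 - pd * d2 + beta * pg * math.log(d2)
    return j

def closed_form(p_data, p_g, alpha, beta):
    """alpha(log alpha - 1) + beta(log beta - 1) + alpha*KL(data||G) + beta*KL(G||data)."""
    const = alpha * (math.log(alpha) - 1) + beta * (math.log(beta) - 1)
    return const + alpha * kl(p_data, p_g) + beta * kl(p_g, p_data)

# Hypothetical example values.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]
alpha, beta = 0.2, 0.1

print(objective_at_optimal_discriminators(p_data, p_g, alpha, beta))
print(closed_form(p_data, p_g, alpha, beta))
# With p_G = p_data both KL terms vanish and only the constant remains.
print(objective_at_optimal_discriminators(p_data, p_data, alpha, beta))
```

The first two printed values agree up to floating-point error, and the third equals α(log α − 1) + β(log β − 1), the equilibrium value quoted above.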


This review was generated by AI and checked by a human editor.