QUICK REVIEW

[Paper Review] Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

Qi Mao, Hsin-Ying Lee|arXiv (Cornell University)|Mar 13, 2019

Generative Adversarial Networks and Image Synthesis | References: 32 | Citations: 36

One-Line Summary

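This paper introduces a mode seeking regularization term for conditional GANs (cGANs) that encourages the generator to explore minor modes and so increases output diversity, without changing the network architecture or adding training overhead. The method is validated on categorical generation, image-to-image translation, and text-to-image synthesis.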

ABSTRACT

Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which contribute to the output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images with respect to the corresponding latent codes, thus encouraging the generators to explore more minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks including categorical generation, image-to-image translation, and text-to-image synthesis with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method for improving diversity without loss of quality.

Motivation and Objectives

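Address mode collapse in conditional GANs, where the conditional context dominates and the latent noise is underused.
Introduce a regularization term that pushes the generator to map nearby latent vectors to more diverse images.
Show the generality of the method by applying it to several cGAN tasks with different baseline models.
Show that diversity improves across tasks without loss of image quality.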

Proposed Method

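Define the mode seeking loss, which maximizes the ratio of the image distance to the latent code distance: L_ms = max_G ( d_I(G(c,z1), G(c,z2)) / d_z(z1,z2) ).
Add the regularization term to the original objective: L_new = L_ori + lambda_ms * L_ms.
Use the L1 distance for both d_I and d_z, and set lambda_ms = 1 in all experiments.
Apply the regularization to existing architectures without changing the network structure or the training schedule.
Evaluate on three tasks (categorical generation, image-to-image translation, and text-to-image synthesis) with a range of baselines.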
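
The ratio and the combined objective described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names (l1_distance, mode_seeking_ratio, regularized_generator_loss), the toy generators, and the eps constant are all assumptions made for this example. In particular, turning "maximize the ratio" into a quantity to minimize by adding the inverse ratio to the loss is one possible sign convention, assumed here rather than taken from the review.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mode_seeking_ratio(img1, img2, z1, z2, eps=1e-8):
    """d_I(G(c,z1), G(c,z2)) / d_z(z1,z2) with L1 distances.

    eps is an assumption of this sketch; it only guards against division by zero.
    """
    return l1_distance(img1, img2) / (l1_distance(z1, z2) + eps)


def regularized_generator_loss(loss_ori, ratio, lambda_ms=1.0, eps=1e-8):
    """Loss to minimize: L_ori plus lambda_ms times the inverse ratio.

    Assumption: a larger ratio should give a smaller loss, so the inverse is added.
    """
    return loss_ori + lambda_ms / (ratio + eps)


# Hypothetical toy "generators" mapping a condition c and latent code z to a flat image.
def collapsed_generator(c, z):
    # Ignores z entirely: every latent code yields the same output (mode collapse).
    return [c for _ in z]


def diverse_generator(c, z):
    # Output varies with z, so different latent codes give different images.
    return [c + 2.0 * zi for zi in z]


c = 0.5
z1 = [0.1, -0.4, 0.7]
z2 = [0.5, 0.2, -0.3]

ratio_collapsed = mode_seeking_ratio(collapsed_generator(c, z1), collapsed_generator(c, z2), z1, z2)
ratio_diverse = mode_seeking_ratio(diverse_generator(c, z1), diverse_generator(c, z2), z1, z2)

# Same original loss for both, so only the regularizer separates them.
loss_collapsed = regularized_generator_loss(1.0, ratio_collapsed)
loss_diverse = regularized_generator_loss(1.0, ratio_diverse)
```

In this toy setup the generator that ignores z gets a ratio of zero and therefore a very large penalty, while the generator whose output tracks z gets a ratio near 2 and a small penalty. That is the behavior the regularizer is meant to reward.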

Experimental Results

Research Questions

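RQ1: Can mode seeking regularization improve the diversity of cGANs without degrading visual quality?
RQ2: Can the proposed regularization be applied to different conditional generation tasks without model-specific changes?
RQ3: How does the regularization change mode coverage relative to baseline models on standard datasets?
RQ4: In image-to-image translation and text-to-image synthesis, are the diversity gains robust across paired and unpaired data settings?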

Key Findings

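MSGANs improve diversity metrics across tasks while maintaining or improving image quality.
The method is effective for categorical generation, image-to-image translation, and text-to-image synthesis when integrated with the DCGAN, Pix2Pix, DRIT, and StackGAN++ baselines.
Across the experiments, the number of modes covered by the generated distribution increases while FID stays comparable or improves, so realism is preserved.
The method adds minimal overhead and requires no changes to the network architecture, which indicates broad applicability.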

This review was generated by AI and checked by a human editor.