QUICK REVIEW

[Paper Review] Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

Qi Mao, Hsin-Ying Lee|arXiv (Cornell University)|Mar 13, 2019

Generative Adversarial Networks and Image Synthesis | References: 32 | Cited by: 36

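ONE-SENTENCE SUMMARY

This paper proposes a mode seeking regularization term for conditional GANs that encourages the generator to explore minor modes and improves output diversity without changing the network structure or adding training overhead; it is validated on categorical generation, image-to-image translation, and text-to-image synthesis.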

ABSTRACT

Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which contribute to the output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images with respect to the corresponding latent codes, thus encouraging the generators to explore more minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks including categorical generation, image-to-image translation, and text-to-image synthesis with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method for improving diversity without loss of quality.

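MOTIVATION AND GOALS

Address mode collapse in conditional GANs, which arises when the conditional context dominates and the latent noise is underused.
Introduce a regularization term that pushes the generator to map nearby latent vectors to more diverse images.
Demonstrate generality by applying the method to several cGAN tasks with different baseline models.
Show that diversity improves across tasks without sacrificing image quality.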

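PROPOSED METHOD

Define a mode seeking loss that maximizes the ratio of the distance between generated images to the distance between their latent codes, where G is the generator, c the conditional context, and z1, z2 two latent codes: L_ms = max_G ( d_I(G(c, z1), G(c, z2)) / d_z(z1, z2) ).
Add the regularization term to the original objective: L_new = L_ori + lambda_ms * L_ms.
Use the L1 distance for both d_I and d_z and set lambda_ms = 1 in all experiments.
Apply the regularization to existing architectures without modifying the network structure or the training schedule.
Evaluate on three tasks (categorical generation, image-to-image translation, text-to-image synthesis) and compare against several baselines.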
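
The ratio defined above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: `toy_generator`, the `gain` parameter, the placeholder value of `loss_ori`, and the small constant `eps` are all hypothetical. The summary states the term as a maximization over G; taking the reciprocal of the ratio so that it can be added to a loss that is minimized is one possible convention and is assumed here.

```python
import random


def l1_mean(a, b):
    """Mean absolute difference between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mode_seeking_ratio(img1, img2, z1, z2):
    """d_I(G(c, z1), G(c, z2)) / d_z(z1, z2), using L1 for both distances."""
    return l1_mean(img1, img2) / l1_mean(z1, z2)


def mode_seeking_penalty(img1, img2, z1, z2, eps=1e-5):
    """Assumed minimization form: the penalty shrinks as the ratio grows."""
    return 1.0 / (mode_seeking_ratio(img1, img2, z1, z2) + eps)


def toy_generator(c, z, gain):
    """Hypothetical stand-in for G(c, z): condition plus scaled noise."""
    return [c + gain * zi for zi in z]


rng = random.Random(0)
z1 = [rng.gauss(0.0, 1.0) for _ in range(8)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(8)]
c = 0.5  # hypothetical scalar condition

# A generator that nearly ignores z (small gain) yields a small ratio;
# for this linear toy generator the ratio equals the gain.
collapsed = mode_seeking_ratio(
    toy_generator(c, z1, 0.01), toy_generator(c, z2, 0.01), z1, z2
)
diverse = mode_seeking_ratio(
    toy_generator(c, z1, 1.0), toy_generator(c, z2, 1.0), z1, z2
)

penalty_collapsed = mode_seeking_penalty(
    toy_generator(c, z1, 0.01), toy_generator(c, z2, 0.01), z1, z2
)
penalty_diverse = mode_seeking_penalty(
    toy_generator(c, z1, 1.0), toy_generator(c, z2, 1.0), z1, z2
)

# Combined objective with lambda_ms = 1; loss_ori is a placeholder value.
lambda_ms = 1.0
loss_ori = 0.7
loss_new = loss_ori + lambda_ms * penalty_diverse

print(collapsed, diverse, loss_new)
```

In this toy setting the generator that responds to the latent code receives a smaller penalty than the one that nearly ignores it, which is the behavior the regularization is meant to reward. In a real training loop the same ratio would be computed on batches of generated images and back-propagated through the generator.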

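EXPERIMENTAL RESULTS

Research Questions

RQ1: Can mode seeking regularization improve the diversity of cGAN outputs without degrading visual quality?
RQ2: Can the proposed regularization be applied across different conditional generation tasks without model-specific modifications?
RQ3: How does the regularization affect mode coverage on standard datasets relative to the baseline models?
RQ4: Are the diversity gains robust across paired and unpaired data settings in image-to-image translation and text-to-image synthesis?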

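Key Findings

MSGANs improve diversity metrics across tasks while maintaining or improving image quality.
The method is effective with baselines such as DCGAN, Pix2Pix, DRIT, and StackGAN++ on categorical generation, image-to-image translation, and text-to-image synthesis.
Across experiments, the method covers more modes of the generated distribution while keeping FID comparable, indicating that realism is not compromised.
The technique adds very little overhead and requires no architectural changes, which suggests broad applicability.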

This review was generated by AI and reviewed by a human editor.