QUICK REVIEW

[Paper Review] Diversity-Sensitive Conditional Generative Adversarial Networks

Dingdong Yang, Seunghoon Hong|arXiv (Cornell University)|Jan 25, 2019

Generative Adversarial Networks and Image Synthesis | Cited by 126

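One-Sentence Summary

This paper adds a simple regularization term to the generator of a conditional GAN so that its outputs vary with the latent code, addressing mode collapse in image-to-image translation, image inpainting, and video prediction.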

ABSTRACT

We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address such issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.

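Motivation and Goals

Motivate and address mode collapse in conditional GANs, where each input is mapped to a single deterministic output.
Propose a simple regularization that encourages outputs to vary with the latent code.
Show that the regularization improves multi-modal generation across several conditional tasks.
Demonstrate a controllable trade-off between visual quality and diversity through a single hyperparameter.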

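Proposed Method

Define a conditional GAN objective for the generator G and the discriminator D.
Add a generator regularization term Lz that maximizes the distance between the outputs produced from two latent codes, normalized by the distance between those codes, which prevents collapse to a single mode.
Form the full objective: min_G max_D LcGAN(G,D) - lambda Lz(G).
Optionally extend Lz by measuring distance in the discriminator's feature space or with other metrics.
Apply the regularization to a range of baselines and tasks to show its generality.
Show how lambda controls the trade-off between diversity and realism.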
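
The regularizer described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses plain Python lists in place of image tensors, a mean absolute difference as the distance, and two toy generators (`collapsed_generator`, `diverse_generator`) that are hypothetical. The optional cap `tau` and the small constant `eps` are assumptions added here for numerical stability; the summary above does not specify them.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def diversity_regularizer(generator, x, z1, z2, tau=None, eps=1e-8):
    """Lz for one condition x: output distance divided by latent distance.

    tau (optional cap) and eps are assumptions of this sketch.
    """
    out_dist = l1_distance(generator(x, z1), generator(x, z2))
    z_dist = l1_distance(z1, z2)
    ratio = out_dist / (z_dist + eps)
    return ratio if tau is None else min(ratio, tau)


def generator_loss(cgan_loss, lz, lam):
    """Generator side of min_G max_D LcGAN(G, D) - lambda * Lz(G)."""
    return cgan_loss - lam * lz


# Hypothetical toy generators, for illustration only.
def collapsed_generator(x, z):
    return list(x)  # ignores the latent code entirely


def diverse_generator(x, z):
    return [xi + zi for xi, zi in zip(x, z)]  # output shifts with z


x = [0.5, -0.2, 0.1]
z1 = [0.0, 1.0, -1.0]
z2 = [1.0, 0.0, 1.0]

lz_collapsed = diversity_regularizer(collapsed_generator, x, z1, z2)
lz_diverse = diversity_regularizer(diverse_generator, x, z1, z2)
```

A generator that ignores the latent code gets Lz = 0 and therefore no reduction in its loss, while one whose output moves with the code is rewarded in proportion to lambda. Raising `lam` strengthens that reward, which is the knob the summary refers to when it describes the diversity versus realism trade-off.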

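Experimental Results

Research Questions

RQ1: Can a simple generator-side regularization induce genuinely multi-modal cGAN outputs without changing the architecture?
RQ2: How does the diversity-promoting term Lz interact with existing reconstruction losses to balance realism and diversity?
RQ3: Does the method generalize across tasks (image-to-image translation, inpainting, video prediction) and architectures?
RQ4: How does the dimensionality of the latent code affect diversity and output quality?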

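Key Findings

The regularization yields stochastic, diverse outputs where the baselines are deterministic.
Increasing lambda raises LPIPS diversity and lowers FID (lower is better) up to a certain point, revealing a quality-diversity trade-off.
DSGAN outperforms task-specific multi-modal methods on several metrics while preserving realism.
The method is compatible with high-resolution synthesis and with other loss terms, such as pixel-based or feature-based reconstruction losses.
Using a perceptual or feature-space distance in Lz produces semantically meaningful variations in inpainting results.
Compared with the baseline cGAN, the method produces more diverse and more realistic video predictions, and it is competitive with SAVP while using fewer parameters.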
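
Diversity scores of the kind mentioned above are commonly reported as the average distance between pairs of outputs generated for the same input. The sketch below illustrates that averaging step only. It substitutes a mean absolute difference for LPIPS, which is a learned perceptual metric, and all function names and sample values are hypothetical.

```python
from itertools import combinations


def mean_abs_distance(a, b):
    """Stand-in distance; a real evaluation would use a perceptual metric."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pairwise_diversity(samples, distance=mean_abs_distance):
    """Average distance over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Made-up outputs for one input: a collapsed model versus a varied one.
collapsed = [[0.2, 0.4], [0.2, 0.4], [0.2, 0.4]]
varied = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

collapsed_score = pairwise_diversity(collapsed)
varied_score = pairwise_diversity(varied)
```

A mode-collapsed model scores zero under any such pairwise measure, because every pair of its outputs is identical. A score of this kind says nothing about realism, which is why the findings pair it with FID.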

This review was generated by AI and reviewed by human editors.