QUICK REVIEW

[Paper Review] Diversity-Sensitive Conditional Generative Adversarial Networks

Dingdong Yang, Seunghoon Hong | arXiv (Cornell University) | Jan 25, 2019

Generative Adversarial Networks and Image Synthesis | 126 citations

TL;DR

The paper introduces a simple generator regularization for conditional GANs that makes outputs vary with the latent code, addressing mode collapse in image-to-image translation, inpainting, and video prediction.

ABSTRACT

We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address such issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.

Motivation & Objective

Address mode collapse in conditional GANs, where each input is mapped to a single deterministic output regardless of the latent code.
Propose a simple regularization that encourages diverse outputs dependent on latent codes.
Show that the regularization improves multi-modal generation across multiple conditional tasks.
Demonstrate a controllable trade-off between visual quality and diversity via the regularization weight.

Proposed method

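Start from a standard conditional GAN objective LcGAN(G, D) for generator G and discriminator D.
Add a generator regularization term Lz: for a fixed input, sample two latent codes and divide the distance between the two outputs by the distance between the codes. Maximizing this ratio penalizes a generator that ignores the latent code and collapses to a single mode.
Form the full objective: min_G max_D LcGAN(G, D) - lambda * Lz(G), where lambda >= 0 weights the diversity term.
Optionally measure the output distance in Lz in a feature space (e.g., discriminator features or other metrics) instead of pixel space.
Apply the regularization to existing baselines across tasks to demonstrate generality.
Show that lambda controls the trade-off between diversity and realism.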
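The regularizer in the method steps can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' code: `generator(x, z)` is a hypothetical callable returning a batch of outputs, distances are per-sample mean absolute differences (an L1 norm up to a constant factor), `tau` is an optional cap on the ratio (assumed here for numerical stability), and `feature_fn` is a hypothetical hook for the feature-space variant. In real training these would be differentiable tensor operations.

```python
import numpy as np


def diversity_term(generator, x, z1, z2, tau=None, feature_fn=None, eps=1e-8):
    """Estimate Lz for one batch: output distance divided by latent distance.

    Assumptions (not taken from the paper's code): per-sample mean absolute
    difference as the distance, optional cap `tau`, optional `feature_fn`.
    """
    y1 = generator(x, z1)  # two outputs for the same conditioning input x
    y2 = generator(x, z2)
    if feature_fn is not None:
        # Hypothetical hook: compare outputs in a feature space instead of pixels.
        y1, y2 = feature_fn(y1), feature_fn(y2)
    out_axes = tuple(range(1, y1.ndim))  # every axis except the batch axis
    z_axes = tuple(range(1, z1.ndim))
    out_dist = np.abs(y1 - y2).mean(axis=out_axes)
    z_dist = np.abs(z1 - z2).mean(axis=z_axes)
    ratio = out_dist / (z_dist + eps)  # eps guards against z1 == z2
    if tau is not None:
        ratio = np.minimum(ratio, tau)  # cap so the term cannot grow without bound
    return float(ratio.mean())


def generator_objective(adv_loss, lz, lam):
    """Generator side of LcGAN - lambda * Lz: a larger Lz lowers the loss."""
    return adv_loss - lam * lz


# Toy check: a collapsed generator ignores z, a diverse one shifts with z.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
z1, z2 = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))


def collapsed(x, z):
    return np.tanh(x)  # output independent of z: mode collapse


def diverse(x, z):
    return np.tanh(x) + np.repeat(z, 4, axis=1)  # (4, 2) -> (4, 8)


print(diversity_term(collapsed, x, z1, z2))  # 0.0
print(diversity_term(diverse, x, z1, z2))  # close to 1.0
```

Because the term is subtracted, `lam = 0` recovers the plain cGAN objective, and larger values push the generator harder toward latent-dependent outputs, which is the knob behind the diversity/realism trade-off.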

Experimental results

Research questions

RQ1: Can a simple generator-side regularization induce true multi-modality in cGAN outputs without architectural changes?
RQ2: How does the diversity-promoting term Lz interact with existing reconstruction losses to balance realism and diversity?
RQ3: Does the approach generalize across tasks (image-to-image translation, inpainting, video prediction) and architectures?
RQ4: What is the effect of latent code dimensionality on diversity and output quality?

Key findings

The regularization yields stochastic, diverse outputs in settings where the baselines are effectively deterministic.
Increasing lambda raises LPIPS diversity and lowers (improves) FID up to a point, beyond which quality degrades, revealing a quality-diversity trade-off.
DSGAN (the proposed diversity-sensitive cGAN) outperforms task-specific multi-modal approaches on several metrics while preserving realism.
The method is compatible with high-resolution synthesis and with other loss terms (e.g., pixel- or feature-based reconstruction losses).
Using perceptual/feature-based distances in Lz yields semantically meaningful variations in inpainting results.
The approach yields more diverse and realistic video predictions than the baseline cGAN and is competitive with SAVP (Stochastic Adversarial Video Prediction) while using fewer parameters.
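LPIPS diversity is typically measured as the average perceptual distance between outputs generated from different latent codes for the same input. The sketch below shows only that averaging step, assuming all pairs of samples are compared (the exact pair-sampling protocol is not given in this review) and using mean absolute pixel difference as a hypothetical stand-in for LPIPS, which in practice requires a pretrained network.

```python
import itertools

import numpy as np


def pixel_l1(a, b):
    """Stand-in distance (NOT perceptual): mean absolute difference."""
    return float(np.abs(a - b).mean())


def pairwise_diversity(samples, dist=pixel_l1):
    """Average `dist` over all unordered pairs of samples for one input."""
    pairs = list(itertools.combinations(range(len(samples)), 2))
    if not pairs:
        raise ValueError("need at least two samples to measure diversity")
    return sum(dist(samples[i], samples[j]) for i, j in pairs) / len(pairs)


# Identical samples (mode collapse) score 0; spread-out samples score higher.
collapsed = [np.zeros((2, 2))] * 3
spread = [np.zeros((2, 2)), np.ones((2, 2)), 2 * np.ones((2, 2))]
print(pairwise_diversity(collapsed))  # 0.0
print(pairwise_diversity(spread))  # 1.3333333333333333
```

Swapping `dist` for a real LPIPS model, or for a feature-space distance like the one used for inpainting, changes only the distance function, not the averaging.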

This review was created by AI and reviewed by human editors.