QUICK REVIEW

[Paper Review] Wasserstein GAN

Martín Arjovsky, Soumith Chintala, Léon Bottou | arXiv (Cornell University) | Jan 26, 2017

606 citations

TL;DR

The paper introduces Wasserstein GAN (WGAN), which trains generative models with the Earth Mover (EM) distance as its loss. It justifies this choice theoretically and shows empirically that it improves training stability and reduces mode collapse.

ABSTRACT

We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.

Motivation & Objective

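Motivate learning a distribution when a density may not exist: if the data lie on a low-dimensional manifold, the model and data supports may barely overlap, and KL-based (maximum likelihood) objectives become undefined or infinite.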
Propose a practical objective for GANs based on the Earth Mover distance (Wasserstein distance) that yields continuous gradients.
Demonstrate theoretical properties of WGAN and show empirical benefits over standard GANs in terms of stability and mode coverage.

Proposed method

Define and compare distance measures between distributions (TV, KL, JS, EM) and argue EM is more suitable for distributions on low-dimensional manifolds.
Use Kantorovich-Rubinstein duality to express the EM distance as a supremum over 1-Lipschitz functions.
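Approximate the EM distance with a parameterized critic network whose weights are clipped to a small box (the paper's default is [-0.01, 0.01]). Clipping keeps the critic K-Lipschitz for some constant K, so the estimate matches the EM distance up to a constant factor.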
Train the critic to near optimality and update the generator using gradients through the critic to minimize the EM distance.
Provide an algorithm (WGAN) that alternates between multiple critic updates and generator updates with practical hyperparameters.
Discuss limitations of weight clipping and suggest areas for improved Lipschitz enforcement.
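
The alternating scheme described above can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the critic is a hypothetical linear function f_w(x) = w * x, the generator is g_theta(z) = z + theta, and plain gradient steps replace the RMSProp optimizer used in the paper. The function name and the toy values for the learning rates and clipping bound are chosen for this example; only n_critic = 5 and the batch size of 64 mirror the paper's defaults.

```python
import random


def clip(value, bound):
    """Clamp value to [-bound, bound]; this is the weight-clipping step."""
    return max(-bound, min(bound, value))


def train_toy_wgan(real_mean=3.0, steps=1500, n_critic=5, clip_value=0.1,
                   critic_lr=0.05, gen_lr=0.05, batch_size=64, seed=0):
    """Toy WGAN: real data ~ N(real_mean, 1), fake data = z + theta, z ~ N(0, 1).

    Returns the learned theta and the critic's EM estimate at each generator step.
    """
    rng = random.Random(seed)
    w = 0.0      # critic parameter, f_w(x) = w * x (assumed toy critic)
    theta = 0.0  # generator parameter, g_theta(z) = z + theta (assumed toy generator)
    history = []
    for _ in range(steps):
        gap = 0.0
        for _ in range(n_critic):
            real = [rng.gauss(real_mean, 1.0) for _ in range(batch_size)]
            fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch_size)]
            # Critic objective: mean f_w(real) - mean f_w(fake) = w * gap,
            # so its gradient with respect to w is the gap between batch means.
            gap = sum(real) / batch_size - sum(fake) / batch_size
            # Gradient ascent on the critic, then clip to keep it Lipschitz.
            w = clip(w + critic_lr * gap, clip_value)
        # Critic's estimate of the (scaled) EM distance on the last batch.
        history.append(w * gap)
        # Generator minimizes -mean f_w(g_theta(z)); d/dtheta of that is -w,
        # so a descent step moves theta by +gen_lr * w.
        theta += gen_lr * w
    return theta, history


if __name__ == "__main__":
    theta, history = train_toy_wgan()
    print("learned theta:", round(theta, 3))
    print("first / last critic estimate:", round(history[0], 3), round(history[-1], 3))
```

In this toy setting the critic estimate starts large and shrinks as theta approaches the real mean, which loosely mirrors the paper's claim that the critic loss tracks generator progress. Because the critic is linear, it only compares batch means; a real critic would be a neural network.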

Experimental results

Research questions

RQ1: Does optimizing the Earth Mover distance provide continuous, informative gradients for training generative models?
RQ2: How does WGAN compare to standard GANs in terms of training stability, mode coverage, and correlation between loss and sample quality?
RQ3: What practical considerations (e.g., Lipschitz constraint enforcement) affect the performance and stability of WGANs?

Key findings

WGAN provides a meaningful loss metric that correlates with generator convergence and sample quality.
WGAN training is more stable and less prone to mode collapse than traditional GANs.
Training the critic to optimality yields reliable gradients for the generator, unlike standard GAN discriminators, which can saturate and yield vanishing gradients.
Empirically, WGANs show improved robustness across different generator architectures compared to standard GANs.
Under mild regularity assumptions on the generator, the EM distance is continuous everywhere and differentiable almost everywhere in the generator parameters, which supports gradient-based optimization in neural networks.
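
The continuity finding can be illustrated with a one-dimensional simplification of the paper's parallel-lines example: a point mass at 0 compared with a point mass at theta. The sketch below is illustrative only, and the helper names `wasserstein_1d` and `js_divergence` are hypothetical. It computes the EM distance from the gap between cumulative distribution functions (a formula that holds in one dimension) and the Jensen-Shannon divergence from its definition.

```python
import math


def wasserstein_1d(p, q):
    """EM (Wasserstein-1) distance between discrete 1D distributions.

    p and q map support points to probabilities. In one dimension the
    distance equals the integral of |CDF_p(x) - CDF_q(x)| over x.
    """
    points = sorted(set(p) | set(q))
    cdf_gap = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions."""
    points = set(p) | set(q)

    def kl_to_mixture(a, b):
        # KL(a || m) with m = (a + b) / 2; terms with a(x) = 0 contribute 0.
        total = 0.0
        for x in points:
            ax = a.get(x, 0.0)
            if ax > 0.0:
                mx = 0.5 * (ax + b.get(x, 0.0))
                total += ax * math.log(ax / mx)
        return total

    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)


if __name__ == "__main__":
    for theta in (1.0, 0.1, 0.001, 0.0):
        p, q = {0.0: 1.0}, {theta: 1.0}
        print(theta, wasserstein_1d(p, q), js_divergence(p, q))
```

As theta shrinks, the EM distance shrinks with it (it equals |theta|), while the JS divergence stays at log 2 for every nonzero theta and jumps to 0 only when the supports coincide. A loss that stays flat for all nonzero theta gives the generator no gradient signal, which is the motivation for using EM.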

This review was created by AI and reviewed by human editors.