QUICK REVIEW

[Paper Review] GAN and VAE from an Optimal Transport Point of View

Aude Genevay, Gabriel Peyré|arXiv (Cornell University)|Jun 6, 2017

Nuclear reactor physics and engineering|5 references|38 citations

TL;DR

This paper unifies Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) through the lens of optimal transport, framing both as solutions to a Minimum Kantorovitch Estimation (MKE) problem. It shows that WGAN and WVAE emerge as dual formulations of the same underlying optimal transport objective, with WGAN emphasizing adversarial training via dual potentials and WVAE emphasizing autoencoding with a relaxed marginal constraint, explaining their differing training stability and generation quality.

ABSTRACT

This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup. This sheds some lights on the connexions between Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) and Minimum Kantorovitch Estimators (MKE).

Motivation & Objective

To unify the theoretical understanding of GANs and VAEs by interpreting both as instances of Minimum Kantorovitch Estimation (MKE) under optimal transport.
To clarify the duality between adversarial training (WGAN) and autoencoding (WVAE) in terms of primal and dual formulations of the same optimal transport problem.
To explain the observed differences in training stability and generation quality—e.g., GANs producing sharper images—through the lens of gradient computation in primal vs. dual formulations.
To analyze the role of relaxation in the marginal constraint for VAEs (via parameterized encoder maps) and its impact on convergence and bias in the WVAE formulation.
To investigate the theoretical convergence of WGAN and WVAE to the true MKE solution in the non-parametric limit, where model capacity and regularization are balanced.

Proposed method

Formulates GANs and VAEs as solutions to a Minimum Kantorovitch Estimator (MKE) problem, minimizing the Wasserstein distance between a generated distribution and empirical data.
Derives the dual formulation of the MKE problem using Kantorovitch potentials, which lets a deep neural network play the role of the discriminator in the WGAN framework.
Recovers the Wasserstein GAN (WGAN) by parameterizing the dual potential $ h_{\theta} $ with a deep neural network, leading to a minimax optimization problem over generator and discriminator parameters.
Recovers the Wasserstein VAE (WVAE) by relaxing the marginal constraint on the coupling measure, using a parametric encoder $ f_{\theta} $ to define a transport map from data to latent space.
Uses the $ c $-transform to simplify the dual problem, so that the dual potential can be optimized with stochastic gradient descent in the WGAN setting.
Casts the WVAE as a relaxed, unbalanced optimal transport problem by penalizing a divergence term $ D(f_{\theta\sharp} \nu \,\vert\, \rho) $ between the encoded data and the latent distribution, which keeps training flexible and differentiable while regularizing the latent space.
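
For orientation, the three objectives behind these bullets can be written side by side. This is a sketch in standard optimal transport notation, not a transcription of the paper's equations: $ \nu $ denotes the data distribution, $ \rho $ the latent distribution, $ g_{\theta} $ the generator, $ c $ the ground cost, and $ \sharp $ the push-forward. The potential $ h $ and the encoder $ f $ are left unparameterized here, and the penalty weight $ \lambda $ is an assumption introduced only for illustration.

```latex
% MKE objective in primal form: \pi ranges over couplings
% with marginals g_{\theta\sharp}\rho and \nu
E(\theta) = W_c(g_{\theta\sharp}\rho, \nu)
          = \min_{\pi \in \Pi(g_{\theta\sharp}\rho,\, \nu)} \int c(x, y)\, d\pi(x, y)

% Dual form, using the c-transform h^c(x) = \min_y \, c(x, y) - h(y)
W_c(g_{\theta\sharp}\rho, \nu)
          = \max_{h} \int h^c(g_\theta(z))\, d\rho(z) + \int h(y)\, d\nu(y)

% Relaxed encoder form: the constraint f_\sharp\nu = \rho is replaced
% by a divergence penalty (\lambda > 0 is an assumed weight)
\min_{\theta,\, f} \int c(g_\theta(f(y)), y)\, d\nu(y) + \lambda\, D(f_\sharp\nu \mid \rho)
```

Restricting $ h $ to a neural network family gives the adversarial (WGAN-style) problem, while restricting $ f $ to a neural network family gives the autoencoding (WVAE-style) problem.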

Experimental results

Research questions

RQ1: How can GANs and VAEs be unified under a common theoretical framework based on optimal transport?
RQ2: What is the relationship between the primal (MKE) and dual (WGAN) formulations of optimal transport in the context of deep generative modeling?
RQ3: Why is VAE training more stable than GAN training, and how does this relate to gradient computation in primal vs. dual formulations?
RQ4: What is the impact of relaxing the marginal constraint in WVAE, and how does it affect the bias and convergence of the estimator?
RQ5: Do WGAN and WVAE converge to the same solution in the non-parametric limit, and what are the practical implications of this convergence?

Key findings

The WGAN and WVAE formulations are dual to each other: WGAN optimizes the dual problem using adversarial potentials, while WVAE optimizes the primal problem with a relaxed marginal constraint.
The primal gradient formula (5) is more stable than the dual gradient formula (3), which requires accurate optimization of the dual potential $ h^\flat $, explaining the empirical stability of VAEs over GANs.
In the limit $ \theta \to \theta_{\text{MKE}} $, the WGAN solution satisfies $ E(\theta_{\text{WGAN}}) \triangleq W_c(g_{\theta\sharp} \rho, \nu) \triangleq \text{min} $, while WVAE introduces a bias due to relaxation, leading to $ E(\theta_{\text{WVAE}}) \triangleq \text{min} $ with $ \theta_{\text{WVAE}} $ being a biased estimator.
In the non-parametric limit (infinite capacity of $ h_\theta $ and $ f_\theta $, and $ \theta \to \theta_{\text{MKE}} $), both WGAN and WVAE converge to the same solution as the true MKE, suggesting theoretical equivalence in the limit.
Despite theoretical convergence, the convergence rate in practice may be slow, and the non-parametric limit may not yield good estimators for complex datasets, implying that implicit regularization via non-convex optimization is beneficial.
The WGAN objective is strictly better than the MKE objective in terms of minimizing the Wasserstein distance, while the WVAE objective is strictly worse due to the relaxation term, resulting in $ E(\theta_{\text{WGAN}}) \triangleq \text{min} \triangleq E(\theta_{\text{MKE}}) \triangleq E(\theta_{\text{WVAE}}) $.
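
The primal and dual views in these findings can be illustrated on a toy discrete problem. The sketch below is not taken from the paper. It assumes uniform 1-D samples of equal size and a squared-distance cost, a setting in which matching sorted points gives the exact transport cost. An arbitrary potential stands in for an imperfectly trained discriminator, and an arbitrary permutation stands in for an imperfect encoder that still satisfies the marginal constraint exactly (no relaxation). All function names are hypothetical.

```python
import numpy as np

def sq_cost(x, y):
    """Pairwise squared-distance cost matrix c(x_i, y_j) for 1-D points."""
    return (x[:, None] - y[None, :]) ** 2

def exact_ot_1d(x, y):
    """Exact transport cost for uniform 1-D samples of equal size and
    squared cost: the optimal coupling matches sorted points."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def dual_value(h, x, y):
    """Dual objective mean(h^c(x_i)) + mean(h_j) for a potential h on the y_j."""
    C = sq_cost(x, y)
    # c-transform: h^c(x_i) = min_j c(x_i, y_j) - h_j
    h_c = np.min(C - h[None, :], axis=1)
    return h_c.mean() + h.mean()

def map_value(perm, x, y):
    """Primal cost of the coupling that pairs y_j with x_{perm[j]}."""
    return np.mean((x[perm] - y) ** 2)

rng = np.random.default_rng(0)
n = 50
x = rng.normal(0.0, 1.0, n)  # stand-in for generated samples g_theta(z_i)
y = rng.normal(1.0, 2.0, n)  # stand-in for data samples

w = exact_ot_1d(x, y)                            # exact value
lower = dual_value(rng.normal(size=n), x, y)     # any potential: lower bound
upper = map_value(rng.permutation(n), x, y)      # any permutation: upper bound

print("dual (arbitrary potential):  ", lower)
print("exact transport cost:        ", w)
print("primal (arbitrary matching): ", upper)
```

Any potential gives a value at or below the exact cost, and any marginal-preserving matching gives a value at or above it, so the two restricted problems bracket the exact one from opposite sides. With a relaxed marginal constraint, as in the WVAE formulation, the upper-bound property is no longer guaranteed by this argument.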

This review was created by AI and reviewed by human editors.