QUICK REVIEW

[Paper Review] Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step

William Fedus, Mihaela Rosca | arXiv (Cornell University) | Oct 23, 2017

Global Energy and Sustainability Research | References: 19 | Cited by: 100

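ONE-SENTENCE SUMMARY

This paper argues that GAN training does not need to decrease a divergence monotonically at every step: GANs can converge toward a Nash equilibrium along trajectories on which the underlying divergence is not reduced at each step, and gradient penalties motivated by the divergence-minimization view still help when applied to training that is not divergence-based.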

ABSTRACT

Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each player cannot reduce their cost without changing the other players' parameters. One useful approach for the theory of GANs is to show that a divergence between the training distribution and the model distribution obtains its minimum value at equilibrium. Several recent research directions have been motivated by the idea that this divergence is the primary guide for the learning process and that every step of learning should decrease the divergence. We show that this view is overly restrictive. During GAN training, the discriminator provides learning signal in situations where the gradients of the divergences between distributions would not be useful. We provide empirical counterexamples to the view of GAN training as divergence minimization. Specifically, we demonstrate that GANs are able to learn distributions in situations where the divergence minimization point of view predicts they would fail. We also show that gradient penalties motivated from the divergence minimization perspective are equally helpful when applied in other contexts in which the divergence minimization perspective does not predict they would be helpful. This contributes to a growing body of evidence that GAN training may be more usefully viewed as approaching Nash equilibria via trajectories that do not necessarily minimize a specific divergence at each step.

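MOTIVATION AND OBJECTIVES

Clarify the terminology around standard, minimax, and non-saturating GANs and the divergences associated with them.
Test empirically whether improvements in GAN methods stem from divergence minimization or from other learning dynamics.
Evaluate the non-saturating GAN on synthetic tasks where the JS divergence provides no useful gradient.
Evaluate the effect of gradient penalties on the non-saturating GAN on both synthetic and real datasets.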

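PROPOSED METHOD

Define and compare the non-saturating GAN (NS-GAN) and the minimax GAN (M-GAN), and relate them to the Jensen-Shannon (JS) divergence.
Introduce gradient penalties (GAN-GP and DRAGAN-NS) and apply them to NS-GAN.
Test the learning dynamics in synthetic experiments where the data lie on a low-dimensional manifold.
Evaluate NS-GAN and its gradient-penalty variants on Color MNIST, CelebA, and CIFAR-10 using several metrics.
Analyze hyperparameter robustness and training dynamics while varying the number of discriminator updates.
Provide qualitative and quantitative comparisons with WGAN-GP to understand the respective roles of gradients and divergences.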
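
To make the M-GAN versus NS-GAN distinction above concrete, here is a minimal NumPy sketch of the two generator losses, written in terms of the discriminator logit a, where D(G(z)) = sigmoid(a). The function names are illustrative and do not come from the paper; the loss forms are the standard minimax loss log(1 - D(G(z))) and the standard non-saturating loss -log D(G(z)). The gradient functions show why the minimax loss gives almost no signal when the discriminator confidently rejects generated samples, while the non-saturating loss still does.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def minimax_generator_loss(logits):
    # M-GAN generator loss: mean of log(1 - D(G(z))).
    # With D = sigmoid(a), log(1 - sigmoid(a)) = -log(1 + exp(a)).
    return np.mean(-np.logaddexp(0.0, logits))

def non_saturating_generator_loss(logits):
    # NS-GAN generator loss: mean of -log D(G(z)).
    # With D = sigmoid(a), -log(sigmoid(a)) = log(1 + exp(-a)).
    return np.mean(np.logaddexp(0.0, -logits))

def minimax_generator_grad(logits):
    # Per-sample derivative of log(1 - sigmoid(a)) with respect to a.
    return -sigmoid(logits)

def non_saturating_generator_grad(logits):
    # Per-sample derivative of -log(sigmoid(a)) with respect to a.
    return sigmoid(logits) - 1.0

# Logits for generated samples: confidently rejected, undecided, confidently accepted.
logits = np.array([-10.0, 0.0, 10.0])
print(minimax_generator_grad(logits))
print(non_saturating_generator_grad(logits))
```

At a = -10 the minimax gradient is close to zero (it saturates), whereas the non-saturating gradient is close to -1, so the generator still receives a learning signal.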

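EXPERIMENTAL RESULTS

Research Questions

RQ1: Can NS-GAN succeed on tasks where the JS-divergence-minimization view predicts failure?
RQ2: Do gradient penalties improve NS-GAN training by improving the optimization dynamics rather than by changing which divergence is minimized?
RQ3: How do NS-GAN and NS-GAN with gradient penalties perform on real image datasets across different hyperparameters?
RQ4: Are the results consistent between the synthetic low-dimensional-manifold tasks and the real-world datasets?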

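Key Findings

NS-GAN can still converge to the true data manifold even where JS divergence minimization would fail.
Gradient penalties (GAN-GP and DRAGAN-NS) stabilize NS-GAN training and improve convergence and robustness.
Compared with the original NS-GAN, NS-GAN with gradient penalties achieves better sample quality and diversity on several real datasets.
WGAN-GP can fail to learn under some hyperparameter settings (for example, on CelebA), whereas the NS-GAN variants with gradient penalties are more robust.
Separating the improvements attributable to the divergence from those attributable to training dynamics indicates that gradient penalties improve optimization regardless of the underlying divergence perspective.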
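
The gradient penalties mentioned in the findings above can be illustrated with a small NumPy sketch. It assumes the commonly used penalty form, the mean of (||grad_x f(x)||_2 - 1)^2, evaluated either at points interpolated between real and generated samples (WGAN-GP style) or at noisy copies of real samples (DRAGAN style). The toy discriminator f(x) = tanh(x.w + b), the function names, and the noise scale are all hypothetical choices made for illustration; the paper's exact sampling schemes and coefficients may differ.

```python
import numpy as np

def input_gradient(x, w, b):
    # Toy discriminator f(x) = tanh(x @ w + b). Its gradient with respect to
    # each input row is (1 - tanh^2(x @ w + b)) * w.
    pre = x @ w + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * w[None, :]

def gradient_penalty(points, w, b, target=1.0):
    # Mean of (||grad_x f(x)||_2 - target)^2 over the evaluation points.
    norms = np.linalg.norm(input_gradient(points, w, b), axis=1)
    return np.mean((norms - target) ** 2)

def interpolated_points(real, fake, rng):
    # WGAN-GP-style evaluation points: random positions on the segments
    # between paired real and generated samples.
    eps = rng.uniform(size=(real.shape[0], 1))
    return eps * real + (1.0 - eps) * fake

def perturbed_points(real, rng, scale=0.1):
    # DRAGAN-style evaluation points: real samples plus small Gaussian noise.
    # The noise scale here is an assumed value.
    return real + scale * rng.normal(size=real.shape)

rng = np.random.default_rng(0)
real = rng.normal(size=(8, 2))
fake = rng.normal(size=(8, 2)) + 3.0
w, b = np.array([0.5, -0.25]), 0.1

print(gradient_penalty(interpolated_points(real, fake, rng), w, b))
print(gradient_penalty(perturbed_points(real, rng), w, b))
```

In training, a term like this would be scaled by a coefficient and added to the discriminator loss; nothing in it depends on which divergence the generator loss is associated with, which is consistent with applying it to NS-GAN.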

This review was generated by AI and checked by a human editor.