QUICK REVIEW

[Paper Review] Demystifying MMD GANs

Mikołaj Bińkowski, Danica J. Sutherland | arXiv (Cornell University) | Jan 4, 2018

Model Reduction and Neural Networks | References: 61 | Cited by: 99

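One-sentence summary

This paper analyzes MMD GANs, showing that the generator gradient estimators are unbiased when the critic representation is fixed but become biased once the critic is learned from samples, and it demonstrates practical advantages over WGAN-GP.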

ABSTRACT

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.

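Motivation and goals

Clarify the question of gradient bias in MMD GANs and compare with Wasserstein GANs.
Study the choice of kernel and its effect on the MMD critic.
Relate the energy distance and the Cramér GAN to the MMD, including considerations about gradients.
Propose practical evaluation metrics, such as the Kernel Inception Distance (KID).
Demonstrate the training advantages of MMD GANs, which use a smaller critic and train faster.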

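Proposed method

Formulate the MMD as an integral probability metric (IPM) whose witness class is the unit ball of an RKHS.
Regularize the MMD critic with a gradient penalty, similar to WGAN-GP.
Relate the energy distance and the Cramér GAN to the MMD through a kernel construction.
Develop the Kernel Inception Distance (KID) as a GAN convergence measure with an unbiased estimator.
Empirically compare MMD GANs with WGAN-GP and Cramér GAN on standard datasets.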
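
As a rough illustration of the kernel construction that links the energy distance to the MMD, the sketch below computes the unbiased (U-statistic) estimate of the squared MMD with the distance-induced kernel k(x, y) = 0.5 * [rho(x, z0) + rho(y, z0) - rho(x, y)] and compares it with half of a U-statistic energy distance. The Euclidean choice of rho, the base point z0 = 0, the function names, and the toy Gaussian samples are all assumptions made for illustration; this is not the paper's training code.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (m, d) and rows of b (n, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_kernel(a, b, z0):
    """Distance-induced kernel with base point z0 (assumed: rho is Euclidean)."""
    da = np.linalg.norm(a - z0, axis=1)[:, None]  # rho(a_i, z0)
    db = np.linalg.norm(b - z0, axis=1)[None, :]  # rho(b_j, z0)
    return 0.5 * (da + db - pairwise_dist(a, b))


def off_diag_mean(mat):
    """Mean of the off-diagonal entries of a square matrix."""
    n = mat.shape[0]
    return (mat.sum() - np.trace(mat)) / (n * (n - 1))


def mmd2_unbiased(x, y, kernel):
    """U-statistic estimate of the squared MMD between samples x and y."""
    return (off_diag_mean(kernel(x, x))
            + off_diag_mean(kernel(y, y))
            - 2.0 * kernel(x, y).mean())


def energy_distance_u(x, y):
    """U-statistic estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2.0 * pairwise_dist(x, y).mean()
            - off_diag_mean(pairwise_dist(x, x))
            - off_diag_mean(pairwise_dist(y, y)))


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 5))  # toy "real" samples
y = rng.normal(0.5, 1.0, size=(150, 5))  # toy "generated" samples
z0 = np.zeros(5)                         # hypothetical base point

mmd2 = mmd2_unbiased(x, y, lambda a, b: distance_kernel(a, b, z0))
half_energy = 0.5 * energy_distance_u(x, y)
print(mmd2, half_energy)  # the two values agree up to floating-point error
```

The terms involving z0 cancel in the estimate, so the choice of base point does not affect the result in this sketch.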

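Experimental results

Research questions

RQ1: Do the MMD GAN gradient estimators yield unbiased generator gradients when the critic is fixed, and what happens when the critic is learned?
RQ2: How does the choice of kernel affect the performance and training stability of the MMD critic?
RQ3: Can insights from the energy distance and the Cramér GAN be used to improve MMD GANs and related IPMs?
RQ4: Is the Kernel Inception Distance a reliable and unbiased measure of GAN convergence?
RQ5: Can an MMD GAN match WGAN-GP performance with a smaller critic and faster training?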

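Main findings

The natural MMD estimator has unbiased gradients when it is built on top of a fixed deep representation.
Learning the critic from samples biases the generator gradients relative to those of the infinite-sample optimal critic.
MMD GANs can match WGAN-GP performance with a smaller critic network and faster training.
The connection to the energy distance provides a framework for a regularized critic with a gradient penalty.
The Kernel Inception Distance (KID) is proposed as a convergence measure with an unbiased estimator, and it helps adapt learning rates during training.
Experiments on standard benchmark datasets show practical advantages of MMD GANs over WGAN-GP in network size and training efficiency.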
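
To make the KID finding more concrete, here is a minimal sketch of the estimator: an unbiased squared-MMD estimate computed with the cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 on feature vectors of dimension d. In practice the features would be Inception-network activations of real and generated images; the random arrays below merely stand in for them, and the function names, feature dimension, and sample sizes are assumptions for illustration.

```python
import numpy as np


def polynomial_kernel(a, b):
    """Cubic polynomial kernel (a.b / d + 1)^3, with d the feature dimension."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3


def kid_unbiased(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = real_feats.shape[0], fake_feats.shape[0]
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Drop the diagonal terms so the within-sample averages are unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()


rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 64))     # stand-in for real-image features
same = rng.normal(0.0, 1.0, size=(500, 64))     # drawn from the same distribution
shifted = rng.normal(1.0, 1.0, size=(500, 64))  # drawn from a shifted distribution

kid_same = kid_unbiased(real, same)
kid_shifted = kid_unbiased(real, shifted)
print(kid_same, kid_shifted)
```

Because the estimate is unbiased, it can come out slightly negative when the two feature distributions match; values near zero indicate similar distributions, and larger values indicate a bigger gap.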

This review was generated by AI and reviewed by a human editor.