QUICK REVIEW

[Paper Review] Energy-based Generative Adversarial Network

Junbo Zhao, Michaël Mathieu | arXiv (Cornell University) | Sep 11, 2016

Generative Adversarial Networks and Image Synthesis | References: 23 | Cited by: 893

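One-Sentence Summary

EBGAN reinterprets the GAN discriminator as an energy function, uses an auto-encoder's reconstruction error as that energy to improve training stability, and demonstrates high-resolution image generation.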

ABSTRACT

We introduce the "Energy-based Generative Adversarial Network" model (EBGAN) which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of EBGAN framework as using an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images.

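Motivation and Objectives

Introduce an energy-based formulation of GANs in which the discriminator outputs an energy score rather than a probability
Within this framework, train the generator to produce samples with minimal energy, while the discriminator assigns low energy to real data and high energy to generated samples
Show that using an auto-encoder as the discriminator yields stable training and generates high-resolution images without a multi-scale setup
Provide theoretical results on the equilibrium under a simple hinge loss, at which the generator distribution matches the data distribution
Study regularization techniques (such as a repelling regularizer) that encourage coverage of multiple data modes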

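Proposed Method

Define D as an energy function trained with a margin loss, so that D(x) is driven low for real data while D(G(z)) is pushed up until it reaches the margin m
Use the generator loss L_G(z) = D(G(z)) and the discriminator loss L_D(x, z) = D(x) + [m − D(G(z))]^+, where [·]^+ = max(0, ·) is the hinge function
Instantiate D as an auto-encoder whose energy is the reconstruction error ||Dec(Enc(x)) − x||
Argue that the energy-based framework admits more flexible architectures and loss functions than a binary classifier
Introduce a repelling regularizer, the pulling-away term (PT), to encourage diverse representations and mitigate mode collapse
Discuss regularizing the auto-encoder so that it does not learn an identity mapping and assigns higher energy off the data manifold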
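The losses, the reconstruction energy, and the pulling-away term listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the batch averaging, the L2 norm over features, and the toy linear encoder/decoder are all assumptions made for the example. The pulling-away term is written here as the mean squared cosine similarity between distinct rows of a batch of representations, which is one common way to express a repelling regularizer of this kind.

```python
import numpy as np

def reconstruction_energy(x, encode, decode):
    """Energy D(x) = ||Dec(Enc(x)) - x||, one value per row of x."""
    recon = decode(encode(x))
    return np.linalg.norm(recon - x, axis=1)

def discriminator_loss(energy_real, energy_fake, margin):
    """L_D = D(x) + [m - D(G(z))]^+, averaged over the batch (assumed)."""
    hinge = np.maximum(0.0, margin - energy_fake)
    return float(np.mean(energy_real + hinge))

def generator_loss(energy_fake):
    """L_G = D(G(z)), averaged over the batch (assumed)."""
    return float(np.mean(energy_fake))

def pulling_away_term(codes):
    """Mean squared cosine similarity between distinct rows of `codes`."""
    n = codes.shape[0]
    unit = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    cos_sq = (unit @ unit.T) ** 2
    # Drop the diagonal (each row compared with itself), then average.
    return float((cos_sq.sum() - np.trace(cos_sq)) / (n * (n - 1)))

# Toy check with hypothetical data: the "manifold" is the first coordinate axis,
# and the linear auto-encoder projects onto it.
W = np.array([[1.0], [0.0]])
encode = lambda x: x @ W
decode = lambda h: h @ W.T
real = np.array([[1.0, 0.0], [2.0, 0.0]])   # on the manifold
fake = np.array([[0.0, 3.0], [0.0, 0.5]])   # off the manifold
e_real = reconstruction_energy(real, encode, decode)   # [0.0, 0.0]
e_fake = reconstruction_energy(fake, encode, decode)   # [3.0, 0.5]
print(discriminator_loss(e_real, e_fake, margin=1.0))  # 0.25
print(generator_loss(e_fake))                          # 1.75
```

In the toy check, the fake sample with energy 3.0 already exceeds the margin and contributes nothing to the hinge, while the one with energy 0.5 is still penalized. The margin thus stops the discriminator from spending effort on samples it already rejects.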

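Experimental Results

Research Questions

RQ1: Does an energy-based GAN (EBGAN) with an auto-encoder discriminator converge to the data distribution at a Nash equilibrium?
RQ2: How does the margin loss affect EBGAN's equilibrium and training stability?
RQ3: Does the repelling regularizer improve mode coverage and the diversity of generated samples?
RQ4: Can a single-scale EBGAN with an auto-encoder discriminator generate high-resolution images?
RQ5: How do architectural choices affect EBGAN's stability and sample quality relative to a conventional GAN?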

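Key Findings

Under the simple hinge loss, a Nash equilibrium yields p_G = p_data, meaning the generator's distribution matches the data distribution
The auto-encoder discriminator provides a flexible energy surface and trains more stably than a standard GAN discriminator
EBGAN can generate high-resolution (256×256) images on ImageNet without a multi-scale architecture
A regularizer such as the pulling-away term helps increase sample diversity and cover multiple data modes
In a grid search on MNIST, EBGAN trains more reliably than GAN, and it extends to semi-supervised learning via Ladder Networks
Combining EBGAN with deep convolutional architectures produces realistic samples on the LSUN Bedroom and CelebA datasets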
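The equilibrium finding can be stated a little more formally. The block below is a sketch of the setup under the losses L_D and L_G defined in the method section; the symbols V, U, and p_z (the distribution of the generator's input noise) do not appear in this summary and are introduced here as assumptions for the statement.

```latex
\begin{align}
V(G, D) &= \int_{x, z} L_D(x, z)\, p_{\mathrm{data}}(x)\, p_z(z)\, \mathrm{d}x\, \mathrm{d}z, \\
U(G, D) &= \int_{z} L_G(z)\, p_z(z)\, \mathrm{d}z .
\end{align}
% D minimizes V and G minimizes U. A pair (G^*, D^*) is a Nash equilibrium when
\begin{align}
V(G^*, D^*) &\le V(G^*, D) \quad \forall D, \\
U(G^*, D^*) &\le U(G, D^*) \quad \forall G .
\end{align}
% At such an equilibrium:
\begin{equation}
p_{G^*} = p_{\mathrm{data}} \ \text{almost everywhere}, \qquad V(G^*, D^*) = m .
\end{equation}
```

Read this way, the margin m is not only a training hyperparameter: it is also the value the discriminator's objective takes once the generator matches the data distribution.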


This review was generated by AI and reviewed by human editors.