QUICK REVIEW

[Paper Review] Maximum Entropy Generators for Energy-Based Models

Rithesh Kumar, Sherjil Ozair | arXiv (Cornell University) | Jan 24, 2019

Anomaly Detection Techniques and Applications | References: 46 | Cited by: 62

One-Sentence Summary

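MEG jointly trains an energy function and an amortized neural generator that approximates the log-likelihood gradient, and it stabilizes training by maximizing the entropy of the generator's outputs and applying a gradient penalty. It generates sharp image samples, covers all modes, and achieves competitive anomaly detection performance.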

ABSTRACT

Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient. In this work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing entropy of the generated samples, which we perform using recently proposed nonparametric mutual information estimators. Finally, to stabilize the resulting adversarial game, we use a zero-centered gradient penalty derived as a necessary condition from the score matching literature. The proposed technique can generate sharp images with Inception and FID scores competitive with recent GAN techniques, does not suffer from mode collapse, and is competitive with state-of-the-art anomaly detection techniques.

Motivation and Goals

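Address the difficulty of training energy-based models (EBMs) that stems from the intractable log-likelihood gradient.
Propose a joint framework that learns an energy function together with an amortized sampler approximating p_theta.
Maximize the entropy of the generator's outputs so that p_G matches p_theta and training stays stable.
Use nonparametric mutual information estimators to maximize generator entropy.
Show that MEG improves image sharpness, mode coverage and anomaly detection relative to baselines.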
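
The intractability can be made explicit. Assuming the usual Boltzmann form of an EBM, the log-likelihood gradient splits into a data term and an expectation under the model itself:

```latex
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta}, \qquad
Z_\theta = \int \exp(-E_\theta(x))\,dx

\nabla_\theta \log p_\theta(x)
  = -\nabla_\theta E_\theta(x)
    + \mathbb{E}_{x' \sim p_\theta}\big[\nabla_\theta E_\theta(x')\big]
```

The first term is cheap to evaluate; the second needs samples from p_theta, which normally calls for MCMC. Averaging the negative gradient over x ~ p_D and substituting generator samples G(z) for x' ~ p_theta gives the gradient of the energy loss L_E.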

Proposed Method

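Replace the model distribution p_theta with the distribution p_G of a neural generator G (parameters w), which maps latent samples z ~ p_z to G(z).
Minimize KL(p_G || p_theta), which yields a generator loss combining generator entropy and sample energy: L_G = -I_JSD(G(Z), Z) + E_{z~p_z}[E_theta(G(z))].
Train the energy function E_theta by gradient descent on L_E = E_{x~p_D}[E_theta(x)] - E_{z~p_z}[E_theta(G(z))], in which generator samples stand in for the intractable expectation under p_theta.
Maximize generator entropy through nonparametric mutual information estimation, using I_JSD(G(Z), Z) as in Belghazi et al. (2018) and related work.
Stabilize the adversarial game with a zero-centered gradient penalty (derived from the score matching literature), which keeps data samples near critical points of the energy function.
Optionally run MCMC in latent space (MALA with a Metropolis-Hastings correction step) to bias sampling toward high-density regions of the latent space.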
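
The step from KL(p_G || p_theta) to the generator loss can be written out, assuming p_theta(x) = exp(-E_theta(x)) / Z_theta:

```latex
\mathrm{KL}(p_G \,\|\, p_\theta)
  = \mathbb{E}_{x \sim p_G}\big[\log p_G(x) - \log p_\theta(x)\big]
  = -H[p_G] + \mathbb{E}_{z \sim p_z}\big[E_\theta(G(z))\big] + \log Z_\theta
```

log Z_theta does not depend on the generator, so minimizing the KL over G means lowering the energy of generated samples while raising their entropy H[p_G]. The generator loss L_G uses the estimate I_JSD(G(Z), Z) as the stand-in for that entropy term.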
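
A toy version of the energy objective with the zero-centered gradient penalty. It assumes a hypothetical 1-D quadratic energy E_theta(x) = theta * (x - mu)^2 / 2, chosen so that both the input gradient (for the penalty) and the parameter gradient have closed forms. It also assumes that the penalty is evaluated on data samples, as in R1-style penalties. The weight lam = 10 is illustrative, not a value taken from the paper:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QuadraticEnergy:
    """Hypothetical toy energy E(x) = theta * (x - mu)**2 / 2 in 1-D."""
    theta: float
    mu: float

    def energy(self, x: float) -> float:
        return 0.5 * self.theta * (x - self.mu) ** 2

    def grad_x(self, x: float) -> float:
        # dE/dx: the quantity the zero-centered gradient penalty drives to 0.
        return self.theta * (x - self.mu)


def energy_loss(model: QuadraticEnergy, data: Sequence[float],
                generated: Sequence[float], lam: float = 10.0) -> float:
    """L_E = mean E(x_data) - mean E(G(z)) + lam * mean (dE/dx at x_data)^2."""
    pos = sum(model.energy(x) for x in data) / len(data)
    neg = sum(model.energy(x) for x in generated) / len(generated)
    penalty = sum(model.grad_x(x) ** 2 for x in data) / len(data)
    return pos - neg + lam * penalty


def energy_loss_grad_theta(model: QuadraticEnergy, data: Sequence[float],
                           generated: Sequence[float], lam: float = 10.0) -> float:
    """Analytic dL_E/dtheta for the toy energy (generator samples held fixed)."""
    pos = sum(0.5 * (x - model.mu) ** 2 for x in data) / len(data)
    neg = sum(0.5 * (x - model.mu) ** 2 for x in generated) / len(generated)
    # d/dtheta of (theta * (x - mu))**2 is 2 * theta * (x - mu)**2
    penalty = sum(2.0 * model.theta * (x - model.mu) ** 2 for x in data) / len(data)
    return pos - neg + lam * penalty
```

A training step would then be `model.theta -= lr * energy_loss_grad_theta(model, data, generated)`. With neural networks, both gradients would come from automatic differentiation rather than hand-derived formulas.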
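
The I_JSD(G(Z), Z) term is a critic-based bound. A minimal sketch of the standard Jensen-Shannon form of this estimator, assuming a critic T has already scored matched pairs (G(z), z) and shuffled pairs (G(z), z'). The helper names and plain-list inputs are hypothetical, critic training is not shown, and the exact estimator variant used by MEG may differ in details:

```python
import math
from typing import Sequence


def softplus(a: float) -> float:
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def jsd_mi_estimate(joint_scores: Sequence[float],
                    marginal_scores: Sequence[float]) -> float:
    """JSD-style MI estimate from critic outputs (hypothetical helper).

    joint_scores:    T(G(z), z) for matched pairs, i.e. samples of the joint.
    marginal_scores: T(G(z), z') with z' shuffled, i.e. product of marginals.
    """
    pos = sum(-softplus(-t) for t in joint_scores) / len(joint_scores)
    neg = sum(softplus(t) for t in marginal_scores) / len(marginal_scores)
    return pos - neg


print(jsd_mi_estimate([0.0, 0.0], [0.0, 0.0]))    # about -1.386, i.e. -2*log(2)
print(jsd_mi_estimate([8.0, 9.0], [-8.0, -9.0]))  # close to 0
```

An uninformative critic gives -2*log(2), which is also the best achievable value when G(Z) and Z are independent. A critic that separates matched from shuffled pairs pushes the estimate up toward 0, and the generator is trained to make that separation possible.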
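
The optional latent-space MCMC can be sketched as MALA on a latent energy U(z) = E_theta(G(z)) + z^2 / 2, assuming a standard normal prior p_z. The 1-D linear generator G(z) = a*z + b, the quadratic energy and the step size eps are hypothetical stand-ins chosen so that dU/dz has a closed form; a real implementation would differentiate through the networks:

```python
import math
import random
from typing import Callable, List, Tuple


def make_latent_energy(a: float, b: float, theta: float, mu: float):
    """Return U(z) and dU/dz for U(z) = E(G(z)) + z**2 / 2.

    Hypothetical toy setup: G(z) = a*z + b and E(x) = theta * (x - mu)**2 / 2.
    """
    def U(z: float) -> float:
        x = a * z + b
        return 0.5 * theta * (x - mu) ** 2 + 0.5 * z * z

    def grad_U(z: float) -> float:
        x = a * z + b
        return theta * (x - mu) * a + z  # chain rule through G, plus prior term

    return U, grad_U


def mala(U: Callable[[float], float], grad_U: Callable[[float], float],
         z0: float, eps: float, n_steps: int,
         rng: random.Random) -> Tuple[List[float], float]:
    """Metropolis-adjusted Langevin algorithm in a 1-D latent space."""
    def log_q(z_to: float, z_from: float) -> float:
        # Log density (up to a constant) of the Langevin proposal z_from -> z_to.
        mean = z_from - eps * grad_U(z_from)
        return -((z_to - mean) ** 2) / (4.0 * eps)

    z, samples, accepted = z0, [], 0
    for _ in range(n_steps):
        # Langevin proposal: gradient step plus Gaussian noise of variance 2*eps.
        proposal = z - eps * grad_U(z) + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (U(z) - U(proposal)
                     + log_q(z, proposal) - log_q(proposal, z))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            z, accepted = proposal, accepted + 1
        samples.append(z)
    return samples, accepted / n_steps
```

With a = 1, b = 0, theta = 1 and mu = 2, U(z) = (z - 1)^2 + 1, so the target is a Gaussian centred at z = 1 and the chain should concentrate there. The accept/reject step removes the discretization bias that plain Langevin updates would leave.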

Experimental Results

Research Questions

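RQ1: Can an amortized neural sampler approximate the negative-phase gradient of an energy-based model without relying on expensive MCMC in data space?
RQ2: Does maximizing the entropy of the generator's outputs help prevent mode collapse and ensure coverage of all data modes?
RQ3: Can the MEG framework generate competitive, sharp image samples without sacrificing mode diversity?
RQ4: Is the learned energy function useful for anomaly detection on standard benchmarks?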

Key Findings

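MEG's CIFAR-10 samples are on par with WGAN-GP in Inception Score and Fréchet Inception Distance (FID).
On 4-StackedMNIST, MEG captures all modes, with a lower KL divergence than several baselines.
MALA sampling in latent space gives better perceptual sample quality than MCMC in visible (data) space.
MEG achieves strong anomaly detection performance on KDD99, competitive with state-of-the-art methods.
Compared with typical maximum-likelihood EBMs, MEG produces sharper, less blurry samples and avoids mode collapse.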
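
Mode-coverage results on StackedMNIST are usually obtained by assigning each generated sample to one of K modes (K = 10^4 for four stacked digits) with a pretrained digit classifier, then comparing the mode histogram with the uniform distribution. A minimal sketch of that last step, assuming the mode labels are already available. The KL direction used here, KL(empirical || uniform), is an assumption:

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def mode_coverage(mode_labels: Iterable[int], num_modes: int) -> Tuple[int, float]:
    """Return (number of modes hit, KL(empirical mode distribution || uniform))."""
    counts = Counter(mode_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one labelled sample")
    kl = 0.0
    for c in counts.values():
        p = c / total
        kl += p * math.log(p * num_modes)  # p * log(p / (1 / num_modes))
    return len(counts), kl
```

A sampler that hits every mode equally often gets KL = 0. One that collapses onto a single mode gets log(K), the largest possible value.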
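
Using the learned energy for anomaly detection amounts to treating E_theta(x) as an anomaly score: low-energy points resemble the training data, high-energy points do not. A minimal sketch, assuming a protocol that flags a fixed fraction of the highest-energy test points as anomalous and reports precision, recall and F1. The 20% default is an illustrative assumption, not a setting confirmed by this review:

```python
from typing import List, Sequence, Tuple


def flag_anomalies(energies: Sequence[float],
                   anomaly_fraction: float = 0.2) -> List[bool]:
    """Flag the top `anomaly_fraction` of points, ranked by energy, as anomalies."""
    n_flag = int(round(anomaly_fraction * len(energies)))
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    flagged = [False] * len(energies)
    for i in order[:n_flag]:
        flagged[i] = True
    return flagged


def precision_recall_f1(pred: Sequence[bool],
                        truth: Sequence[bool]) -> Tuple[float, float, float]:
    """Standard detection metrics, with 0.0 where a ratio is undefined."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Ranking by energy only requires E_theta up to an additive constant, so the intractable partition function Z_theta never has to be computed.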

This review was generated by AI and checked by human editors.