QUICK REVIEW

[Paper Review] How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

Ferenc Huszár|arXiv (Cornell University)|Nov 16, 2015

Topic Modeling|References: 21|Cited by: 204

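ONE-SENTENCE SUMMARY

This paper argues that scheduled sampling is an inconsistent training method for generative models, and that maximum likelihood training yields low-quality samples because it overgeneralizes. It proposes a generalized Jensen-Shannon divergence as a more suitable loss, one that interpolates between maximum likelihood and an idealized objective better aligned with perceptual quality, which helps explain why adversarial training produces higher-quality samples.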

ABSTRACT

Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to the winning entry to the MSCOCO image captioning benchmark in 2015. Here we show that despite this impressive empirical performance, the objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm. Secondly, we revisit the problems that scheduled sampling was meant to address, and present an alternative interpretation. We argue that maximum likelihood is an inappropriate training objective when the end-goal is to generate natural-looking samples. We go on to derive an ideal objective function to use in this situation instead. We introduce a generalisation of adversarial training, and show how such method can interpolate between maximum likelihood training and our ideal training objective. To our knowledge this is the first theoretical analysis that explains why adversarial training tends to produce samples with higher perceived quality.

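MOTIVATION AND OBJECTIVES

Identify the fundamental flaw in scheduled sampling as a training objective for autoregressive sequence models.
Challenge the use of maximum likelihood as the primary training objective when the goal is to generate realistic, natural-looking samples.
Propose a theoretically grounded alternative objective that better matches the perceived quality of generated samples.
Explain why adversarial training produces higher-quality samples by viewing it as minimizing a generalized divergence that interpolates between KL[P||Q] and KL[Q||P].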

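PROPOSED METHOD

Rewrites the scheduled sampling objective in terms of Kullback-Leibler divergences, showing that it is improper and yields an inconsistent training procedure.
Proposes minimizing the reverse Kullback-Leibler divergence KL[Q||P] as an idealized objective for perceptual quality, although it is intractable in practice.
Introduces a generalized Jensen-Shannon divergence (JS_π) whose hyperparameter π interpolates between KL[P||Q] (maximum likelihood) and KL[Q||P] (perceptual quality).
Shows that adversarial training can approximately minimize JS_π when the class balance π (the proportion of real versus generated samples) in the discriminator's training data is adjusted.
Shows that the standard GAN corresponds to JS_π with π = 0.5, while other values of π give a range of training behaviors.
Gives a theoretical account of why adversarial training improves sample quality: it approximately minimizes a divergence closer to the idealized perceptual objective.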
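
The divergences named in the list above can be written out explicitly. The block below is a sketch of the usual form of the generalized Jensen-Shannon divergence and its two limits. The mixture shorthand M_π and the binary entropy H(π) = −π log π − (1−π) log(1−π) are notation assumed here for brevity, and the last line is a standard calculation (substituting the optimal discriminator D*(x) = πP(x)/M_π(x)), not a quotation from the paper. The limits assume the corresponding KL divergences are finite.

```latex
\begin{align*}
M_\pi &= \pi P + (1-\pi)\, Q \\
\mathrm{JS}_\pi[P \,\|\, Q] &= \pi\, \mathrm{KL}[P \,\|\, M_\pi] + (1-\pi)\, \mathrm{KL}[Q \,\|\, M_\pi] \\
\lim_{\pi \to 0} \frac{\mathrm{JS}_\pi[P \,\|\, Q]}{\pi} &= \mathrm{KL}[P \,\|\, Q] \\
\lim_{\pi \to 1} \frac{\mathrm{JS}_\pi[P \,\|\, Q]}{1-\pi} &= \mathrm{KL}[Q \,\|\, P] \\
\max_D \Big( \pi\, \mathbb{E}_{x \sim P}[\log D(x)] + (1-\pi)\, \mathbb{E}_{x \sim Q}[\log(1 - D(x))] \Big) &= \mathrm{JS}_\pi[P \,\|\, Q] - H(\pi)
\end{align*}
```

At π = 0.5 the second line reduces to the standard Jensen-Shannon divergence, which is the standard GAN case. Since H(π) does not depend on Q, training the generator against an optimal discriminator with class prior π amounts to minimizing JS_π.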

RESULTS

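Research Questions

RQ1: Why does maximum likelihood training of autoregressive sequence models produce unrealistic or incoherent samples?
RQ2: Is scheduled sampling a consistent training method, and does it actually solve the problems of maximum likelihood training?
RQ3: Which objective function reflects the goal of generating natural-looking samples better than maximum likelihood does?
RQ4: How can the higher sample quality of adversarial training be explained theoretically?
RQ5: Can a unified framework interpolate between maximum likelihood and a perception-driven objective?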

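Key Findings

Scheduled sampling is shown to be an inconsistent training method because its underlying objective function is improper, despite its empirical success on benchmarks such as MSCOCO.
Maximum likelihood training minimizes KL[P||Q], which leads to overgeneralization: the model spreads probability mass over regions where the data distribution has little, so samples can score well statistically yet look unrealistic.
Minimizing KL[Q||P] is in theory the ideal choice for perceptual quality, but it is intractable and cannot be used directly in practice.
The generalized Jensen-Shannon divergence JS_π gives an objective that can be optimized approximately and that interpolates smoothly, up to a scaling factor, between maximum likelihood (π → 0) and KL[Q||P] (π → 1), allowing a range of training behaviors.
Adversarial training with a balanced discriminator (π = 0.5) approximates the standard JS divergence, while adjusting π can shift the method toward the perceptual quality objective.
The theoretical analysis explains why adversarial training produces higher-quality samples: it approximately minimizes a divergence that behaves more like KL[Q||P], which favors perceptually plausible samples even at the cost of missing some modes, rather than spreading mass to cover all of the data as maximum likelihood does.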
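
The interpolation behavior of JS_π can be checked numerically on a small discrete example. The following is a minimal sketch, not code from the paper: the toy distributions `P` and `Q`, the function names, and the use of NumPy are all assumptions made for illustration. It computes JS_π as π·KL[P||M] + (1−π)·KL[Q||M] with M = πP + (1−π)Q, compares the rescaled values near π = 0 and π = 1 with the two KL divergences, and evaluates the class-weighted discriminator objective at the optimal discriminator.

```python
import numpy as np

def kl(p, q):
    """KL[p || q] in nats for discrete distributions given as arrays."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_pi(p, q, pi):
    """Generalized JS: pi*KL[p||m] + (1-pi)*KL[q||m], with m = pi*p + (1-pi)*q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = pi * p + (1.0 - pi) * q
    return pi * kl(p, m) + (1.0 - pi) * kl(q, m)

def discriminator_objective_at_optimum(p, q, pi):
    """Class-weighted log-likelihood of the optimal discriminator d = pi*p / m.

    Assumes p and q are strictly positive so that 0 < d < 1 everywhere.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = pi * p + (1.0 - pi) * q
    d = pi * p / m
    return float(np.sum(pi * p * np.log(d) + (1.0 - pi) * q * np.log(1.0 - d)))

def binary_entropy(pi):
    """Entropy in nats of a Bernoulli(pi) class label."""
    return float(-(pi * np.log(pi) + (1.0 - pi) * np.log(1.0 - pi)))

# Hypothetical toy example: bimodal "data" P and a flat "model" Q.
P = np.array([0.45, 0.05, 0.05, 0.45])
Q = np.array([0.25, 0.25, 0.25, 0.25])

eps = 1e-4
print("KL[P||Q]             :", kl(P, Q))
print("JS_pi / pi, pi -> 0  :", js_pi(P, Q, eps) / eps)
print("KL[Q||P]             :", kl(Q, P))
print("JS_pi/(1-pi), pi -> 1:", js_pi(P, Q, 1.0 - eps) / eps)
print("JS_0.5 (standard JS) :", js_pi(P, Q, 0.5))
print("disc. objective + H  :",
      discriminator_objective_at_optimum(P, Q, 0.5) + binary_entropy(0.5))
```

In this sketch the rescaled values near the two ends of the π range should agree closely with KL[P||Q] and KL[Q||P] respectively, and the last two printed values should coincide, reflecting that the optimal discriminator's objective equals JS_π minus the entropy of the class label.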

This review was generated by AI and reviewed by a human editor.