QUICK REVIEW

[Paper Review] KERMIT: Generative Insertion-Based Modeling for Sequences

William Chan, Nikita Kitaev | arXiv (Cornell University) | Jun 4, 2019

Natural Language Processing Techniques | References: 22 | Cited by: 67

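One-Sentence Summary

KERMIT is a unified insertion-based model that jointly learns p(x,y) together with its marginals and conditionals without a fixed factorization, enabling bidirectional translation, representation learning, and zero-shot cloze question answering with parallel decoding that empirically runs in logarithmic time.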

ABSTRACT

We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$. During inference, we have access to the conditionals $p(x \mid y)$ and $p(y \mid x)$ in both directions. We can also sample from the joint distribution or the marginals. The model supports both serial fully autoregressive decoding and parallel partially autoregressive decoding, with the latter exhibiting an empirically logarithmic runtime. We demonstrate through experiments in machine translation, representation learning, and zero-shot cloze question answering that our unified approach is capable of matching or exceeding the performance of dedicated state-of-the-art systems across a wide range of tasks without the need for problem-specific architectural adaptation.

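Motivation and Goals

Develop a flexible sequence modeling framework that does not depend on a prespecified left-to-right factorization.
Learn the joint distribution over sequences, along with its marginals and conditionals, in a single unified model.
Support bidirectional generation and infilling, including translation and cloze-style question answering.
Show competitive performance on machine translation, representation learning, and zero-shot question answering with a simple Transformer-based architecture.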

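Proposed Method

Models sequences by building up a canvas through insertion operations in arbitrary order, and uses this process to represent the joint distribution p(x,y).
Trains by lower-bounding the log-likelihood with Jensen's inequality, sampling generation orders and insertion operations.
Factorizes content and location as p(c,l)=p(c|l)p(l), using a single Transformer decoder without causal masking.
Enables inference in both directions (p(y|x) and p(x|y)) as well as sampling from the joint and marginal distributions.
Extends to sequence pairs by concatenating x and y, and trains to learn the joint, marginal, and conditional decompositions.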
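
The sampling step described above can be illustrated with a minimal sketch. It assumes a uniformly random generation order and a uniformly random step, which may differ from the order distribution used in the paper; the function names `sample_insertion_example` and `apply_insertion` are hypothetical and exist only for this illustration. Each sampled triple (partial canvas, content, location) corresponds to one term of the kind the training objective sums over.

```python
import random


def sample_insertion_example(tokens, rng):
    """Sample one (partial canvas, content, location) training triple.

    Assumption: generation orders and steps are drawn uniformly at random.
    """
    n = len(tokens)
    order = list(range(n))
    rng.shuffle(order)                 # generation order: a random permutation
    step = rng.randrange(n)            # number of tokens already on the canvas
    placed = sorted(order[:step])      # source positions inserted so far
    canvas = [tokens[p] for p in placed]
    target_pos = order[step]           # source position inserted at this step
    content = tokens[target_pos]
    # Location is the slot index in the canvas: the number of already placed
    # tokens that sit to the left of the token being inserted.
    location = sum(1 for p in placed if p < target_pos)
    return canvas, content, location


def apply_insertion(canvas, content, location):
    """Return a new canvas with `content` inserted at slot `location`."""
    return canvas[:location] + [content] + canvas[location:]
```

For a sequence pair, one could pass the concatenation of x and y as `tokens`, so that the same routine yields insertion targets on either side of the pair.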

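Experimental Results

Research Questions

RQ1: Can an insertion-based model learn the joint distribution p(x,y) and its decompositions without a fixed factorization?
RQ2: Can a single unified model match or exceed state-of-the-art performance in translation, representation learning, and cloze question answering?
RQ3: How do bidirectional generation and marginal refinement affect performance and efficiency compared with conventional autoregressive models?
RQ4: What inference and sampling capabilities arise when sequence pairs are modeled with insertion operations?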

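Key Findings

KERMIT matches or exceeds state-of-the-art performance on machine translation, representation learning, and zero-shot cloze question answering.
The model supports both serial fully autoregressive decoding and parallel partially autoregressive decoding; the latter empirically runs in time logarithmic in the sequence length.
Joint modeling with marginal refinement (p(x) and p(y)) improves German→English translation by about 1.2 BLEU in the reported setup.
Bidirectional training and fine-tuning give competitive results without problem-specific architectural adaptation.
Insertion-based decoding lets the output canvas grow dynamically, avoiding the constraint of fixed-length generation.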
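
To give some intuition for why parallel insertion decoding can finish in a logarithmic number of steps, here is a toy simulation. It is not the model: the network is replaced by an oracle that looks up a known target, and the balanced, middle-first insertion order is an assumption made for this sketch. The function name `parallel_insertion_rounds` is hypothetical.

```python
def parallel_insertion_rounds(target):
    """Oracle simulation of parallel insertion decoding.

    Assumption: in every round, each open slot receives the middle token of
    the span it still has to produce (a balanced, middle-first order).
    Returns the finished canvas and the number of parallel rounds used.
    """
    # Half-open spans of target indices that are still missing.
    spans = [(0, len(target))] if target else []
    placed = {}                        # target index -> token
    rounds = 0
    while spans:
        next_spans = []
        for lo, hi in spans:           # all slots are filled in the same round
            mid = (lo + hi) // 2
            placed[mid] = target[mid]
            if lo < mid:
                next_spans.append((lo, mid))
            if mid + 1 < hi:
                next_spans.append((mid + 1, hi))
        spans = next_spans
        rounds += 1
    canvas = [placed[i] for i in sorted(placed)]
    return canvas, rounds
```

Under these assumptions a target of length n is completed in `n.bit_length()` rounds, that is floor(log2 n) + 1, so a 1000-token target needs 10 rounds instead of 1000 serial steps. A real decoder has to predict the tokens and decide when each slot is finished, so this is only an idealized best case.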


This review was generated by AI and reviewed by human editors.