QUICK REVIEW

[Paper Review] BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Michael Lewis, Yinhan Liu|arXiv (Cornell University)|Oct 29, 2019

Topic Modeling|Cited by 235

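One-Sentence Summary

BART is a denoising autoencoder pre-training framework that combines bidirectional encoding with autoregressive decoding, achieving strong performance on generation and comprehension tasks and competitive results on discriminative tasks.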

ABSTRACT

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.

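Research Motivation and Goals

Propose a versatile pre-training objective that supports both generation and comprehension tasks.
Explore a broad range of text corruption (noising) schemes and identify which ones yield robust downstream performance.
Show how a single pre-trained model can be fine-tuned for many tasks (classification, question answering, generation, translation).
Demonstrate that denoising seq2seq pre-training can match or exceed strong existing pre-training methods on multiple benchmarks.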

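Proposed Method

Use a standard Transformer-based seq2seq architecture with a bidirectional encoder and an autoregressive decoder.
Pre-train by applying an arbitrary noising function to documents and training the model to reconstruct the original text (negative log-likelihood).
Evaluate a range of noising schemes, including token masking, token deletion, text infilling, sentence permutation, document rotation, and combinations of these.
Fine-tune with task-specific adaptations for sequence classification, token classification, sequence generation, and machine translation.
For translation, add a small additional encoder to BART that maps foreign-language words into an input BART can denoise into English; this encoder is trained end-to-end and can use a separate vocabulary.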
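
The five noising schemes listed above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it operates on plain lists of string tokens rather than subword IDs, the function names, the `MASK` symbol, and the probabilities `p` and `p_start` are all hypothetical choices made for illustration, and the way spans are selected is a simplification. The Poisson span length with λ = 3 follows the setting described in the BART paper.

```python
import math
import random

MASK = "<mask>"  # illustrative mask symbol (assumed name)


def token_masking(tokens, rng, p=0.15):
    """Replace each token with MASK independently with probability p."""
    return [MASK if rng.random() < p else t for t in tokens]


def token_deletion(tokens, rng, p=0.15):
    """Drop each token independently with probability p."""
    return [t for t in tokens if rng.random() >= p]


def sample_poisson(rng, lam):
    """Draw a Poisson-distributed integer using Knuth's method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def text_infilling(tokens, rng, p_start=0.1, lam=3.0):
    """Collapse randomly chosen spans into a single MASK each.

    A span of length 0 inserts a MASK without removing any token.
    """
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < p_start:
            span = sample_poisson(rng, lam)
            out.append(MASK)  # the whole span becomes one mask token
            if span == 0:
                out.append(tokens[i])  # zero-length span: pure insertion
                i += 1
            else:
                i += span  # skip the masked tokens
        else:
            out.append(tokens[i])
            i += 1
    return out


def sentence_permutation(sentences, rng):
    """Return the sentences (each a list of tokens) in shuffled order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def document_rotation(tokens, rng):
    """Rotate the document so it starts at a uniformly chosen token."""
    if not tokens:
        return []
    start = rng.randrange(len(tokens))
    return tokens[start:] + tokens[:start]


# Usage: a corrupted input paired with the original forms one training example.
rng = random.Random(0)
doc = "the quick brown fox jumps over the lazy dog".split()
corrupted = text_infilling(doc, rng)
training_pair = (corrupted, doc)  # encoder input, decoder reconstruction target
```

Because text infilling hides how many tokens each mask stands for, the model has to predict the span length as well as its content, which is what distinguishes it from plain token masking.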

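Experimental Results

Research Questions

RQ1: Can a denoising autoencoder pre-training objective that operates on corrupted text generalize across both generation and comprehension tasks?
RQ2: Which noising schemes yield robust end-task performance across diverse NLP benchmarks?
RQ3: How does BART compare with existing pre-training methods (e.g., BERT, RoBERTa, XLNet) on discriminative and generation tasks?
RQ4: Can a single pre-trained model improve machine translation when used as a decoder with an added encoder?
RQ5: What do ablation studies reveal about how different pre-training objectives contribute to downstream performance?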

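Key Findings

On the discriminative benchmarks GLUE and SQuAD, BART reaches performance comparable to RoBERTa, while achieving state-of-the-art results on abstractive tasks.
Text infilling and related noising schemes perform consistently well across tasks, outperforming other pre-training objectives in many settings.
On summarization, BART substantially outperforms prior work, with notable gains on datasets such as XSum (the abstract reports gains of up to 6 ROUGE).
In translation, using BART as a pre-trained decoder (with a small added encoder) yields a BLEU gain over a strong back-translation baseline (1.1 BLEU according to the abstract).
Ablations show that pre-processing choices and the pre-training objective affect downstream performance, and that pairing a bidirectional encoder with an autoregressive decoder is advantageous for generation tasks.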

This review was generated by AI and reviewed by a human editor.