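[Paper Digest] DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
ΔLM reuses a pretrained multilingual encoder to initialize both the encoder and the decoder of an encoder-decoder model, then pre-trains it on monolingual and bilingual data with span corruption and translation span corruption to improve multilingual generation and translation.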
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM, a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way. To take advantage of both the large-scale monolingual data and bilingual data, we adopt the span corruption and translation span corruption as the pre-training tasks. Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks, including machine translation, abstractive text summarization, data-to-text, and question generation. The code and pretrained models are available at https://aka.ms/deltalm.
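Research Motivation and Objectives
- Bridge the gap between pretrained encoders and natural language generation (NLG) tasks, which rely on the encoder-decoder framework.
- Propose a method that reuses a pretrained multilingual encoder to initialize both the encoder and the decoder of an encoder-decoder model.
- Exploit large-scale monolingual and bilingual data through dedicated pre-training tasks to improve cross-lingual transfer.
- Demonstrate effectiveness on multilingual generation and translation benchmarks, covering NLG, machine translation, summarization, data-to-text, and question generation.
Proposed Method
- Initialize both the encoder and the decoder of the encoder-decoder model from a strong pretrained multilingual encoder (InfoXLM).
- Introduce an interleaved Transformer decoder so that the decoder's structure matches the encoder's, allowing the pretrained weights to be fully reused.
- Pre-train with span corruption on multilingual data to preserve cross-lingual transferability.
- Strengthen cross-lingual transfer with translation span corruption on bilingual parallel data.
- Pre-train a 360M-parameter base model on a 6TB multilingual corpus (100 languages) and 88GB of bilingual data (77 languages).
- Fine-tune on downstream tasks with standard optimization and evaluation settings; the zero-shot transfer experiments use a mixed pre-training objective during fine-tuning.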
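The two pre-training tasks named in the list above can be illustrated with a minimal sketch. It assumes whitespace tokens, sentinel tokens of the form `<mask_i>`, a `</s>` separator between the two sides of a parallel pair, and illustrative defaults for the mask ratio and span length; none of these details come from this digest, and the function names are hypothetical rather than taken from the released DeltaLM code.

```python
import random


def sentinel(i):
    # Hypothetical sentinel format; the real vocabulary may differ.
    return f"<mask_{i}>"


def span_corrupt(tokens, spans):
    """Replace each (start, length) span with a sentinel.

    `spans` must be sorted and non-overlapping. Returns (source, target):
    the source keeps unmasked tokens plus sentinels, the target lists each
    sentinel followed by the tokens it replaced.
    """
    source, target = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        source.extend(tokens[cursor:start])
        source.append(sentinel(i))
        target.append(sentinel(i))
        target.extend(tokens[start:start + length])
        cursor = start + length
    source.extend(tokens[cursor:])
    return source, target


def sample_spans(n_tokens, mask_ratio=0.15, span_len=3, rng=None):
    """Sample non-overlapping spans, one per equal-sized segment.

    mask_ratio and span_len are illustrative defaults (assumptions).
    """
    rng = rng or random.Random()
    if n_tokens == 0:
        return []
    k = min(n_tokens, max(1, round(n_tokens * mask_ratio / span_len)))
    seg = n_tokens // k
    length = max(1, min(span_len, seg))
    spans = []
    for j in range(k):
        # Each span stays inside its own segment, so spans never overlap.
        start = rng.randint(j * seg, (j + 1) * seg - length)
        spans.append((start, length))
    return spans


def translation_span_corrupt(src_tokens, tgt_tokens, src_spans, tgt_spans,
                             sep="</s>"):
    """Concatenate a parallel sentence pair, then corrupt spans on both sides."""
    joined = src_tokens + [sep] + tgt_tokens
    offset = len(src_tokens) + 1  # shift target-side spans past the separator
    spans = list(src_spans) + [(s + offset, n) for s, n in tgt_spans]
    return span_corrupt(joined, spans)


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    print(span_corrupt(words, [(1, 2), (6, 1)]))
    print(translation_span_corrupt(["good", "morning"], ["bonjour"],
                                   [(1, 1)], [(0, 1)]))
```

Because masked spans can fall on either side of the concatenated pair, the model can use the translation as context when reconstructing a span, which is the intuition behind using bilingual data this way.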
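Experimental Results
Research Questions
- RQ1: Does reusing a pretrained multilingual encoder to initialize an encoder-decoder model improve NLG and translation tasks?
- RQ2: Does the interleaved decoder allow the pretrained encoder weights to be fully exploited, yielding better cross-lingual generation?
- RQ3: Do the span corruption and translation span corruption tasks make effective use of monolingual and bilingual data for multilingual NLG and MT?
- RQ4: How does ΔLM compare with strong baselines on multilingual generation, cross-lingual generation, and zero-shot transfer?
Key Findings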
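RQ2 turns on how the interleaved decoder reuses encoder weights. This digest does not say which encoder layer feeds which decoder sub-layer, so the sketch below rests on an assumption: each decoder layer has the order self-attention, FFN, cross-attention, FFN, and takes its weights from two consecutive encoder layers. The function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
def interleaved_init_plan(num_encoder_layers):
    """Map encoder layers onto interleaved decoder layers.

    Assumption: decoder layer d takes its self-attention and bottom FFN from
    encoder layer 2d, and its cross-attention and top FFN from encoder layer
    2d + 1 (0-indexed). Every encoder sub-layer is reused exactly once.
    """
    if num_encoder_layers % 2 != 0:
        raise ValueError("this pairing needs an even number of encoder layers")
    plan = []
    for d in range(num_encoder_layers // 2):
        lower, upper = 2 * d, 2 * d + 1
        plan.append({
            "decoder_layer": d,
            "self_attn": (lower, "self_attn"),
            "ffn_bottom": (lower, "ffn"),
            # Cross-attention has the same parameter shapes as self-attention,
            # so it can start from an encoder self-attention block.
            "cross_attn": (upper, "self_attn"),
            "ffn_top": (upper, "ffn"),
        })
    return plan


if __name__ == "__main__":
    for row in interleaved_init_plan(4):
        print(row)
```

Under this pairing, a decoder with half as many layers as the encoder consumes every pretrained attention and FFN block, with no sub-layer left to random initialization.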
- ΔLM with 360M parameters outperforms XLM and XNLG on XQG-Zh and XGiga-Fr in BLEU, METEOR, and ROUGE-L.
- ΔLM achieves average BLEU improvements over multilingual NMT baselines of +2.7 on X→En test sets and +1.3 on En→X test sets.
- ΔLM outperforms mBART and M2M-100 across 10 languages in X→En and En→X directions with fewer parameters.
- On cross-lingual abstractive summarization and data-to-text, ΔLM matches or exceeds baselines such as mBART and mT5 while being more parameter-efficient (360M vs up to 3.7B in some baselines).
- In zero-shot cross-lingual transfer for XGiga, ΔLM significantly outperforms XLM, XLM+MT, and XNLG on French and Chinese test sets.
This digest was generated by AI and reviewed by human editors.