QUICK REVIEW

[Paper Review] VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

Fuli Luo, Wei Wang|arXiv (Cornell University)|May 4, 2021

Natural Language Processing Techniques | References: 7 | Cited by: 31

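One-Sentence Summary

VECO proposes a unified variable encoder-decoder pre-training framework that trains its sub-modules with inner-sequence and cross-sequence masked language modeling and shares them between understanding and generation tasks. It achieves state-of-the-art results on the XTREME benchmark and improves translation BLEU on WMT14 by up to 1 to 2 points.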

ABSTRACT

Recent studies about learning multilingual representations have achieved significant performance gains across a wide range of downstream cross-lingual tasks. They train either an encoder-only Transformer mainly for understanding tasks, or an encoder-decoder Transformer specifically for generation tasks, ignoring the correlation between the two tasks and frameworks. In contrast, this paper presents a variable encoder-decoder (VECO) pre-training approach to unify the two mainstreams in both model architectures and pre-training tasks. VECO splits the standard Transformer block into several sub-modules trained with both inner-sequence and cross-sequence masked language modeling, and correspondingly reorganizes certain sub-modules for understanding and generation tasks during inference. Such a workflow not only ensures to train the most streamlined parameters necessary for two kinds of tasks, but also enables them to boost each other via sharing common sub-modules. As a result, VECO delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark covering text classification, sequence labeling, question answering, and sentence retrieval. For generation tasks, VECO also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1~2 BLEU.

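Research Motivation and Objectives

Unify encoder-only and encoder-decoder pre-training frameworks for multilingual tasks.
Address the fact that existing multilingual representation learning ignores the correlation between understanding models and generation models.
Improve performance on multilingual understanding and generation tasks through parameter sharing and joint pre-training.
Design a flexible inference mechanism that reorganizes the shared sub-modules to fit a specific task.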

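Proposed Method

VECO splits the standard Transformer block into sub-modules and trains them with inner-sequence and cross-sequence masked language modeling.
At inference, it reorganizes the shared sub-modules and reuses them for either understanding or generation tasks.
Within one unified architecture, it pre-trains on both cross-sequence (sequence-to-sequence) and single-sequence objectives.
It achieves parameter efficiency by training only the components that each task actually needs.
It strengthens multilingual transfer by applying masked language modeling to both same-language and cross-lingual sequences.
During inference, it selects and activates sub-modules according to the task type.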
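
The reorganization step described above can be sketched as a simple routing table. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not name the sub-modules, so the sketch assumes the usual three parts of a Transformer block (self-attention, cross-attention, feed-forward), and the names `layer_plan` and `used_submodules` are hypothetical.

```python
from typing import Dict, List

# Hypothetical sub-module names (assumption: not listed in the summary above).
SELF_ATTN = "self_attention"
CROSS_ATTN = "cross_attention"
FFN = "feed_forward"


def layer_plan(task: str) -> Dict[str, List[str]]:
    """Return the sub-modules each stack activates for a given task type."""
    if task == "understanding":
        # Encoder-only use: the cross-attention sub-module is not needed.
        return {"encoder": [SELF_ATTN, FFN]}
    if task == "generation":
        # Encoder-decoder use: the decoder also attends to the encoder output.
        return {
            "encoder": [SELF_ATTN, FFN],
            "decoder": [SELF_ATTN, CROSS_ATTN, FFN],
        }
    raise ValueError(f"unknown task type: {task!r}")


def used_submodules(task: str) -> set:
    """Collect every sub-module that appears in any stack of the plan."""
    return {name for stack in layer_plan(task).values() for name in stack}


# Sub-modules that both task types rely on, i.e. the shared parameters.
shared = used_submodules("understanding") & used_submodules("generation")
print(sorted(shared))
```

Under these assumptions the self-attention and feed-forward sub-modules are the ones both task types reuse, which is where the mutual benefit claimed in the abstract would have to come from.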

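Experimental Results

Research Questions

RQ1: Can a unified model architecture improve performance on both multilingual understanding and generation tasks?
RQ2: How does sharing sub-modules between the encoder and decoder components affect model efficiency and performance?
RQ3: To what extent does joint pre-training with inner-sequence and cross-sequence masking strengthen multilingual transfer?
RQ4: Does the variable architecture outperform dedicated encoder-only or encoder-decoder models on standard benchmarks?
RQ5: Can shared sub-modules improve both understanding and generation performance without harming either capability?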

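Key Findings

VECO sets new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval.
On the WMT14 English-to-German and English-to-French translation tasks, it outperforms existing cross-lingual models and state-of-the-art Transformer variants, with gains of up to 1 to 2 BLEU.
The unified architecture reduces parameter redundancy while maintaining strong performance across diverse multilingual tasks.
Sharing sub-modules lets the understanding and generation capabilities reinforce each other.
The method generalizes well across both low-resource and high-resource language pairs.
Ablation experiments confirm that inner-sequence and cross-sequence masking both contribute significantly to the performance gains.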
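
The difference between the two masking schemes compared in the ablation can be illustrated with a toy data-preparation sketch. It is a simplified reading of the summary, not the paper's pipeline: the function names are hypothetical, masking is plain per-token sampling, and the only distinction modeled is whether the paired sequence is supplied as extra context for predicting the masked tokens.

```python
import random
from typing import Dict, List, Optional, Tuple

MASK = "[MASK]"  # placeholder mask symbol (assumption)


def mask_tokens(
    tokens: List[str], rate: float, rng: random.Random
) -> Tuple[List[str], Dict[int, str]]:
    """Mask each token with probability `rate`; return (masked, targets)."""
    masked: List[str] = []
    targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = tok  # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets


def inner_sequence_example(x: List[str], rate: float, rng: random.Random) -> dict:
    """Masked tokens of x are predicted from the rest of x only."""
    masked, targets = mask_tokens(x, rate, rng)
    context: Optional[List[str]] = None
    return {"input": masked, "context": context, "targets": targets}


def cross_sequence_example(
    x: List[str], y: List[str], rate: float, rng: random.Random
) -> dict:
    """Masked tokens of x are predicted with the paired sequence y as context."""
    masked, targets = mask_tokens(x, rate, rng)
    return {"input": masked, "context": list(y), "targets": targets}


rng = random.Random(0)
english = ["the", "cat", "sat"]
french = ["le", "chat", "s'est", "assis"]
print(inner_sequence_example(english, 0.15, rng))
print(cross_sequence_example(english, french, 0.15, rng))
```

The 0.15 masking rate and the example sentences are illustrative values only. In a real model the `context` field would correspond to the representations that a cross-attention sub-module reads from.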

This review was generated by AI and reviewed by human editors.