QUICK REVIEW

[Paper Review] Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Biao Zhang, Philip Williams|arXiv (Cornell University)|Apr 24, 2020

Natural Language Processing Techniques|References: 35|Cited by: 82

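ONE-SENTENCE SUMMARY

The paper substantially improves massively multilingual neural machine translation through language-aware components, deeper architectures, and Random Online Backtranslation (ROBt), bringing its performance closer to that of bilingual models and pivot-based methods and markedly improving zero-shot translation.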

ABSTRACT

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.

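RESEARCH MOTIVATION AND OBJECTIVES

Motivate and address the capacity bottleneck of massively multilingual NMT when handling typologically diverse languages.
Improve zero-shot translation quality and reduce off-target translation (output in the wrong target language).
Explore architectural and data-driven strategies for improving multilingual translation performance.
Evaluate the effectiveness of language-aware components and deep Transformer architectures.
Propose and evaluate a scalable, backtranslation-based finetuning method for zero-shot directions.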

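PROPOSED METHOD

Use a Transformer-based multilingual NMT model trained on English-centric data (OPUS-100).
Introduce language-aware layer normalization (LaLn), which conditions normalization on the target language token.
Introduce a language-aware linear transformation (LaLt) between the encoder and decoder to adapt the translation mapping to the target language.
Deepen the Transformer architecture to increase modeling capacity.
Develop Random Online Backtranslation (ROBt): during finetuning, back-translate online for zero-shot directions, using a randomly sampled intermediate language to generate pseudo-parallel data.
Evaluate one-to-many and many-to-many settings on OPUS-100, reporting BLEU and translation language accuracy for zero-shot translation.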
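
The two language-aware components above can be illustrated with a small, dependency-free sketch. It assumes that LaLn keeps one gain vector and one bias vector per target language, and that LaLt keeps one matrix per target language applied to each encoder output state. The function names, toy parameters, and hidden size are hypothetical and are not taken from the paper's code.

```python
import math

def language_aware_layer_norm(x, gains, biases, target_lang, eps=1e-6):
    """Normalize x, then scale and shift with the target language's parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    g, b = gains[target_lang], biases[target_lang]
    return [g_i * (v - mean) / math.sqrt(var + eps) + b_i
            for v, g_i, b_i in zip(x, g, b)]

def language_aware_linear(h, weights, target_lang):
    """Apply the target-language-specific matrix to one encoder state."""
    matrix = weights[target_lang]
    return [sum(w * h_j for w, h_j in zip(row, h)) for row in matrix]

# Hypothetical toy parameters for two target languages, hidden size 3.
gains = {"de": [1.0, 1.0, 1.0], "fr": [2.0, 2.0, 2.0]}
biases = {"de": [0.0, 0.0, 0.0], "fr": [0.5, 0.5, 0.5]}
weights = {
    "de": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # identity
    "fr": [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # swaps dims 0 and 1
}

state = [1.0, 2.0, 3.0]
norm_de = language_aware_layer_norm(state, gains, biases, "de")
norm_fr = language_aware_layer_norm(state, gains, biases, "fr")
mapped_fr = language_aware_linear(state, weights, "fr")
```

The same input state is normalized and projected differently depending on the target language. In this sketch that is the only place where language-specific parameters enter; everything else in the model would stay shared.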

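EXPERIMENTAL RESULTS

Research Questions

RQ1: How does increasing modeling capacity affect multilingual NMT performance across many-to-many translation directions?
RQ2: Can language-aware normalization and language-aware linear transformation ease the capacity bottleneck and improve zero-shot translation?
RQ3: Can online backtranslation (ROBt) reduce off-target translation and raise zero-shot BLEU to a level close to pivot-based methods?
RQ4: How do deep Transformer architectures compare with language-aware components in performance and scalability for massively multilingual NMT?
RQ5: How does the amount of training data per language pair affect the effectiveness of these methods?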

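Key Findings

Multilingual NMT improves markedly as model capacity increases, narrowing the gap with bilingual models, especially for low-resource languages.
Language-aware modeling (LaLn and LaLt) substantially improves zero-shot performance and reduces off-target translation, with LaLt contributing notable gains.
Deepening the Transformer architecture helps, and combining it with LaLn and LaLt gives the best results in the ablation comparisons.
Random Online Backtranslation (ROBt) reduces off-target translation by about 50%, raises zero-shot BLEU by about 10 points to approach pivot-based methods, and converges within a few thousand steps.
On OPUS-100 (100 languages, 55 million sentence pairs), the combination of a deep Transformer, LaLn, LaLt, and ROBt narrows the gap with bilingual NMT and pivot-based methods while delivering substantial zero-shot improvements.
Zero-shot translation language accuracy (ACC zero) rises from roughly 35–50% to about 85–87% with ROBt, and the zero-shot BLEU gain reaches up to 10.11.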
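
The ROBt results above depend on how pseudo-parallel examples are built during finetuning. The following is a minimal sketch of that construction step, assuming the intermediate language is sampled uniformly from all languages other than the example's target language. `robt_examples` and `fake_translate` are hypothetical names; `fake_translate` only tags text and stands in for running the current multilingual model in inference mode.

```python
import random

def robt_examples(batch, languages, translate, rng):
    """Build pseudo-parallel examples for one finetuning batch.

    batch: list of (source, target, target_lang) triples.
    translate: callable (text, into_lang) -> text; a stand-in for the
               current multilingual model used for back-translation.
    """
    pseudo = []
    for _source, target, target_lang in batch:
        # Sample an intermediate language (assumed: any language but the target).
        candidates = [lang for lang in languages if lang != target_lang]
        intermediate = rng.choice(candidates)
        # Back-translate the real target sentence into the sampled language.
        pseudo_source = translate(target, intermediate)
        # Pair the synthetic source with the real target sentence.
        pseudo.append((pseudo_source, target, target_lang))
    return pseudo

def fake_translate(text, into_lang):
    # Hypothetical stand-in: tags the text instead of translating it.
    return f"<{into_lang}> {text}"

rng = random.Random(0)
batch = [("Hello", "Hallo", "de"), ("Thank you", "Merci", "fr")]
examples = robt_examples(batch, ["en", "de", "fr", "zh"], fake_translate, rng)
```

Because the sampled language is often not English, many of the resulting pairs cover directions that the English-centric training data never contains, so the model is explicitly trained to produce the requested target language for them. How these pseudo examples are mixed with the original batch is left out of this sketch.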

This review was generated by AI and reviewed by human editors.