QUICK REVIEW

[Paper Review] Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Melvin Johnson, Mike Schuster|arXiv (Cornell University)|Nov 14, 2016

Natural Language Processing Techniques|References: 6|Cited by: 108

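One-Sentence Summary

This paper proposes a single multilingual NMT model that translates between multiple languages by prepending a target-language token to the input, enabling zero-shot translation and transfer learning with a shared vocabulary and an unchanged architecture.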

ABSTRACT

We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English$\rightarrow$French and surpasses state-of-the-art results for English$\rightarrow$German. Similarly, a single multilingual model surpasses state-of-the-art results for French$\rightarrow$English and German$\rightarrow$English on WMT'14 and WMT'15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and show some interesting examples when mixing languages.

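Research Motivation and Goals

Propose a simple, scalable approach to multilingual translation that leaves the standard NMT architecture unchanged.
Show that a shared wordpiece vocabulary and a target-language token are enough for one model to support multiple language pairs.
Show that multilingual training can improve low-resource languages and enable zero-shot translation.
Evaluate the approach on WMT benchmarks and large-scale production data to measure translation quality and transfer gains.
Explore the potential for an implicit interlingua representation and cross-lingual transfer in multilingual NMT.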

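Proposed Method

Introduce an artificial token at the beginning of the input to specify the target language (for example, <2es> for Spanish).
Use a single shared encoder-decoder-attention NMT architecture with a wordpiece vocabulary shared across all languages (typically 32k wordpieces).
Train on mixed multilingual data, balancing language pairs through oversampling or undersampling while keeping the total parameter count fixed.
Run experiments under many-to-one, one-to-many, and many-to-many language mappings to assess performance in each configuration.
Evaluate with tokenized BLEU on the WMT'14 and WMT'15 benchmarks and on large-scale production data, and analyze zero-shot translation ability.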
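
The data preparation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the helper names `add_target_token` and `build_mixed_corpus`, the toy sentences, and the choice to oversample every pair up to the size of the largest one are all assumptions made for the example. It also assumes every language pair has at least one sentence pair.

```python
import random


def add_target_token(source_sentence, target_lang):
    """Prepend an artificial token such as <2es> that names the target language."""
    return f"<2{target_lang}> {source_sentence}"


def build_mixed_corpus(corpora, seed=0):
    """Mix several language pairs into one training set (hypothetical helper).

    `corpora` maps (source_lang, target_lang) to a non-empty list of
    (source, target) sentence pairs. Smaller pairs are oversampled until
    they match the largest pair, which is one possible balancing scheme.
    """
    rng = random.Random(seed)
    largest = max(len(pairs) for pairs in corpora.values())
    mixed = []
    for (_, target_lang), pairs in corpora.items():
        # Repeat the whole corpus, then top up with a random sample.
        repeats, remainder = divmod(largest, len(pairs))
        sampled = pairs * repeats + rng.sample(pairs, remainder)
        mixed.extend(
            (add_target_token(src, target_lang), tgt) for src, tgt in sampled
        )
    rng.shuffle(mixed)
    return mixed


# Toy data, invented for illustration only.
corpora = {
    ("en", "es"): [("Hello, how are you?", "Hola, ¿cómo estás?")],
    ("en", "de"): [("Good morning", "Guten Morgen"), ("Thank you", "Danke")],
}
mixed = build_mixed_corpus(corpora)
for tagged_source, target in mixed:
    print(tagged_source, "->", target)
```

Only the source side changes: the target sentence is left untouched, and the model itself needs no language-specific components because the requested language travels with the input.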

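Experimental Results

Research Questions

RQ1: Can a single NMT model translate between multiple languages without any architectural changes?
RQ2: Does introducing a target-language token enable multilingual translation, and how does it affect the performance of individual language pairs?
RQ3: Does multilingual training enable zero-shot translation and bring transfer-learning gains to low-resource languages?
RQ4: Compared with separately trained single-language-pair models, how do model size and data balancing affect multilingual translation quality?
RQ5: What evidence is there for an implicit interlingua representation in multilingual NMT?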

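Key Findings

Under different sampling strategies, a single model with target-language tokens can match or even exceed single-language-pair baselines on several language pairs (for example, French-English and German-English).
Zero-shot translation works for language pairs not seen during training (for example, Portuguese→Spanish), and BLEU scores improve with additional data or incremental training.
Multilingual models improve low-resource languages through shared representations, and remain competitive when multiple language pairs are trained under the same total parameter budget.
Large multilingual models (up to 12 language pairs) deliver competitive performance while substantially reducing training time and production complexity (roughly 1/12 of the training time).
In some cases, the implicit bridging learned through multilingual training can outperform explicit bridging, and adding a limited amount of parallel data for the zero-shot direction further improves quality.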
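
The difference between explicit bridging and implicit (zero-shot) bridging mentioned above can be made concrete with a small sketch. Everything here is hypothetical: `translate` stands for any callable that maps a tagged source string to a translation, and `TOY_TABLE` is a lookup table standing in for a trained model so that the example runs. It shows only the calling pattern, not translation quality.

```python
from typing import Callable, Dict, List

Translate = Callable[[str], str]  # hypothetical: tagged source -> translation


def tag(sentence: str, target_lang: str) -> str:
    """Prepend the target-language token, e.g. <2es>."""
    return f"<2{target_lang}> {sentence}"


def explicit_bridge(translate: Translate, sentence: str,
                    pivot_lang: str, target_lang: str) -> str:
    """Two decoding passes: source -> pivot, then pivot -> target."""
    pivot_text = translate(tag(sentence, pivot_lang))
    return translate(tag(pivot_text, target_lang))


def zero_shot(translate: Translate, sentence: str, target_lang: str) -> str:
    """One decoding pass: request the target language directly."""
    return translate(tag(sentence, target_lang))


# Toy stand-in for a trained multilingual model (invented for illustration).
TOY_TABLE: Dict[str, str] = {
    "<2en> Olá": "Hello",
    "<2es> Hello": "Hola",
    "<2es> Olá": "Hola",
}
calls: List[str] = []


def toy_translate(tagged: str) -> str:
    calls.append(tagged)  # record each decoding pass
    return TOY_TABLE[tagged]


bridged = explicit_bridge(toy_translate, "Olá", "en", "es")
direct = zero_shot(toy_translate, "Olá", "es")
print(bridged, direct, len(calls))
```

Explicit bridging costs two decoding passes and can compound errors through the pivot language, while the zero-shot path needs a single pass but relies on the model generalizing to a pair it never saw in training.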

This review was generated by AI and checked by a human editor.