QUICK REVIEW

[Paper Review] A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

Surafel M. Lakew, Mauro Cettolo|arXiv (Cornell University)|Jun 18, 2018

Natural Language Processing Techniques | References: 27 | Cited by: 63

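One-Sentence Summary

This paper quantitatively compares Transformer and recurrent NMT architectures in bilingual, multilingual, and zero-shot translation settings, using professional post-edits and a detailed error-category analysis on related and unrelated language pairs.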

ABSTRACT

Recently, neural machine translation (NMT) has been extended to multilinguality, that is to handle more than one translation direction with a single system. Multilingual NMT showed competitive performance against pure bilingual systems. Notably, in low-resource settings, it proved to work effectively and efficiently, thanks to shared representation space that is forced across languages and induces a sort of transfer-learning. Furthermore, multilingual NMT enables so-called zero-shot inference across language pairs never seen at training time. Despite the increasing interest in this framework, an in-depth analysis of what a multilingual NMT model is capable of and what it is not is still missing. Motivated by this, our work (i) provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; (ii) investigates the translation quality of two of the currently dominant neural architectures in MT, which are the Recurrent and the Transformer ones; and (iii) quantitatively explores how the closeness between languages influences the zero-shot translation. Our analysis leverages multiple professional post-edits of automatic translations by several different systems and focuses both on automatic standard metrics (BLEU and TER) and on widely used error categories, which are lexical, morphology, and word order errors.

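Research Motivation and Goals

Assess how translation quality differs among bilingual, multilingual, and zero-shot MT systems.
Evaluate recurrent and Transformer architectures in the multilingual MT setting.
Investigate how data from related languages affects zero-shot translation performance.
Analyze lexical, morphological, and word-order error patterns across architectures and degrees of language relatedness.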

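Proposed Method

Implement bilingual (NMT), multilingual (M-NMT), and zero-shot (ZST) MT settings with both recurrent (LSTM) and Transformer architectures.
Preprocess the data for the multilingual models with a BPE segmentation shared across seven languages (8,000 merge rules) and with language-tag tokens.
Train the models with hyperparameters tuned for low-resource conditions, using OpenNMT-py for the RNN and Tensor2Tensor for the Transformer.
Evaluate BLEU and TER against the official test references, and compute mTER and lmmTER against nine professional post-edits.
Run a fine-grained error analysis on lemmatized and POS-tagged output to classify errors as lexical, morphological, or reordering errors.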
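The language-tag step above can be illustrated with a minimal sketch. It assumes the common multilingual NMT convention of prepending a target-language token to each source sentence; the review does not specify the tag format, so the `<2xx>` token shape, the function names, and the English/Italian sentences below are all hypothetical.

```python
def tag_source(source_sentence: str, target_lang: str) -> str:
    """Prepend a target-language token (assumed format: <2xx>) to the source."""
    return f"<2{target_lang}> {source_sentence}"


def build_multilingual_corpus(pairs):
    """Turn (source, target, target_lang) triples into tagged training pairs.

    Mixing tagged pairs from several directions into one corpus is what lets
    a single model serve every direction it was trained on.
    """
    return [(tag_source(src, lang), tgt) for src, tgt, lang in pairs]


# Hypothetical toy data covering two translation directions.
corpus = build_multilingual_corpus([
    ("Good morning", "Buongiorno", "it"),
    ("Buongiorno", "Good morning", "en"),
])

for tagged_source, target in corpus:
    print(tagged_source, "->", target)
```

Under this convention, a zero-shot request would simply be a source sentence tagged with a target language that never co-occurred with that source language during training.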

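Experimental Results

Research Questions

RQ1: How do bilingual, multilingual, and zero-shot systems differ in overall translation quality and in specific error types?
RQ2: How does the translation quality of recurrent and Transformer architectures differ across tasks?
RQ3: How does including related-language data affect zero-shot translation performance?
RQ4: Does related-language data improve zero-shot translation more for Transformer or for recurrent models?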

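Key Findings

Transformer consistently achieves higher BLEU and lower TER in the bilingual, multilingual, and zero-shot settings, with statistically significant gains over the recurrent model in the multilingual and zero-shot cases.
Multilingual models (M-NMT) outperform bilingual NMT in several cases and, owing to broader language exposure, are robust in terms of mTER and lmmTER.
Zero-shot translation is feasible, especially with the Transformer architecture, and in some zero-shot configurations it can even surpass the bilingual baseline.
For related language directions, zero-shot performance improves when additional related languages are added (ZST_B), and the Transformer zero-shot models show a significant reduction in lexical errors.
The error analysis shows that lexical errors dominate, with morphological and reordering errors contributing less; compared with the bilingual baseline, the Transformer-based ZST models achieve a meaningful reduction in errors.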
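To give a feel for how a surface-form score and a lemma-based score against multiple post-edits can diverge, here is a simplified sketch. It rests on several assumptions: that mTER denotes TER computed against multiple post-edits and lmmTER its lemma-based variant, that the score is the fewest edits to any reference divided by the average reference length, and that shifts can be ignored (real TER also counts block shifts, so this is closer to a word error rate). The toy lemma table and sentences are hypothetical, not data from the paper.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def multi_ref_edit_rate(hyp, refs, normalize=lambda w: w):
    """Fewest edits to any reference, divided by the average reference length.

    `normalize` maps each token before comparison: identity for a surface
    score, a lemmatizer for a lemma-based score. Assumes non-empty references.
    """
    hyp_tokens = [normalize(w) for w in hyp.split()]
    ref_tokens = [[normalize(w) for w in r.split()] for r in refs]
    fewest_edits = min(edit_distance(hyp_tokens, r) for r in ref_tokens)
    avg_ref_len = sum(len(r) for r in ref_tokens) / len(ref_tokens)
    return fewest_edits / avg_ref_len


# Hypothetical stand-in for a real lemmatizer.
TOY_LEMMAS = {"cats": "cat", "sat": "sit", "sits": "sit"}


def toy_lemma(word):
    return TOY_LEMMAS.get(word, word)


hypothesis = "the cat sits on the mat"
post_edits = ["the cats sat on the mat", "a cat sat on the mat"]

surface_rate = multi_ref_edit_rate(hypothesis, post_edits)
lemma_rate = multi_ref_edit_rate(hypothesis, post_edits, normalize=toy_lemma)
print(f"surface: {surface_rate:.3f}  lemma: {lemma_rate:.3f}")
```

In this toy case the surface score is nonzero while the lemma score drops to zero, which means every remaining edit is a morphological one. Comparing the two scores in this way is one plausible route to separating morphological from lexical errors.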

This review was generated by AI and reviewed by a human editor.