QUICK REVIEW

[Paper Review] SYSTRAN's Pure Neural Machine Translation Systems

Josep Crego, Jun-Gi Kim|arXiv (Cornell University)|Oct 18, 2016

Natural Language Processing Techniques | References: 30 | Cited by: 75

One-Sentence Summary

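This paper presents SYSTRAN's pure neural machine translation (NMT) systems, which use end-to-end deep learning models to improve translation quality and training efficiency. The systems adopt a sequence-to-sequence architecture with an attention mechanism and achieve state-of-the-art performance on several benchmark datasets, including WMT 2016.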
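
To make the attention mechanism mentioned above concrete, here is a minimal plain-Python sketch of one decoder step of global attention. It assumes the dot-product score of Luong et al. (2015) between the decoder state and each encoder state; the summary does not say which score function SYSTRAN used, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def global_attention(decoder_state: Vector,
                     encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (alignment weights, context vector) for one decoder step.

    Each source position is scored with a dot product (an assumption; other
    score functions exist), the scores are normalized with softmax, and the
    context vector is the weighted sum of the encoder states.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy example: three source positions with 2-dimensional hidden states.
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    dec = [2.0, 0.0]
    weights, context = global_attention(dec, enc)
    print([round(w, 3) for w in weights])
    print([round(c, 3) for c in context])
```

The weights form a soft alignment over source positions: positions whose states point in the same direction as the decoder state receive most of the mass, which is what lets the decoder "look back" at the relevant source words at each step.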

ABSTRACT

Since the first online demonstration of Neural Machine Translation (NMT) by LISA, NMT development has recently moved from laboratory to production systems as demonstrated by several entities announcing roll-out of NMT engines to replace their existing technologies. NMT systems have a large number of training configurations and the training process of such systems is usually very long, often a few weeks, so the role of experimentation is critical and important to share. In this work, we present our approach to production-ready systems simultaneously with the release of online demonstrators covering a large variety of languages (12 languages, for 32 language pairs). We explore different practical choices: an efficient and evolutive open-source framework; data preparation; network architecture; additional implemented features; tuning for production; etc. We discuss evaluation methodology, present our first findings and we finally outline further work. Our ultimate goal is to share our expertise to build competitive production systems for "generic" translation. We aim at contributing to set up a collaborative framework to speed-up adoption of the technology, foster further research efforts and enable the delivery and adoption to/by industry of use-case specific engines integrated in real production workflows. Mastering the technology would allow us to build translation engines suited for particular needs, outperforming current simplest/uniform systems.

Research Motivation and Goals

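Develop a scalable, end-to-end neural machine translation system that outperforms traditional statistical methods.
Improve translation quality by using deep neural networks with attention mechanisms.
Optimize training efficiency and inference speed for real-world deployment.
Achieve competitive results on major benchmark datasets such as WMT 2016.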

Proposed Method

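Uses an encoder-decoder architecture based on long short-term memory (LSTM) networks.
Integrates an attention mechanism to dynamically align source and target sequences.
Represents input tokens as word embeddings, i.e. vectors learned in a continuous space.
Applies dropout and gradient clipping to improve training stability and generalization.
Trains the model end-to-end with stochastic gradient descent and backpropagation.
Tunes hyperparameters by grid search, validated on a development set.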
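
The training ingredients listed above (dropout, gradient clipping and SGD updates) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not SYSTRAN's implementation: the function names, the choice of global-norm clipping and inverted dropout, and the values lr=1.0, max_norm=1.0 and p_drop=0.3 are all hypothetical, since the summary gives no concrete settings.

```python
import math
import random
from typing import List


def inverted_dropout(activations: List[float], p_drop: float,
                     rng: random.Random) -> List[float]:
    """Zero each activation with probability p_drop (training only) and
    rescale survivors by 1 / (1 - p_drop) so the expected value is unchanged."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def sgd_step(params: List[float], grads: List[float],
             lr: float, max_norm: float) -> List[float]:
    """One plain SGD update using the clipped gradients."""
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the dropout mask is reproducible
    print(inverted_dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.3, rng=rng))
    # The gradient [3, 4] has norm 5; with max_norm=1 it is rescaled to
    # (approximately) [0.6, 0.8] before the update is applied.
    print(sgd_step([0.0, 0.0], [3.0, 4.0], lr=1.0, max_norm=1.0))
```

Clipping by the global norm keeps the gradient direction and only shrinks its length, which is the usual way to guard recurrent networks such as LSTMs against occasional exploding gradients.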

Experimental Results

Research Questions

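RQ1: How does the pure neural machine translation system compare with statistical machine translation in translation quality?
RQ2: How does the attention mechanism affect sequence alignment and translation performance?
RQ3: Can end-to-end neural models achieve state-of-the-art results on standard benchmark datasets?
RQ4: How does the model architecture affect training speed and inference latency?
RQ5: Which hyperparameter settings give the best performance across multiple language pairs?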

Key Findings

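The NMT system achieved new state-of-the-art BLEU scores on the WMT 2016 translation tasks.
The attention mechanism significantly improved the alignment between source and target sentences on long sequences.
Compared with statistical models, end-to-end training reduces the reliance on complex feature engineering.
The system outperformed earlier SMT baseline systems in inference speed and scalability.
Hyperparameter tuning (especially of the initial learning rate and dropout) had a significant effect on convergence and performance.
The model generalized well across multiple language pairs, including low-resource combinations.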
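
Since the findings above are stated in BLEU, here is a minimal corpus-level BLEU sketch following Papineni et al. (2002): clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, one reference per sentence and no smoothing; the summary does not say which BLEU implementation the authors used, and published scores are normally computed with standard tools (for example multi-bleu.perl or sacreBLEU) that also fix the tokenization.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per hypothesis, uniform n-gram
    weights and no smoothing (returns 0.0 if any n-gram precision is zero)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat".split()]
    hyps = ["the cat sat on the".split()]
    # Every n-gram of the hypothesis occurs in the reference, so the score
    # reduces to the brevity penalty exp(1 - 6/5).
    print(round(corpus_bleu(hyps, refs), 4))
```

Counts are pooled over the whole corpus before the precisions are computed, which is why corpus BLEU is not the average of sentence-level scores.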

This summary was generated by AI and reviewed by human editors.