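[Paper Explained] Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
GNMT is a production-grade NMT system built from deep LSTM encoder and decoder stacks, residual connections, wordpiece sub-word units, and quantized inference. It is competitive on BLEU and substantially outperforms the phrase-based system in human evaluation.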
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
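To make the wordpiece idea from the abstract concrete, here is a minimal sketch of how a word can be split into sub-word units. It uses greedy longest-match segmentation against a tiny hand-written vocabulary, which is a simplification: GNMT learns its wordpiece inventory from data, and the vocabulary `TOY_VOCAB`, the function names, and the single-character fallback are all assumptions made for illustration. The leading `_` marks the start of a word.

```python
def segment(word, vocab, boundary="_"):
    """Greedy longest-match split of one word into sub-word units."""
    text = boundary + word  # mark the word start, e.g. "_Jet"
    pieces = []
    start = 0
    while start < len(text):
        # Try the longest remaining substring first, then shrink it.
        end = len(text)
        while end > start and text[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing in the vocabulary matches: emit a single character.
            end = start + 1
        pieces.append(text[start:end])
        start = end
    return pieces


def segment_sentence(sentence, vocab):
    """Segment every whitespace-separated word and flatten the result."""
    return [piece for word in sentence.split() for piece in segment(word, vocab)]


# Hypothetical toy vocabulary; a real one holds thousands of learned units.
TOY_VOCAB = {"_J", "et", "_makers", "_fe", "ud", "_over", "_seat"}

print(segment_sentence("Jet makers feud over seat", TOY_VOCAB))
print(segment("seats", TOY_VOCAB))  # an unseen word still gets a segmentation
```

Because every word can fall back to smaller pieces, an unseen word never needs a special unknown token, which is the property that lets sub-word models handle rare words.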
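Research Motivation and Objectives
- Address the weaknesses of early neural machine translation: training and inference speed, handling of rare words, and incomplete coverage of the source sentence in the output.
- Develop a production-ready GNMT system with deep LSTM layers and residual connections to improve accuracy and training speed.
- Improve rare-word handling with sub-word units (wordpieces), paired with a robust beam search that uses length normalization and a coverage penalty.
- Speed up inference through model quantization and hardware (TPU) optimization.
- Demonstrate competitive performance on standard benchmarks and a substantial gain over the phrase-based system in human evaluation.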
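Proposed Method
- Use a deep LSTM stack with 8 encoder layers and 8 decoder layers, linked by residual connections.
- Make the bottom encoder layer bidirectional to maximize context while preserving parallelism.
- Introduce wordpiece sub-word units shared across the source and target languages to handle rare words.
- Implement beam search with length normalization and a coverage penalty.
- Train with a maximum-likelihood objective and, where needed, fine-tune with a mixed objective that combines maximum likelihood with a reward-based term (GLEU-based reward).
- Use quantized inference with 8-bit weights and 16-bit accumulators to accelerate decoding on specialized hardware.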
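The quantized-inference step can be illustrated with a rough sketch. The summary does not spell out the quantization scheme, so everything below is an assumption for illustration: symmetric linear scaling with one scale per row, the function names `quantize`, `quantized_matvec`, and `clip_to_int16`, and the clipping bound `delta`. A plain Python integer stands in for a wide hardware accumulator during the multiply.

```python
def quantize(values, bits):
    """Symmetric linear quantization: floats -> signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def quantized_matvec(weights, x):
    """Approximate weights @ x using 8-bit integer operands.

    Integer products are summed per row, then rescaled to float once.
    """
    x_q, x_scale = quantize(x, bits=8)
    out = []
    for row in weights:
        w_q, w_scale = quantize(row, bits=8)
        acc = sum(w * v for w, v in zip(w_q, x_q))  # integer-only accumulation
        out.append(acc * w_scale * x_scale)
    return out


def clip_to_int16(value, delta):
    """Clip a running value to [-delta, delta] and store it as a 16-bit integer."""
    clipped = max(-delta, min(delta, value))
    return round(clipped / delta * 32767)


# Hypothetical weights and input, chosen only to show the approximation error.
W = [[0.50, -1.25, 0.75], [2.00, 0.10, -0.40]]
x = [1.0, 0.5, -2.0]
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
approx = quantized_matvec(W, x)
print("exact :", exact)
print("approx:", approx)
```

The point of the sketch is the trade-off: the inner loop touches only small integers, which is what specialized hardware accelerates, while the result stays close to the floating-point product.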
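Experimental Results
Research Questions
- RQ1: Can a deep NMT model with residual connections and sub-word units approach human translation quality on standard benchmarks?
- RQ2: Can quantized inference deliver production-level speedups without sacrificing translation quality?
- RQ3: Does a joint wordpiece representation beat pure word-level or character-level baselines on rare-word handling and overall BLEU?
- RQ4: Does adding a coverage penalty and length normalization to beam search improve the completeness of translations across languages?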
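The two beam search terms behind the last question can be sketched in a few lines. The functional forms follow the scoring function given in the GNMT paper: the log-probability is divided by a length penalty lp(Y) = (5 + |Y|)^α / (5 + 1)^α, and a coverage penalty β · Σ_i log(min(Σ_j p_ij, 1)) is added, where p_ij is the attention weight of target word j on source word i. The default values of `alpha` and `beta`, the floor `eps` that avoids log(0), and the example attention matrices are assumptions for illustration.

```python
import math


def length_penalty(length, alpha):
    """lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha; equals 1 when alpha is 0."""
    return ((5 + length) ** alpha) / ((5 + 1) ** alpha)


def coverage_penalty(attention, beta, eps=1e-9):
    """attention[j][i] is the weight of target word j on source word i.

    Source words whose total attention stays below 1 contribute a negative
    term, so hypotheses that skip part of the source are penalized.
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), eps))  # eps avoids log(0)
    return beta * total


def score(log_prob, attention, alpha=0.2, beta=0.2):
    """Rescored hypothesis: normalized log-probability plus coverage term."""
    target_length = len(attention)
    return log_prob / length_penalty(target_length, alpha) + coverage_penalty(attention, beta)


# Two hypothetical 3-word hypotheses with the same model log-probability.
full = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]     # attends to every source word
partial = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.7, 0.3, 0.0]]  # never attends to the third word

print(score(-3.0, full), score(-3.0, partial))
```

With equal log-probabilities, the hypothesis that leaves a source word unattended receives the lower score, which is how the coverage term pushes the search toward complete translations.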
Key Findings
- On WMT'14 English→French, a single GNMT model reaches 38.95 BLEU, 7.5 BLEU above one baseline that uses no external alignment model and 1.2 BLEU above another.
- On WMT'14 English→German, a single GNMT model scores 24.61 BLEU, about 3.4 BLEU above a previously competitive baseline.
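- In human side-by-side evaluation, GNMT reduces translation errors by roughly 60% on average relative to Google's phrase-based production system on the English↔French, English↔Spanish, and English↔Chinese pairs.
- According to the paper, results on production data suggest that quality on selected language pairs approaches that of average human translators.
- Wordpiece-based models strike an effective balance between vocabulary flexibility and decoding efficiency, improving BLEU over pure word-level or character-level approaches.
- Quantized inference decodes faster with almost no loss in translation quality, as shown by a comparison of CPU, GPU, and TPU deployments.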
This summary was generated by AI and reviewed by human editors.