QUICK REVIEW

[Paper Review] Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau|arXiv (Cornell University)|Sep 1, 2014

Natural Language Processing Techniques | References: 22 | Cited by: 14,567

One-Sentence Summary

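This paper proposes an attention-based neural machine translation model (RNNsearch) that jointly learns to align and translate. Instead of compressing the source into a single fixed-length vector, it computes a separate context vector at each decoding step from bidirectional annotations of the source, and as a single neural model it reaches English-to-French translation performance comparable to a conventional phrase-based system.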

ABSTRACT

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

Motivation and Objectives

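Motivation: overcome the bottleneck of encoder-decoder NMT architectures, which encode the entire source sentence into a single fixed-length vector.
Introduce a model that dynamically attends to source positions during decoding (soft alignment).
Show that jointly learned alignment improves translation quality, especially on longer sentences.
Show that, on English-to-French translation, a single neural model is competitive with a phrase-based system.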

Proposed Method

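Use a bidirectional RNN encoder so that each source word's annotation h_j concatenates a forward and a backward hidden state, summarizing the context on both sides of the word.
Implement a decoder that, at each step i, defines the context vector c_i as a weighted sum of all annotations, with the weights determined by an alignment model.
Define p(y_i | y_1, ..., y_{i-1}, x) = g(y_{i-1}, s_i, c_i), where s_i is the decoder state and g the output function, so that each target word is conditioned on its own context vector c_i; this gives soft attention over the source.
Train the whole model end-to-end to maximize the conditional log-likelihood of y given x, backpropagating through the attention mechanism.
Use a feedforward neural network as the alignment model a(s_{i-1}, h_j), which scores how well the source around position j matches the output at position i; a softmax over j turns these scores into the attention weights α_{ij}.
Use gated recurrent units in the RNNs and a maxout hidden layer in the output network, and train with minibatch SGD using Adadelta.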
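For reference, the steps above can be written out as equations in the original paper's notation (T_x is the source length, f is the decoder's recurrent update, g its output function, v_a, W_a and U_a are the alignment network's weights, and N is the number of training pairs). This is a restatement for convenience, not an additional result.

```latex
\begin{aligned}
h_j &= \left[\overrightarrow{h}_j ; \overleftarrow{h}_j\right] \\
e_{ij} &= a(s_{i-1}, h_j) = v_a^\top \tanh\left(W_a s_{i-1} + U_a h_j\right) \\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \\
c_i &= \sum_{j=1}^{T_x} \alpha_{ij} h_j \\
s_i &= f(s_{i-1}, y_{i-1}, c_i) \\
p(y_i \mid y_1, \dots, y_{i-1}, x) &= g(y_{i-1}, s_i, c_i) \\
\theta^* &= \arg\max_{\theta} \frac{1}{N} \sum_{n=1}^{N} \log p_\theta(y_n \mid x_n)
\end{aligned}
```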
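A minimal, non-authoritative Python sketch of one attention step, using only the standard library. It builds bidirectional annotations by concatenating a forward and a backward pass, scores each annotation against the previous decoder state with an additive (tanh) alignment network, normalizes the scores with a softmax over source positions, and takes the weighted sum as the context vector c_i. Assumptions: a plain tanh RNN stands in for the paper's gated units, the weights are random rather than trained, and all function names and toy sizes are hypothetical.

```python
import math
import random

# Tiny dense-algebra helpers on plain lists, so the sketch needs no NumPy.
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]          # element-wise sum

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def softmax(scores):
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def rnn_pass(xs, W, U, hidden):
    """Simple tanh RNN over a sequence (hypothetical stand-in for the paper's GRU)."""
    h = [0.0] * hidden
    states = []
    for x in xs:
        h = tanh_vec(add(matvec(W, x), matvec(U, h)))
        states.append(h)
    return states

def bidirectional_annotations(xs, params, hidden):
    """h_j = [forward h_j ; backward h_j] for every source position j."""
    fwd = rnn_pass(xs, params["W_f"], params["U_f"], hidden)
    # Run over the reversed input, then flip back so bwd[j] belongs to position j.
    bwd = rnn_pass(xs[::-1], params["W_b"], params["U_b"], hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]     # list concatenation

def attend(s_prev, annotations, W_a, U_a, v_a):
    """Additive attention: e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)."""
    Ws = matvec(W_a, s_prev)                     # shared by every position j
    scores = [sum(v * t for v, t in zip(v_a, tanh_vec(add(Ws, matvec(U_a, h)))))
              for h in annotations]
    alphas = softmax(scores)                     # alpha_ij, normalized over j
    dim = len(annotations[0])
    context = [sum(a * h[k] for a, h in zip(alphas, annotations)) for k in range(dim)]
    return alphas, context

if __name__ == "__main__":
    rng = random.Random(0)
    emb, hidden, dec, att = 4, 3, 5, 6           # toy sizes (hypothetical)
    params = {
        "W_f": rand_matrix(hidden, emb, rng), "U_f": rand_matrix(hidden, hidden, rng),
        "W_b": rand_matrix(hidden, emb, rng), "U_b": rand_matrix(hidden, hidden, rng),
    }
    source = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(7)]  # 7 toy "word embeddings"
    H = bidirectional_annotations(source, params, hidden)
    W_a, U_a = rand_matrix(att, dec, rng), rand_matrix(att, 2 * hidden, rng)
    v_a = [rng.uniform(-0.1, 0.1) for _ in range(att)]
    s_prev = [0.0] * dec                         # previous decoder state s_{i-1}
    alphas, c_i = attend(s_prev, H, W_a, U_a, v_a)
    print(len(H), len(c_i), round(sum(alphas), 6))
```

Running it prints the number of annotations (7), the context size (6, twice the hidden size because of the concatenation), and the sum of the attention weights (1.0). In a full model, c_i would feed the decoder update and the output distribution, and all weights would be trained jointly by backpropagation.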

Experimental Results

Research Questions

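RQ1: Does replacing the fixed-length context vector with a dynamic attention mechanism improve translation quality?
RQ2: Does jointly learning alignment and translation produce plausible soft alignments that match linguistic intuition?
RQ3: On English-to-French translation, how does the attention model compare with the encoder-decoder baseline and a conventional phrase-based system, especially on longer sentences?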

Key Findings

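Test-set BLEU scores (All: all test sentences; No UNK: only sentences without unknown words; -30/-50: maximum sentence length used in training; ⋆: trained much longer, until development-set performance stopped improving)
Model	All	No UNK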
RNNencdec-30	13.93	24.19
RNNsearch-30	21.50	31.44
RNNencdec-50	17.82	26.71
RNNsearch-50	26.75	34.16
RNNsearch-50 ⋆	28.45	36.15
Moses	33.30	35.63

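The proposed RNNsearch outperforms the conventional RNN encoder-decoder in every setting.
On sentences containing only known words, RNNsearch-50 (34.16 BLEU) is comparable to the phrase-based system Moses (35.63), and the longer-trained RNNsearch-50 ⋆ (36.15) slightly exceeds it.
The attention mechanism is more robust to sentence length: RNNsearch-50 shows no performance deterioration even on long sentences.
Qualitative analysis shows that the model finds meaningful soft alignments between source and target words that agree with linguistic intuition, for example correctly aligning the reordered phrase "European Economic Area" with "zone économique européenne".
Long sentences reveal a wider performance gap between the fixed-vector encoder and the attention-based model: RNNsearch maintains translation quality where RNNencdec degrades.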
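To make the comparisons in the findings concrete, the BLEU differences can be recomputed directly from the table. The snippet below only does that arithmetic; the dictionary name, the `gap` helper, and the ASCII key "RNNsearch-50*" (standing for RNNsearch-50 ⋆) are our own choices, not from the paper.

```python
# BLEU scores copied from the table: (all sentences, no-UNK sentences).
# "RNNsearch-50*" is an ASCII stand-in for "RNNsearch-50 ⋆".
BLEU = {
    "RNNencdec-30": (13.93, 24.19),
    "RNNsearch-30": (21.50, 31.44),
    "RNNencdec-50": (17.82, 26.71),
    "RNNsearch-50": (26.75, 34.16),
    "RNNsearch-50*": (28.45, 36.15),
    "Moses": (33.30, 35.63),
}

def gap(a, b):
    """Per-column BLEU difference (a - b), rounded to two decimals."""
    return tuple(round(x - y, 2) for x, y in zip(BLEU[a], BLEU[b]))

if __name__ == "__main__":
    for length in ("30", "50"):
        print(f"RNNsearch-{length} vs RNNencdec-{length}:",
              gap(f"RNNsearch-{length}", f"RNNencdec-{length}"))
    print("RNNsearch-50 vs Moses:", gap("RNNsearch-50", "Moses"))
    print("RNNsearch-50* vs Moses:", gap("RNNsearch-50*", "Moses"))
```

The attention model gains roughly 7 to 9 BLEU over the matching encoder-decoder, while on all sentences it still trails Moses; only on the no-UNK subset does the gap close (about -1.5 for RNNsearch-50 and +0.5 for RNNsearch-50 ⋆).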


This review was generated by AI and reviewed by human editors.