QUICK REVIEW

[Paper Review] Syntax-Directed Attention for Neural Machine Translation

Kehai Chen, Rui Wang | arXiv (Cornell University) | Nov 12, 2017

Natural Language Processing Techniques | Cited by 58

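One-Sentence Summary

This paper introduces syntax-directed attention (SDAtt), which extends local attention with a syntax-distance constraint, and proposes a double-context NMT architecture that combines a global context with a syntax-directed local context, yielding improvements on Chinese-to-English and English-to-German translation.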

ABSTRACT

Attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at fixed-window source words. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the aligned source position and neglect syntax-directed distance constraints. In this paper, we extend local attention with syntax-distance constraint, to focus on syntactically related source words with the predicted target word, thus learning a more effective context vector for word prediction. Moreover, we further propose a double context NMT architecture, which consists of a global context vector and a syntax-directed context vector over the global attention, to provide more translation performance for NMT from source representation. The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.

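Motivation and Objectives

Motivate the incorporation of syntax-distance constraints into NMT by the limitations of local attention based on linear distance.
Propose a syntax-distance constraint (SDC) mask, derived from the dependency tree, to guide attention.
Introduce syntax-directed attention (SDAtt), computed with the SDC, to obtain a syntax-focused context vector.
Propose a double-context NMT architecture that combines the global context with the syntax-directed local context.
Evaluate on large-scale ZH-EN and EN-DE tasks and show improvements over strong baselines.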

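Proposed Method

Extend local attention with a mask matrix M obtained from the dependency tree.
Compute the alignment scores e_ij by applying a Gaussian over the syntax distances in M[p_i], yielding the n-gram SDAtt weights alpha^{s_n}_{ij}.
Obtain the syntax-directed context vector c^s_i as a weighted sum of the h_j, and use it for word prediction.
Integrate SDAtt into a double-context architecture that also includes the global context vector c^g_i, so that P(y_i|y_<i,x,T) = softmax(L_o tanh(L_w E_y[y_{i-1}] + L_d s_i + L_cg c^g_i + L_cs c^s_i)).
Train and evaluate with Nematus on the Chinese-English (ZH-EN) and English-German (EN-DE) tasks; dependency trees come from the Stanford parser; the vocabulary is capped at 50k; the maximum sentence length is 80 tokens; optimization uses ADADELTA.
Compare against PBSMT, GlobalAtt, LocalAtt, FlexibleAtt, and the Chen et al. 2017 baseline.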
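
The attention steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `heads` encoding of the dependency tree, the choice sigma = n/2 (borrowed from Luong-style local attention), and the hard cutoff at syntax distance n are all assumptions made for illustration. Syntax distance is taken here to be the number of dependency edges on the path between two words.

```python
import math
from collections import deque


def syntax_distance_matrix(heads):
    """Build a mask matrix M where M[j][k] is the number of dependency
    edges between source words j and k (assumed definition of distance).

    heads[j] is the index of the head of word j, or -1 for the root.
    """
    size = len(heads)
    adjacency = [[] for _ in range(size)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)

    matrix = []
    for start in range(size):
        # Breadth-first search over the undirected dependency tree.
        dist = [-1] * size
        dist[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if dist[neighbor] < 0:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        matrix.append(dist)
    return matrix


def syntax_directed_weights(scores, dist_row, n=2, sigma=None):
    """Turn raw alignment scores e_ij into syntax-directed weights.

    dist_row plays the role of M[p_i]. Scores are scaled by a Gaussian of
    the syntax distance, words farther than n edges get zero weight, and
    the rest are softmax-normalized. Assumes n >= 1.
    """
    sigma = n / 2 if sigma is None else sigma  # assumption, not from the source
    scaled = []
    for score, dist in zip(scores, dist_row):
        if 0 <= dist <= n:
            scaled.append(score * math.exp(-(dist * dist) / (2 * sigma * sigma)))
        else:
            scaled.append(None)  # outside the syntax window
    top = max(s for s in scaled if s is not None)
    exps = [0.0 if s is None else math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def context_vector(weights, annotations):
    """Weighted sum of source annotations h_j, giving c^s_i."""
    dim = len(annotations[0])
    return [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]


# Toy example: "She saw the dog", with "saw" as the root.
heads = [1, -1, 3, 1]
M = syntax_distance_matrix(heads)
scores = [0.5, 1.0, 0.2, 0.8]          # hypothetical e_ij values
weights = syntax_directed_weights(scores, M[2], n=1)  # aligned position p_i = 2
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
c_s = context_vector(weights, annotations)
```

In the toy sentence, "the" is three dependency edges from "She" but one edge from "dog", so with n=1 only "the" and "dog" receive nonzero weight. A linear-distance window of the same width centered on "the" would instead include "saw", which is the contrast the method is built around.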

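Experimental Results

Research Questions

RQ1: Does syntax-distance information improve attention-based NMT beyond what linear-distance constraints achieve?
RQ2: Does the syntax-directed attention mechanism outperform the standard global, local, and flexible attention baselines in translation quality?
RQ3: Does combining the syntax-directed local context with the conventional global context (double context) bring additional gains?
RQ4: How does SDAtt perform on language pairs with different typological characteristics (ZH-EN and EN-DE) and across different sentence lengths?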

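Key Findings

SDAtt improves over GlobalAtt by 1.32 BLEU points on average on ZH-EN.
On ZH-EN, SDAtt is on average 0.97 and 1.04 BLEU points higher than LocalAtt and FlexibleAtt, respectively.
SDAtt is on average 0.47 BLEU points higher than Chen et al. 2017 on ZH-EN.
For EN-DE, SDAtt yields improvements similar to those on ZH-EN, suggesting robustness across language pairs.
The double-context +SDAtt brings additional gains over the single-context variants; for example, on ZH-EN, +SDAtt shows a significant advantage over +LocalAtt and +FlexibleAtt.
SDAtt maintains higher BLEU scores than the baselines across sentence lengths, including longer sentences.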

This review was generated by AI and reviewed by a human editor.