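[Paper Summary] Modeling Coverage for Neural Machine Translation
Introduces a coverage mechanism for NMT that tracks attention history. Linguistic and NN-based coverage models reduce under-translation and over-translation and improve alignment, yielding BLEU gains on Chinese-English data.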
The attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which lets the NMT system pay more attention to untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.
Research Motivation and Goals
- Address the lack of an explicit coverage mechanism in attention-based NMT, which leads to over-translation and under-translation.
- Propose a coverage mechanism that maintains a coverage vector updated after each attention step.
- Explore linguistic and neural network-based coverage models to integrate into the NMT attention mechanism.
Proposed Method
- Maintain a per-source-word coverage vector C_{i-1} that summarizes past attention for each source word.
- Integrate the coverage vector into the attention model to adjust future attention via e_{i,j} = a(t_{i-1}, h_j, C_{i-1,j}).
- Propose linguistic coverage, which accumulates attention normalized by a term Phi_j: C_{i,j} = C_{i-1,j} + (1/Phi_j) * alpha_{i,j}. Phi_j is either a scalar or a per-word fertility, which can be pre-computed.
- Propose neural-network-based coverage where C_{i,j} is updated through a function f(C_{i-1,j}, alpha_{i,j}, h_j, t_{i-1}) with GRU-style gating.
- Train end-to-end to maximize the likelihood P(y|x; theta, eta), where theta denotes the NMT parameters and eta the coverage parameters, and compare with baseline attention-based NMT (GroundHog) and Moses.
- Evaluate both translation quality via BLEU and alignment quality via SAER/AER, with a focus on long sentences.
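The coverage update and its use inside attention can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the additive scoring form `v . tanh(W t + U h + V C)`, the weight names `W`, `U`, `V`, `v`, the parameter dictionary `P`, and the helper `demo` are all hypothetical, and the gated update uses a generic GRU cell as a stand-in for the function f described above.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coverage_attention(t_prev, H, C_prev, W, U, V, v):
    """Attention weights alpha_i from e_{i,j} = a(t_{i-1}, h_j, C_{i-1,j}).

    Assumed additive form: e_{i,j} = v . tanh(W t_{i-1} + U h_j + V C_{i-1,j}).
    Shapes: t_prev (d_t,), H (J, d_h), C_prev (J, d_c).
    """
    hidden = np.tanh(t_prev @ W + H @ U + C_prev @ V)  # (J, d_a)
    return softmax(hidden @ v)                         # (J,)

def linguistic_update(C_prev, alpha, phi):
    """C_{i,j} = C_{i-1,j} + alpha_{i,j} / Phi_j, for scalar coverage (d_c = 1)."""
    return C_prev + (alpha / phi)[:, None]

def gated_update(C_prev, alpha, H, t_prev, P):
    """GRU-style f(C_{i-1,j}, alpha_{i,j}, h_j, t_{i-1}); P holds hypothetical weights."""
    J = H.shape[0]
    # Per-source-word input: [alpha_{i,j}, h_j, t_{i-1}]
    x = np.concatenate([alpha[:, None], H, np.tile(t_prev, (J, 1))], axis=1)
    z = sigmoid(x @ P["Wz"] + C_prev @ P["Uz"])           # update gate
    r = sigmoid(x @ P["Wr"] + C_prev @ P["Ur"])           # reset gate
    cand = np.tanh(x @ P["Wc"] + (r * C_prev) @ P["Uc"])  # candidate coverage
    return (1.0 - z) * C_prev + z * cand

def demo(n_steps=4, J=5, d_h=6, d_t=6, d_a=8, seed=0):
    """Run a few decoding steps with random weights and linguistic coverage."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(J, d_h))                 # source annotations h_j
    W, U = rng.normal(size=(d_t, d_a)), rng.normal(size=(d_h, d_a))
    V, v = rng.normal(size=(1, d_a)), rng.normal(size=d_a)
    phi = np.ones(J)                              # Phi_j = 1: no fertility
    C = np.zeros((J, 1))                          # coverage starts at zero
    for _ in range(n_steps):
        t_prev = rng.normal(size=d_t)             # stand-in for the decoder state
        alpha = coverage_attention(t_prev, H, C, W, U, V, v)
        C = linguistic_update(C, alpha, phi)
    return C
```

With Phi_j = 1, each step adds attention weights that sum to one, so the total coverage returned by `demo()` equals the number of decoding steps. A source word whose coverage stays near zero is a candidate for under-translation, while one whose coverage grows well beyond its fertility suggests over-translation.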
Experimental Results
Research Questions
- RQ1: Does incorporating a coverage mechanism into attention-based NMT improve translation quality compared to standard attention-based NMT and SMT baselines?
- RQ2: How do linguistic coverage and neural network-based coverage variants perform in terms of translation quality and alignment accuracy?
- RQ3: What is the impact of coverage on long sentences and under-/over-translation phenomena?
Key Findings
| System | # Params | MT05 | MT06 | MT08 | Avg. |
|---|---|---|---|---|---|
| Moses | – | 31.37 | 30.85 | 23.01 | 28.41 |
| GroundHog | 84.3M | 30.61 | 31.12 | 23.23 | 28.32 |
| + Linguistic coverage w/o fertility | +1K | 31.26 | 32.16 | 24.84 | 29.42 |
| + Linguistic coverage w/ fertility | +3K | 32.36 | 32.31 | 24.91 | 29.86 |
| + NN-based coverage w/o gating (d=1) | +4K | 31.94 | 32.11 | 23.31 | 29.12 |
| + NN-based coverage w/ gating (d=1) | +10K | 31.94 | 32.16 | 24.67 | 29.59 |
| + NN-based coverage w/ gating (d=10) | +100K | 32.73 | 32.47 | 25.23 | 30.14 |
- Coverage-based NMT significantly improves BLEU scores over standard attention-based NMT: every coverage variant beats GroundHog on MT05, MT06, and MT08, raising the average from 28.32 to between 29.12 and 30.14.
- Linguistic coverage with fertility yields notable BLEU gains and improves alignment, with fertility helping estimate covered ratios.
- NN-based coverage with gating also improves BLEU, with higher-dimensional coverage (d=10) providing additional gains (average 30.14 versus 29.59 for d=1).
- Coverage reduces under-translation and over-translation in subjective evaluations, and improves translation adequacy and fluency.
- Coverage helps mitigate performance drop on longer sentences, pushing attention toward untranslated words and lengthening translations appropriately.
- The most complex NN-based coverage (gating, d=10) adds only about 100K parameters to GroundHog's 84.3M, yet achieves the best average BLEU (30.14).
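The averages and gains quoted above follow directly from the table, and a short script makes the arithmetic easy to recheck. This is only a convenience sketch: the dictionary `BLEU` and the helpers `average`, `gain_over`, and `param_overhead` are hypothetical names, and the numbers are copied from the table rather than recomputed from any model output.

```python
# BLEU scores (MT05, MT06, MT08) copied from the results table.
BLEU = {
    "Moses": (31.37, 30.85, 23.01),
    "GroundHog": (30.61, 31.12, 23.23),
    "Linguistic w/o fertility": (31.26, 32.16, 24.84),
    "Linguistic w/ fertility": (32.36, 32.31, 24.91),
    "NN w/o gating (d=1)": (31.94, 32.11, 23.31),
    "NN w/ gating (d=1)": (31.94, 32.16, 24.67),
    "NN w/ gating (d=10)": (32.73, 32.47, 25.23),
}

def average(system):
    """Mean BLEU over the three test sets, rounded as in the table."""
    scores = BLEU[system]
    return round(sum(scores) / len(scores), 2)

def gain_over(system, baseline="GroundHog"):
    """Average BLEU improvement of a system over the baseline."""
    return round(average(system) - average(baseline), 2)

def param_overhead(extra_params, base_params=84.3e6):
    """Fraction of extra parameters relative to the GroundHog baseline."""
    return extra_params / base_params

if __name__ == "__main__":
    for name in BLEU:
        print(f"{name:28s} avg={average(name):.2f} gain={gain_over(name):+.2f}")
    print(f"d=10 overhead: {param_overhead(100e3):.2%}")
```

Under these numbers the d=10 gated variant gains 1.82 average BLEU over GroundHog while increasing the parameter count by roughly 0.12%.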
This summary was generated by AI and reviewed by human editors.