QUICK REVIEW

[Paper Review] Modeling Coverage for Neural Machine Translation

Zhaopeng Tu, Zhengdong Lu | arXiv (Cornell University) | 2016-01-19

Natural Language Processing Techniques | References: 27 | Citations: 159

One-Line Summary

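Introduces a coverage mechanism that tracks the attention history of NMT. Using linguistic and NN-based coverage models, it reduces under- and over-translation, improves alignment, and raises BLEU on Chinese-to-English translation.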

ABSTRACT

Attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which lets NMT system to consider more about untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.

Motivation and Objectives

Address the lack of explicit coverage in attention-based NMT, which leads to over- and under-translation.
Propose a coverage mechanism that maintains a coverage vector updated after each attention step.
Explore linguistic and neural network-based coverage models to integrate into the NMT attention mechanism.

Proposed Method

Maintain a per-source-word coverage vector C_{i-1} that summarizes past attention for each source word.
Integrate the coverage vector into the attention model to adjust future attention via e_{i,j} = a(t_{i-1}, h_j, C_{i-1,j}).
Propose linguistic coverage, which accumulates the attention weights of each source word, normalized by Phi_j: C_{i,j} = C_{i-1,j} + (1/Phi_j) * alpha_{i,j}. Phi_j is either a constant scalar or a pre-computed per-word fertility.
Propose neural-network-based coverage where C_{i,j} is updated through a function f(C_{i-1,j}, alpha_{i,j}, h_j, t_{i-1}) with GRU-style gating.
Train end-to-end to maximize P(y|x; theta, eta), where theta denotes the NMT parameters and eta the coverage parameters, and compare with baseline attention-based NMT (GroundHog) and Moses.
Evaluate both translation quality via BLEU and alignment quality via SAER/AER, with a focus on long sentences.
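
The linguistic coverage update and its feedback into attention can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: base_scores stands in for the content-dependent part of a(t_{i-1}, h_j, .), the fixed penalty weight replaces the learned coverage term, and the fertility form N * sigmoid(logit), along with the names max_fertility and fertility_logits, is assumed here for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fertilities(fertility_logits, max_fertility=2.0):
    # Assumed form: Phi_j = N * sigmoid(logit_j), so 0 < Phi_j < N.
    return [max_fertility * sigmoid(z) for z in fertility_logits]


def attend(base_scores, coverage, penalty=2.0):
    # Toy e_{i,j}: content score minus a hypothetical fixed penalty on C_{i-1,j}.
    energies = [s - penalty * c for s, c in zip(base_scores, coverage)]
    return softmax(energies)


def update_coverage(coverage, alpha, phi):
    # C_{i,j} = C_{i-1,j} + (1 / Phi_j) * alpha_{i,j}
    return [c + a / p for c, a, p in zip(coverage, alpha, phi)]


def decode_attention(base_scores, phi, steps, penalty=2.0):
    # Run `steps` decoding steps, feeding the coverage back into attention.
    coverage = [0.0] * len(base_scores)
    history = []
    for _ in range(steps):
        alpha = attend(base_scores, coverage, penalty)
        coverage = update_coverage(coverage, alpha, phi)
        history.append(alpha)
    return history, coverage


if __name__ == "__main__":
    phi = fertilities([0.0, 0.0, 0.0])  # sigmoid(0) = 0.5, so every Phi_j = 1.0
    history, coverage = decode_attention([2.0, 1.0, 0.5], phi, steps=3)
    for i, alpha in enumerate(history, start=1):
        print(i, [round(a, 3) for a in alpha])
    print("coverage:", [round(c, 3) for c in coverage])
```

Although the content scores are identical at every step, the attention weight on the first word drops at step 2 because it has accumulated the most coverage. With every Phi_j equal to 1, the coverage values sum to the number of decoding steps, since each attention distribution sums to 1.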

Experimental Results

Research Questions

RQ1: Does incorporating a coverage mechanism into attention-based NMT improve translation quality compared to standard attention-based NMT and SMT baselines?
RQ2: How do linguistic coverage and neural network-based coverage variants perform in terms of translation quality and alignment accuracy?
RQ3: What is the impact of coverage on long sentences and under-/over-translation phenomena?

Key Results

System	#Params	MT05	MT06	MT08	Avg
Moses	–	31.37	30.85	23.01	28.41
GroundHog	84.3M	30.61	31.12	23.23	28.32
+ Linguistic coverage w/o fertility	+1K	31.26	32.16	24.84	29.42
+ Linguistic coverage w/ fertility	+3K	32.36	32.31	24.91	29.86
+ NN-based coverage w/o gating (d=1)	+4K	31.94	32.11	23.31	29.12
+ NN-based coverage w/ gating (d=1)	+10K	31.94	32.16	24.67	29.59
+ NN-based coverage w/ gating (d=10)	+100K	32.73	32.47	25.23	30.14
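
The Avg column and the gains over GroundHog can be recomputed from the per-test-set scores. The snippet below is only that arithmetic: the numbers are copied from the table, the parameter counts are the rounded values the table reports, and the helper names (average, gain_over, param_overhead) are chosen here for illustration.

```python
# BLEU scores per system, in the order (MT05, MT06, MT08), copied from the table.
scores = {
    "Moses": (31.37, 30.85, 23.01),
    "GroundHog": (30.61, 31.12, 23.23),
    "+ Linguistic coverage w/o fertility": (31.26, 32.16, 24.84),
    "+ Linguistic coverage w/ fertility": (32.36, 32.31, 24.91),
    "+ NN-based coverage w/o gating (d=1)": (31.94, 32.11, 23.31),
    "+ NN-based coverage w/ gating (d=1)": (31.94, 32.16, 24.67),
    "+ NN-based coverage w/ gating (d=10)": (32.73, 32.47, 25.23),
}

BASELINE_PARAMS = 84_300_000  # GroundHog, reported as 84.3M


def average(xs):
    return sum(xs) / len(xs)


def gain_over(system, baseline="GroundHog"):
    # Difference in average BLEU between a system and the baseline.
    return average(scores[system]) - average(scores[baseline])


def param_overhead(extra_params, base_params=BASELINE_PARAMS):
    # Added parameters as a fraction of the baseline model size.
    return extra_params / base_params


if __name__ == "__main__":
    for name in scores:
        print(f"{name}: avg={average(scores[name]):.2f}, "
              f"gain={gain_over(name):+.2f}")
    # The d=10 gated variant is reported as adding roughly 100K parameters.
    print(f"overhead: {100 * param_overhead(100_000):.2f}%")
```

Rounded to two decimals, the recomputed averages match the Avg column, which is a quick consistency check on the table.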

Coverage-based NMT significantly improves BLEU over standard attention-based NMT (GroundHog): every coverage variant scores higher on MT05, MT06, and MT08, raising the average from 28.32 to between 29.12 and 30.14.
Linguistic coverage with fertility yields notable BLEU gains and improves alignment, with fertility helping estimate covered ratios.
NN-based coverage with gating also improves BLEU, and a higher-dimensional coverage vector (d=10) provides additional gains, lifting the average from 29.59 (d=1) to 30.14.
Coverage reduces under-translation and over-translation in subjective evaluations, and improves translation adequacy and fluency.
Coverage helps mitigate performance drop on longer sentences, pushing attention toward untranslated words and lengthening translations appropriately.
The most complex NN-based coverage (gating, d=10) adds only about 100K parameters, roughly 0.1% of the 84.3M-parameter baseline, yet achieves the highest average BLEU in the table (30.14).
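
The gated NN-based coverage referred to above updates each source word's coverage with a GRU-style cell. A minimal scalar (d=1) sketch follows, with simplifying assumptions: h_j and t_prev are reduced to single floats rather than vectors, the weight dictionary w and its key names are hypothetical, and the interpolation (1 - z) * c_prev + z * c_cand is one common GRU convention that may differ from the one used in the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_coverage_update(c_prev, alpha, h_j, t_prev, w):
    """Scalar (d=1) GRU-style update of one source word's coverage.

    c_prev: previous coverage C_{i-1,j}
    alpha:  attention weight alpha_{i,j}
    h_j, t_prev: scalar stand-ins for the source annotation and decoder state
    w: hypothetical weights, keys "<gate>_c" (float) and "<gate>_x" (3 floats)
    """
    inputs = (alpha, h_j, t_prev)

    def affine(prefix, c):
        # Weighted sum of the (possibly reset) coverage and the three inputs.
        return w[prefix + "_c"] * c + sum(
            wi * xi for wi, xi in zip(w[prefix + "_x"], inputs)
        )

    z = sigmoid(affine("z", c_prev))             # update gate
    r = sigmoid(affine("r", c_prev))             # reset gate
    c_cand = math.tanh(affine("h", r * c_prev))  # candidate coverage
    return (1.0 - z) * c_prev + z * c_cand


if __name__ == "__main__":
    # Arbitrary illustrative weights, not trained values.
    w = {
        "z_c": 0.5, "z_x": (1.0, 0.2, -0.1),
        "r_c": 0.3, "r_x": (0.5, -0.2, 0.1),
        "h_c": 0.8, "h_x": (1.5, 0.1, 0.1),
    }
    c = 0.0
    for alpha in (0.6, 0.3, 0.05):
        c = gated_coverage_update(c, alpha, h_j=0.2, t_prev=-0.1, w=w)
        print(round(c, 4))
```

Because the candidate passes through tanh and the gate interpolates between the old and candidate values, the coverage stays within (-1, 1) as long as it starts there. Unlike the linguistic variant, the update is learned rather than a fixed accumulation.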


This review was generated by AI and reviewed by a human editor.