QUICK REVIEW

[Paper Review] Fast Domain Adaptation for Neural Machine Translation

Markus Freitag, Yaser Al-Onaizan|arXiv (Cornell University)|Dec 20, 2016
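Natural Language Processing Techniques|References: 20|Cited by: 164
ONE-SENTENCE SUMMARY

The paper proposes a fast method for adapting an existing NMT system to a new domain: continue training the model on target-domain (in-domain) data, then ensemble it with the baseline model to prevent degradation of out-of-domain performance.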

ABSTRACT

Neural Machine Translation (NMT) is a new approach for automatic translation of text from one human language into another. The basic concept in NMT is to train a large Neural Network that maximizes the translation performance on a given parallel corpus. NMT is gaining popularity in the research community because it outperformed traditional SMT approaches in several translation tasks at WMT and other evaluation tasks/benchmarks at least for some language pairs. However, many of the enhancements in SMT over the years have not been incorporated into the NMT framework. In this paper, we focus on one such enhancement namely domain adaptation. We propose an approach for adapting a NMT system to a new domain. The main idea behind domain adaptation is that the availability of large out-of-domain training data and a small in-domain training data. We report significant gains with our proposed method in both automatic metrics and a human subjective evaluation metric on two language pairs. With our adaptation method, we show large improvement on the new domain while the performance of our general domain only degrades slightly. In addition, our approach is fast enough to adapt an already trained system to a new domain within few hours without the need to retrain the NMT model on the combined data which usually takes several days/weeks depending on the volume of the data.

RESEARCH MOTIVATION AND GOALS

  • Motivate the need for domain adaptation in neural machine translation (NMT).
  • Propose a fast adaptation approach that reuses a baseline out-of-domain NMT model and adapts it with in-domain data.
  • Evaluate adaptation on German→English and Chinese→English with automatic metrics and human judgments.
  • Demonstrate that ensembling baseline and continued-training models preserves general-domain quality while improving in-domain performance.

PROPOSED METHOD

  • Use an attention-based encoder-decoder NMT model with a bidirectional GRU (bi-GRU) encoder.
  • Adapt by continuing to train the baseline out-of-domain model on in-domain data (the "continue" model).
  • Mitigate overfitting to the in-domain data by ensembling the continue model with the baseline model at decoding time.
  • Evaluate with the BLEU and TER metrics, and collect human judgments on in-domain samples.
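
The ensembling step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the ensemble combines the two models by averaging their next-token probability distributions with equal weights, which is a common choice but one this summary does not specify. The function names (`ensemble_step`, `greedy_pick`) and the toy distributions are hypothetical and exist only for illustration.

```python
def ensemble_step(dists, weights=None):
    """Mix next-token distributions from several models.

    dists   : list of dicts mapping token -> probability (one dict per model)
    weights : optional per-model weights; equal weights are assumed by default
    """
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)
    # Collect every token that any model assigns probability to.
    vocab = set().union(*dists)
    mixed = {
        tok: sum(w * d.get(tok, 0.0) for w, d in zip(weights, dists))
        for tok in vocab
    }
    # Renormalize so the result is a proper distribution.
    total = sum(mixed.values())
    return {tok: p / total for tok, p in mixed.items()}


def greedy_pick(dist):
    """Return the most probable token (a stand-in for one beam-search step)."""
    return max(dist, key=dist.get)


# Hypothetical distributions for a single decoding step.
baseline = {"bank": 0.6, "shore": 0.3, "bench": 0.1}   # out-of-domain model
continued = {"bank": 0.2, "shore": 0.7, "bench": 0.1}  # in-domain continue model

mixed = ensemble_step([baseline, continued])
print(greedy_pick(mixed))
```

In this toy case the baseline alone would choose "bank" and the continue model alone would choose "shore". The mixture keeps probability mass from both, which is the intuition behind using the ensemble to limit overfitting to the in-domain data. A real decoder would apply the same mixing at every step of beam search over the full vocabulary.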

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

  • RQ1: Can a pre-trained NMT model be quickly adapted to a new domain using only in-domain data without heavily degrading out-of-domain performance?
  • RQ2: Does ensembling the continued-training model with the baseline prevent overfitting and retain general-domain quality?
  • RQ3: How does adaptation perform across language pairs with different domain characteristics (German→English, Chinese→English)?

KEY FINDINGS

  • Adaptation with continued training on in-domain data yields large in-domain gains (up to ~9.9 BLEU points, ~12.2 TER points in some cases).
  • Ensembling the continue model with the baseline preserves out-of-domain quality while delivering in-domain gains (e.g., up to 7.2 BLEU and 10 TER in some setups).
  • Two epochs of continued training can achieve strong in-domain performance with minimal degradation on out-of-domain data; training for longer risks overfitting.
  • Human judgments corroborate automatic metrics, showing improvements for both continue and ensemble approaches over baseline on in-domain data.
  • The method is demonstrated on German→English and Chinese→English, with corresponding adaptation dynamics documented in Tables 2 and 6.

This review was generated by AI and reviewed by a human editor.