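[Paper Digest] Robust Neural Machine Translation with Doubly Adversarial Inputs
This paper proposes AdvGen, which attacks an NMT model with adversarial source inputs and defends it with adversarial target inputs, improving both translation quality and robustness on Chinese–English and English–German tasks.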
Neural machine translation (NMT) often suffers from the vulnerability to noisy perturbations in the input. We propose an approach to improving the robustness of NMT models, which consists of two parts: (1) attack the translation model with adversarial source examples; (2) defend the translation model with adversarial target inputs to improve its robustness against the adversarial source inputs. For the generation of adversarial inputs, we propose a gradient-based method to craft adversarial examples informed by the translation loss over the clean inputs. Experimental results on Chinese-English and English-German translation tasks demonstrate that our approach achieves significant improvements ($2.8$ and $1.6$ BLEU points) over Transformer on standard clean benchmarks as well as exhibiting higher robustness on noisy data.
Research Motivation and Objectives
- Motivate and address vulnerability of NMT to small input perturbations.
- Propose a white-box, gradient-based method to generate adversarial inputs for NMT.
- Introduce a defense mechanism using adversarial target inputs to improve robustness.
- Train an NMT model with a combined objective that includes clean, robust, and language-model guided components.
- Demonstrate improvements over Transformer on standard benchmarks and robustness on noisy data.
Proposed Method
- Develop AdvGen, a gradient-based adversarial input generator guided by the translation loss.
- Attack the encoder by generating adversarial source inputs x' that maximize -log P(y|x'; θ_mt) under a perturbation constraint.
- Select candidate replacements via a top-n word set derived from a source language model Q_src.
- Defend by generating adversarial target inputs z' for the decoder using a target candidate set Q_trg and attention-informed sampling D_trg.
- Compute a robustness loss on the perturbed pair (x', z') and combine it with the clean translation loss and the language-model losses to form the final objective.
- Train with four loss components: L_clean, L_robust, and two L_lm terms for source and target language models (sharing embeddings with the MT model).
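The replacement step in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each candidate word is scored by the cosine similarity between the embedding shift (candidate minus original) and the gradient of the translation loss at that position, and that positions to perturb are sampled uniformly. The names `adv_gen`, `sample_positions`, `embed`, and `candidates`, and the toy vectors, are hypothetical; in a real system the gradients would come from backpropagating the translation loss and the candidates from a language model.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sample_positions(length, ratio, rng):
    """Uniformly pick about `ratio` of the positions (assumption: uniform sampling)."""
    if length == 0:
        return []
    k = min(length, max(1, int(length * ratio)))
    return sorted(rng.sample(range(length), k))


def adv_gen(tokens, grads, embed, candidates, positions):
    """Return a copy of `tokens` with the chosen positions replaced.

    tokens:     words of the sentence
    grads:      grads[i] is the loss gradient w.r.t. the embedding of tokens[i]
    embed:      dict mapping word -> embedding vector
    candidates: candidates[i] is the top-n replacement list for position i
    positions:  indices to perturb
    """
    out = list(tokens)
    for i in positions:
        e_orig = embed[tokens[i]]
        best, best_score = tokens[i], float("-inf")
        for w in candidates[i]:
            if w == tokens[i]:
                continue  # a replacement must differ from the original word
            shift = [a - b for a, b in zip(embed[w], e_orig)]
            score = cosine(shift, grads[i])  # alignment with the loss gradient
            if score > best_score:
                best, best_score = w, score
        out[i] = best  # stays unchanged if no usable candidate exists
    return out


# Toy usage with made-up 2-d embeddings and gradients.
embed = {"good": [1.0, 0.0], "great": [2.0, 0.0], "bad": [-1.0, 0.0],
         "movie": [0.0, 1.0], "film": [0.0, 2.0]}
grads = [[-1.0, 0.0], [0.0, 1.0]]
print(adv_gen(["good", "movie"], grads, embed, [["great", "bad"], ["film"]], [0]))
# -> ['bad', 'movie']
```

Restricting the search to a small candidate list keeps the substitution plausible in context, while the gradient alignment picks the candidate expected to increase the translation loss the most. The same routine could be reused on the decoder side with target candidates and attention-informed position sampling.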
Experimental Results
Research Questions
- RQ1: Can a white-box, gradient-based method effectively produce adversarial source inputs for NMT without excessive perturbation?
- RQ2: Can adversarial target inputs during training improve robustness to perturbations in the source input?
- RQ3: Does training with doubly adversarial inputs improve over standard Transformer baselines on clean data while maintaining robustness on noisy data?
- RQ4: What is the impact of the adversarial components and language-model guidance on translation quality and stability?
Key Findings
- Significant BLEU gains over Transformer on standard benchmarks: average +2.25 BLEU on Chinese–English with up to +2.8 BLEU on NIST03.
- Ensemble results on English–German show improvements over Transformer: +1.04 BLEU over Trans.-Base, +1.61 BLEU over Trans.-Big, and +1.52 BLEU over RNMT+.
- Combining the approach with back-translation (Ours + BackTranslation) yields a further gain of roughly 1–3 BLEU points when monolingual data is used.
- Ablation shows that adversarial target inputs contribute substantial gains, while the language models aid fluency and candidate pruning.
- The model trained with the robustness objective remains more stable under artificial noise, outperforming the baselines across noise fractions.
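To make "artificial noise across noise fractions" concrete, the following sketch corrupts a chosen fraction of the tokens in a source sentence. The digest does not say which noise type was used, so random word replacement is an assumption made here for illustration; the function name `add_noise` and the placeholder vocabulary are hypothetical.

```python
import random


def add_noise(tokens, fraction, vocab, rng):
    """Replace about `fraction` of the tokens with random words from `vocab`.

    Assumption: noise is random word replacement; other noise types
    (for example swapping neighbouring words) would follow the same pattern.
    """
    noisy = list(tokens)  # leave the input untouched
    k = min(len(tokens), max(0, int(round(len(tokens) * fraction))))
    for i in rng.sample(range(len(tokens)), k):
        noisy[i] = rng.choice(vocab)  # may by chance equal the original word
    return noisy


# Sweep a few noise fractions over one sentence.
rng = random.Random(0)
sentence = "the cat sat on the mat".split()
for fraction in (0.0, 0.2, 0.5):
    print(fraction, add_noise(sentence, fraction, ["<unk>"], rng))
```

Translating such corrupted inputs at increasing fractions, and scoring the outputs against the clean references, gives a robustness curve that can be compared across models.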
This digest was generated by AI and reviewed by human editors.