QUICK REVIEW

[Paper Review] Data Diversification: A Simple Strategy For Neural Machine Translation

Xuan-Phi Nguyen, Shafiq Joty | arXiv (Cornell University) | Nov 5, 2019

Natural Language Processing Techniques | References: 38 | Cited by: 44

One-Sentence Summary

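This paper proposes Data Diversification, a simple training-data augmentation method that uses multiple forward and backward NMT models to generate synthetic data, and improves BLEU on several WMT/IWSLT tasks without any extra monolingual data.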

ABSTRACT

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging them with the original dataset on which the final NMT model is trained. Our method is applicable to all NMT models. It does not require extra monolingual data like back-translation, nor does it add more computations and parameters like ensembles of models. Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively. It also substantially improves on 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource translation tasks (English-Nepali and English-Sinhala). We demonstrate that our method is more effective than knowledge distillation and dual learning, it exhibits strong correlation with ensembles of models, and it trades perplexity off for better BLEU score. We have released our source code at https://github.com/nxphi47/data_diversification

Motivation and Goals

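Propose a non-intrusive data augmentation strategy to improve NMT performance.
Develop a diversification framework that generates synthetic data from forward and backward models.
Evaluate the method on high-resource and low-resource language pairs and compare it with related methods.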

Proposed Method

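Train multiple forward (S→T) and backward (T→S) NMT models on the parallel data.
Generate synthetic translations by translating the source side S with the forward models and the target side T with the backward models.
Merge the synthetic data from both directions into the original dataset, controlled by the diversification factor k and the number of rounds N.
Train the final S→T model on the augmented dataset, without adding any model parameters.
Analyze the correlation with model ensembles, the relationship between perplexity and BLEU, and the effects of initialization and forward translation.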
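The steps above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than the authors' released code: `data_diversification`, `train`, and `toy_train` are names introduced here, `toy_train` is a placeholder that only tags strings instead of training an NMT model, and the sketch assumes that each round trains k forward and k backward models (with distinct seeds) on the previous round's corpus and that every model translates only the original S and T. Under those assumptions the augmented corpus grows to (2kN + 1)·|D| pairs.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                                   # (source sentence, target sentence)
Translator = Callable[[str], str]                        # a trained model: sentence -> translation
Trainer = Callable[[List[Pair], str, int], Translator]   # hypothetical: (pairs, direction, seed) -> model


def data_diversification(data: List[Pair], train: Trainer, k: int, n_rounds: int) -> List[Pair]:
    """Build the augmented corpus D_N (sketch; assumptions stated in the lead-in)."""
    sources = [s for s, _ in data]
    targets = [t for _, t in data]
    current = list(data)                     # D_0: the original parallel data
    for r in range(n_rounds):
        prev = current                       # this round's models train on D_{r-1}
        current = list(prev)                 # D_r starts as a copy of D_{r-1}
        for i in range(k):
            seed = r * k + i                 # a distinct seed per model gives diverse outputs
            fwd = train(prev, "S->T", seed)
            bwd = train([(t, s) for s, t in prev], "T->S", seed)  # learns T -> S on swapped pairs
            current += [(s, fwd(s)) for s in sources]  # real source, synthetic target
            current += [(bwd(t), t) for t in targets]  # synthetic source, real target
    return current


def toy_train(pairs: List[Pair], direction: str, seed: int) -> Translator:
    # Hypothetical stand-in for NMT training: the "model" only tags its input,
    # so each (direction, seed) combination yields a visibly different output.
    return lambda sentence: f"{sentence}|{direction}|{seed}"


if __name__ == "__main__":
    corpus = [("hallo welt", "hello world"), ("guten morgen", "good morning")]
    augmented = data_diversification(corpus, toy_train, k=3, n_rounds=1)
    print(len(augmented))                    # 14 = (2*3*1 + 1) * 2 pairs in this sketch
    # The final S->T model is trained once on the augmented corpus.
    final_model = toy_train(augmented, "S->T", seed=0)
```

In this reading only `final_model` is used at test time, and it has the same architecture as a baseline trained on the original data, which is consistent with the claim that no parameters are added. Replacing `toy_train` with a real training routine is left as an assumption.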

Experimental Results

Research Questions

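RQ1: Can Data Diversification improve MT quality without extra monolingual data or architectural changes?
RQ2: How does diversification relate to model ensembles and to the relationship between perplexity and BLEU?
RQ3: How do the diversification parameters (k, N) affect performance across tasks?
RQ4: Does forward translation contribute as much to data diversification as backward translation?
RQ5: When extra monolingual data is available, does the method add BLEU gains on top of back-translation (BT)?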

Key Findings

Method	WMT'14 En-De BLEU	WMT'14 En-Fr BLEU
Baseline Transformer	28.4	41.8
Data Diversification (Scale Transformer)	30.7	43.7

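Achieves state-of-the-art BLEU on WMT'14 En-De (30.7) and En-Fr (43.7) with the Scale Transformer, surpassing previous non-intrusive methods.
Improves BLEU by 1.0–2.0 points on the 4 IWSLT tasks and the 4 low-resource tasks, and outperforms back-translation baselines in some settings.
Outperforms knowledge distillation and multi-agent dual learning; correlates strongly with model ensembles but, unlike an ensemble, adds no inference cost.
Trades a slight increase in perplexity for higher BLEU, suggesting that the model still generalizes better on the validation set.
Forward diversification usually yields larger gains than backward diversification, and bidirectional diversification gives the best results among the variants tested.
The hyperparameter study shows that increasing k helps up to a saturation point, whereas increasing N yields diminishing returns relative to its cost.
Complementary to back-translation: when extra monolingual data is available, data diversification yields BLEU gains beyond those of BT.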
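The cost side of these findings (a single model at inference, and the price of raising k or N) can be put into rough numbers. The following hypothetical Python helper is a back-of-the-envelope sketch, not something reported in the paper: it assumes each round trains k forward and k backward models on the previous round's corpus, that each model translates the original corpus once, and that training cost scales with the number of sentence pairs seen per epoch (`pair_epochs`, a name introduced here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiversificationCost:
    models_trained: int       # NMT training runs, including the final S->T model
    final_corpus: int         # sentence pairs in the final augmented corpus D_N
    pair_epochs: int          # sum over all runs of the corpus size each run trains on
    models_at_inference: int  # models decoded per test sentence


def diversification_cost(k: int, n_rounds: int, corpus_size: int) -> DiversificationCost:
    """Rough cost of data diversification under the assumptions in the lead-in."""
    pair_epochs = 0
    size = corpus_size                      # |D_0|
    for _ in range(n_rounds):
        pair_epochs += 2 * k * size         # k forward + k backward models train on D_{r-1}
        size += 2 * k * corpus_size         # each model translates the original corpus once
    pair_epochs += size                     # the final model trains on D_N
    return DiversificationCost(
        models_trained=2 * k * n_rounds + 1,
        final_corpus=size,
        pair_epochs=pair_epochs,
        models_at_inference=1,              # unlike a k-model ensemble, only one model is decoded
    )


if __name__ == "__main__":
    for k, n in [(1, 1), (3, 1), (1, 3), (3, 3)]:
        c = diversification_cost(k, n, corpus_size=1)
        print(f"k={k} N={n}: {c.models_trained} runs, |D_N|={c.final_corpus}x|D|, "
              f"{c.pair_epochs}x|D| pair-epochs")
```

Under these assumptions, k=3, N=1 and k=1, N=3 both train 7 models and end with a corpus of 7 × |D|, but the multi-round setting processes 25 rather than 13 × |D| pair-epochs, because later rounds train on already-augmented data.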

This review was generated by AI and checked by human editors.