QUICK REVIEW

[Paper Review] No Language Left Behind: Scaling Human-Centered Machine Translation

NLLB Team, Marta R. Costa-jussà | arXiv (Cornell University) | Jul 11, 2022

Natural Language Processing Techniques | Cited by 360

ONE-SENTENCE SUMMARY

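This paper trains a large, human-centered machine translation system covering 200 languages, using Sparsely Gated Mixture of Experts models, novel data mining, and comprehensive human and safety evaluation, and achieves a 44% BLEU improvement relative to the previous state of the art.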

ABSTRACT

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system. Finally, we open source all contributions described in this work, accessible at https://github.com/facebookresearch/fairseq/tree/nllb.

Research Motivation and Goals

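Establish the need for low-resource language translation support and document its social impact.
Develop datasets and models that narrow the performance gap between low-resource and high-resource languages.
Propose a scalable conditional compute model based on Sparsely Gated Mixture of Experts.
Mitigate overfitting while training on thousands of translation tasks.
Evaluate translation quality and safety using a human-translated benchmark and a toxicity benchmark covering the Flores-200 languages.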

Proposed Method

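Proposes a conditional compute model based on Sparsely Gated Mixture of Experts (MoE).
Trains on data mined with novel techniques tailored to low-resource languages.
Introduces architectural and training improvements that counteract overfitting across thousands of tasks.
Evaluates more than 40,000 translation directions using the human-translated Flores-200 benchmark.
Combines human evaluation with a novel toxicity benchmark covering all Flores-200 languages.
Open-sources all contributions for reuse by the community.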
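To make the conditional compute idea concrete, here is a minimal sketch of generic top-k expert routing, assuming a plain linear gate and toy experts. It is not the paper's implementation: the function names (`top_k_gate`, `moe_layer`), the choice of k=2, and the toy experts are all illustrative assumptions, and the regularization and training improvements mentioned above are not reproduced.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k=2):
    """Keep the k largest gate logits and renormalize them with a softmax.

    Returns {expert_index: weight}; every other expert is skipped entirely.
    """
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_layer(token, gate_weights, experts, k=2):
    """Route one token vector through its top-k experts and mix the outputs."""
    # One gate logit per expert: dot product of the token with that expert's gate vector.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    routing = top_k_gate(logits, k)
    output = [0.0] * len(token)
    for idx, weight in routing.items():
        expert_out = experts[idx](token)  # only the selected experts are evaluated
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, routing


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8
    # Hypothetical toy experts: each one scales the input by a different constant.
    experts = [lambda v, s=s: [s * x for x in v] for s in range(1, num_experts + 1)]
    gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    out, routing = moe_layer([0.5, -1.0, 2.0, 0.1], gate_weights, experts)
    print(routing)  # two expert indices whose weights sum to 1
```

Because only k experts run per token, the parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the usual motivation for sparse gating.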

Experimental Results

Research Questions

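RQ1: Can a single general-purpose translation system scale to 200 languages while maintaining high quality and safety?
RQ2: Which data mining strategies best improve machine translation coverage of low-resource languages?
RQ3: How effective are conditional compute and MoE architectures at reducing overfitting across thousands of translation tasks?
RQ4: How well do human evaluation and the toxicity benchmark correlate with automatic metrics across languages?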

Key Findings

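Achieves a 44% BLEU improvement relative to the previous state of the art.
Evaluates more than 40,000 translation directions using the human-translated Flores-200 benchmark.
Assesses translation safety with a toxicity benchmark covering all Flores-200 languages.
Shows that architectural and training improvements make training on thousands of tasks effective.
Open-sources all data, models, and methods to support reproduction and broader adoption.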
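The two headline numbers above are simple ratios and counts, and the sketch below spells out the arithmetic. The BLEU scores used are hypothetical values chosen only to illustrate what "44% relative improvement" means; they are not figures from the paper. The direction count assumes every ordered pair of distinct languages is one translation direction.

```python
def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


def translation_directions(num_languages):
    """Number of ordered (source, target) pairs of distinct languages."""
    return num_languages * (num_languages - 1)


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the arithmetic.
    print(f"{relative_improvement(20.0, 28.8):.0%}")  # 44%
    print(translation_directions(200))                # 39800
```

Under that assumption, exactly 200 languages would yield 39,800 directions, so a count above 40,000 suggests the benchmark spans slightly more than 200 language varieties.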


This review was generated by AI and checked by a human editor.