QUICK REVIEW

[Paper Review] Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

Rachel Bawden, François Yvon|arXiv (Cornell University)|Mar 3, 2023

Natural Language Processing Techniques | Cited by 12

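One-Sentence Summary

This paper evaluates BLOOM's machine translation across multiple datasets and languages. It finds that 0-shot MT suffers from overgeneration and output in the wrong language, while few-shot prompting substantially improves performance. Cross-lingual transfer is observed, and linguistic context can influence the translations, although it does not always improve scores.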

ABSTRACT

The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022) covering 46 languages. We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and generating in the wrong language, but this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.

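Research Motivation and Goals

Evaluate BLOOM's 0-shot and few-shot translation ability across language pairs and datasets.
Study how prompt design and prompt verbosity affect machine translation quality.
Examine cross-lingual transfer and the role of linguistic context in translation.
Compare BLOOM with other models and establish baselines on standard MT benchmarks.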

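Proposed Method

Evaluate BLOOM in 0-shot and few-shot settings using the Language Model Evaluation Harness.
Evaluate on the WMT, Flores-101, and DiaBLa datasets using the standard BLEU and COMET metrics.
Test multiple BLOOM model sizes and seven prompts to analyze prompt sensitivity.
Apply truncation to mitigate overgeneration, and use fastText language identification to detect output in the wrong language.
Compare BLOOM against task-fine-tuned models and OPT as baselines.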
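
The truncation and language-identification steps above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that overgeneration shows up as extra lines after the first line of output, so that keeping the first non-empty line is a reasonable truncation rule. The names `truncate_at_newline`, `wrong_language_rate`, and `toy_detect` are hypothetical, and `toy_detect` is a keyword-based stand-in for a real language identifier such as a fastText model.

```python
from typing import Callable, Iterable


def truncate_at_newline(generation: str) -> str:
    """Keep only the first non-empty line of a generation.

    Assumption: anything after the first line is overgeneration.
    """
    for line in generation.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


def wrong_language_rate(
    outputs: Iterable[str],
    expected_lang: str,
    detect_lang: Callable[[str], str],
) -> float:
    """Fraction of outputs whose detected language differs from the target."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    wrong = sum(1 for text in outputs if detect_lang(text) != expected_lang)
    return wrong / len(outputs)


def toy_detect(text: str) -> str:
    """Hypothetical stand-in detector: flags a few French words, else English."""
    french_markers = {"le", "la", "est", "chat", "bonjour"}
    words = set(text.lower().split())
    return "fr" if words & french_markers else "en"


# Two made-up generations for a French-to-English request.
raw_outputs = [
    "The cat is asleep.\nFrench: Le chien dort.\nEnglish: The dog",  # overgenerated
    "Le chat est endormi.",  # wrong language (source copied or paraphrased)
]
cleaned = [truncate_at_newline(o) for o in raw_outputs]
rate = wrong_language_rate(cleaned, "en", toy_detect)
print(cleaned)
print(rate)
```

In practice the `detect_lang` argument would wrap a trained language identifier. Passing the detector in as a callable keeps the sketch runnable without downloading a model.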

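Experimental Results

Research Questions

RQ1: How does BLOOM perform on 0-shot and few-shot translation across language pairs and datasets?
RQ2: How does prompt design affect BLOOM's MT performance, and how sensitive is the model to the prompt across languages?
RQ3: To what extent does BLOOM exhibit cross-lingual transfer, and how do neighboring or related languages affect translation quality?
RQ4: Can linguistic or discourse context improve translation quality, and under what conditions?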

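Main Findings

0-shot BLOOM translation suffers from overgeneration and from translating into the wrong language; the few-shot setting greatly reduces both problems.
Few-shot prompting brings BLOOM's MT results closer to the state of the art for a number of language pairs and datasets.
Transfer effects are observable: BLOOM achieves fairly good scores on languages it did not officially see during training, and few-shot examples show cross-lingual transfer between language pairs.
Prompt choice has a strong effect on 0-shot results, with some prompts leading to near-catastrophic MT performance, whereas 1-shot performance is more robust to the choice of prompt.
Linguistic context does not always improve metric scores, but there is evidence that BLOOM's translations are influenced by context.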
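
To make the difference between 0-shot and 1-shot prompting concrete, the sketch below builds both kinds of prompt from the same template. The template `"<source language>: <sentence> = <target language>:"` and the function name `build_prompt` are illustrative assumptions. They are not taken from the paper, and they are not necessarily one of the seven prompts it evaluates.

```python
from typing import Sequence, Tuple


def build_prompt(
    source: str,
    src_lang: str,
    tgt_lang: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a translation prompt with zero or more in-context examples.

    Each example is a (source sentence, reference translation) pair.
    The template used here is a hypothetical one for illustration.
    """
    lines = [
        f"{src_lang}: {ex_src} = {tgt_lang}: {ex_tgt}"
        for ex_src, ex_tgt in examples
    ]
    # The final line leaves the target side empty for the model to complete.
    lines.append(f"{src_lang}: {source} = {tgt_lang}:")
    return "\n".join(lines)


zero_shot = build_prompt("Bonjour.", "French", "English")
one_shot = build_prompt(
    "Bonjour.",
    "French",
    "English",
    examples=[("Merci beaucoup.", "Thank you very much.")],
)
print(zero_shot)
print(one_shot)
```

With an in-context example, the prompt itself shows the model the expected output language and where a translation ends, which is one plausible reading of why the 1-shot setting is reported as less sensitive to prompt wording.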

This review was generated by AI and checked by human editors.