QUICK REVIEW

[Paper Review] Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

Rachel Bawden, François Yvon|arXiv (Cornell University)|Mar 3, 2023

Natural Language Processing Techniques|Cited by 12

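One-Sentence Summary

This paper evaluates BLOOM's machine translation across multiple datasets and language pairs and finds that 0-shot MT suffers from overgeneration and output in the wrong language, while few-shot prompting greatly improves performance. Cross-lingual transfer occurs, and linguistic context can influence the translations, but it does not consistently raise scores.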

ABSTRACT

The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022) covering 46 languages. We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and generating in the wrong language, but this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.

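Motivation and Objectives

Evaluate BLOOM's zero-shot and few-shot translation ability across a range of language pairs and datasets.
Examine how prompt design and verbosity affect MT quality.
Investigate cross-lingual transfer and the role of linguistic context in translation.
Compare BLOOM with other models and establish baselines on standard MT benchmarks.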

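Method

Evaluate BLOOM in 0-shot and few-shot settings using the Language Model Evaluation Harness.
Evaluate on the WMT, Flores-101 and DiaBLa datasets using the standard BLEU and COMET metrics.
Test several BLOOM model sizes and seven prompts to analyze prompt sensitivity.
Apply truncation to mitigate overgeneration, and detect wrong-language output with fastText language identification.
Compare BLOOM against task-tuned models and OPT as baselines.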
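
The few-shot prompting and truncation steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `build_prompt` and `truncate_at_newline`, the "French: ... = English:" template, and the choice to cut at the first newline are all assumptions made for this example.

```python
def build_prompt(source, src_lang, tgt_lang, examples=()):
    """Build a k-shot translation prompt, where k = len(examples).

    The template used here is a hypothetical one chosen for illustration;
    it is not necessarily one of the seven prompts tested in the paper.
    """
    lines = []
    for ex_src, ex_tgt in examples:
        # Each demonstration shows a full source/target pair.
        lines.append(f"{src_lang}: {ex_src} = {tgt_lang}: {ex_tgt}")
    # The final line leaves the target side open for the model to complete.
    lines.append(f"{src_lang}: {source} = {tgt_lang}:")
    return "\n".join(lines)


def truncate_at_newline(generation):
    """Keep only the text before the first newline of the continuation.

    Assumed heuristic: anything after the first line is treated as
    overgeneration and discarded.
    """
    stripped = generation.lstrip()
    return stripped.split("\n", 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt(
        "Bonjour.", "French", "English", examples=[("Merci.", "Thank you.")]
    )
    print(prompt)
    # A made-up continuation in which the model keeps generating new pairs.
    raw = " Hello.\nFrench: Au revoir. = English: Goodbye."
    print(truncate_at_newline(raw))
```

Passing an empty `examples` sequence gives the 0-shot form of the same prompt, which makes it easy to compare the two settings with an otherwise identical template.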

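Experimental Results

Research Questions

RQ1: How does BLOOM perform in 0-shot and few-shot translation across different language pairs and datasets?
RQ2: How does prompt design affect BLOOM's MT performance, and how does prompt sensitivity vary across languages?
RQ3: To what extent does BLOOM exhibit cross-lingual transfer, and how do nearby or related languages affect translation quality?
RQ4: Does linguistic and pragmatic context improve translation quality, and under what conditions?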

Key Findings

0-shot BLOOM translations exhibit overgeneration and translations in the wrong language, which are substantially mitigated in the few-shot setting.
Few-shot prompts bring BLOOM’s MT results closer to state-of-the-art levels across several language pairs and datasets.
There are observable transfer effects; BLOOM can score well on languages not officially seen in training and shows cross-lingual transfer across language pairs via few-shot examples.
Prompt choice significantly affects 0-shot results, with some prompts yielding near-catastrophic MT performance, while 1-shot performance is more robust to prompt choice.
Linguistic context does not consistently boost metric scores, but there is evidence that BLOOM’s translations are influenced by context.
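
The two 0-shot failure modes named in the findings, wrong-language output and overgeneration, can each be quantified with a simple diagnostic. The sketch below is an assumption-laden illustration rather than the paper's evaluation code: `off_target_rate` and `mean_length_ratio` are hypothetical helper names, and the language identifier is passed in as a function so that a real classifier (the review mentions fastText) can be substituted for the toy stand-in used here.

```python
def off_target_rate(hypotheses, target_lang, identify_language):
    """Fraction of hypotheses whose detected language is not the target.

    identify_language is any callable mapping a string to a language code.
    """
    if not hypotheses:
        return 0.0
    wrong = sum(1 for h in hypotheses if identify_language(h) != target_lang)
    return wrong / len(hypotheses)


def mean_length_ratio(hypotheses, references):
    """Average hypothesis/reference length ratio in whitespace tokens.

    Values well above 1.0 hint at overgeneration.
    """
    ratios = [
        len(h.split()) / max(len(r.split()), 1)
        for h, r in zip(hypotheses, references)
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def toy_identifier(text):
    """Toy stand-in for a real language identifier (illustration only)."""
    french_cues = {"le", "la", "bonjour"}
    return "fr" if french_cues & set(text.lower().split()) else "en"


if __name__ == "__main__":
    hyps = ["Hello there", "Bonjour le monde", "Good morning", "la maison"]
    print(off_target_rate(hyps, "en", toy_identifier))
    print(mean_length_ratio(["a b c d", "a b"], ["a b", "a b"]))
```

Comparing both numbers before and after truncation, and between 0-shot and few-shot runs, gives a rough picture of how much each mitigation helps without needing a full BLEU or COMET evaluation.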


This review was generated by AI and checked by a human editor.