QUICK REVIEW

[Paper Review] Prompting Large Language Model for Machine Translation: A Case Study

Biao Zhang, Barry Haddow, Alexandra Birch | arXiv (Cornell University) | Jan 17, 2023

Natural Language Processing Techniques | Cited by 68

One-Sentence Summary

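A systematic study of prompting strategies for machine translation with GLM-130B, covering prompt templates, demonstration examples, the use of monolingual data, and transfer across settings.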

ABSTRACT

Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, where using suboptimal examples degenerates translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance; yet, none of the correlations are strong enough; 3) using pseudo parallel prompt examples constructed from monolingual data via zero-shot prompting could improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. We finally provide an analysis on the model outputs and discuss several problems that prompting still suffers from.

Research Motivation and Objectives

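Evaluate how the prompt template affects MT quality across language pairs.
Investigate how demonstration examples affect prompting performance and how to select them effectively.
Explore the use of monolingual data in prompting and the potential of pseudo-parallel demonstrations.
Examine transfer learning in prompting: cross-lingual, cross-domain, and sentence-to-document transfer.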

Proposed Method

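Use GLM-130B (INT4-quantized) as a frozen LLM for translation prompting.
Evaluate zero-shot and few-shot prompting with multiple templates and demonstration strategies.
Build and analyze ablation sets to study how demonstration features correlate with prompting performance.
Experiment with monolingual data, constructing pseudo-parallel prompt examples via back-/forward-translation.
Study cross-setting transfer in prompting: cross-lingual, cross-domain, and sentence-level to document-level transfer.
Analyze common prompting-related problems in the model outputs and potential mitigations.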
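
The zero-shot and few-shot setups listed above differ only in whether demonstration pairs precede the test input. The following is a minimal sketch of how such a prompt could be assembled. The function name `build_prompt` and the "Language: text" template shape are assumptions made for illustration; they are not taken from the paper and may differ from the templates it evaluates.

```python
from typing import Sequence, Tuple


def build_prompt(
    src_lang: str,
    tgt_lang: str,
    source: str,
    demos: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a translation prompt (hypothetical template, for illustration).

    With no demos this is a zero-shot prompt; with k demos it is k-shot.
    """
    # Each demonstration is a complete (source, target) pair in the template.
    blocks = [f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in demos]
    # The test input reuses the template with the target side left empty,
    # so the model is expected to continue with the translation.
    blocks.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("Danke.", "Thank you.")]
    print(build_prompt("German", "English", "Guten Morgen.", demos))
```

Keeping the template in one place makes it straightforward to swap template wording or template language while holding the demonstrations fixed, which is the kind of comparison the study describes.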

Experimental Results

Research Questions

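RQ1: Which MT prompting template yields the best performance, and how does the template language affect the results?
RQ2: How do demonstration examples affect prompting performance, and which demonstration features correlate with better MT prompts?
RQ3: Can monolingual data be used effectively in prompting, and how do pseudo-parallel prompt examples compare?
RQ4: To what extent do prompt demonstrations transfer across languages, domains, and output granularity (sentence vs. document)?
RQ5: What practical problems arise in prompting for MT (copying, mistranslation of named entities, hallucination, prompt traps, etc.), and how can they be mitigated?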

Key Findings

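Template choice strongly affects zero-shot MT. A simple English template generally performs best for translation into English with GLM-130B.
Several demonstration features (length, LM score, semantic similarity) correlate with prompting performance, but the correlations are weak and not consistently predictive.
Using monolingual data in the prompt generally hurts MT performance. Pseudo-parallel prompts built via back-/forward-translation improve prompting, and back-translation is the more robust of the two.
Prompting shows some transferability, but the gains from cross-setting transfer are modest, and demonstrations selected in one setting do not reliably outperform zero-shot prompting in another.
Prompting still suffers from copying, mistranslation of named entities, hallucination, and prompt traps; pivoting through English offers some benefit for non-English directions.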
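
The feature-correlation finding above relies on Spearman's rank correlation between a demonstration feature and the translation quality obtained with that demonstration. As a minimal sketch of that computation, assuming hypothetical toy values rather than any data from the paper, the coefficient can be computed as the Pearson correlation of average ranks:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy values, NOT results from the paper:
    # one feature value and one quality score per demonstration.
    similarity = [0.12, 0.35, 0.36, 0.58, 0.71]
    quality = [41.0, 39.5, 44.2, 43.8, 45.1]
    print(f"Spearman rho = {spearman(similarity, quality):.3f}")
```

A coefficient of moderate size on such data would match the review's point: a feature can rank demonstrations better than chance and still be too weak to pick the best one reliably.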

This review was generated by AI and checked by a human editor.