QUICK REVIEW

[Paper Review] Exploring Human-Like Translation Strategy with Large Language Models

Zhiwei He, Tian Liang|arXiv (Cornell University)|May 6, 2023

Natural Language Processing Techniques|Citations: 16

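Quick Summary

MAPS prompts a large language model to mimic human translation: it extracts keywords, topics, and demonstrations from the source text, integrates them to generate multiple translations, and selects the best output using quality estimation. Both automatic and human evaluation show that MAPS improves translation quality and reduces errors such as hallucinated mistranslations and ambiguity.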

ABSTRACT

Large language models (LLMs) have demonstrated impressive capabilities in general scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses, human-level intelligence. Among their numerous skills, the translation abilities of LLMs have received considerable attention. Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process which might take preparatory steps to ensure high-quality translation. This work explores this possibility by proposing the MAPS framework, which stands for Multi-Aspect Prompting and Selection. Specifically, we enable LLMs first to analyze the given source sentence and induce three aspects of translation-related knowledge: keywords, topics, and relevant demonstrations to guide the final translation process. Moreover, we employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge. Both automatic (3 LLMs x 11 directions x 2 automatic metrics) and human evaluation (preference study and MQM) demonstrate the effectiveness of MAPS. Further analysis shows that by mimicking the human translation process, MAPS reduces various translation errors such as hallucination, ambiguity, mistranslation, awkward style, untranslated text, and omission. Source code is available at https://github.com/zwhe99/MAPS-mt.

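Motivation and Objectives

Advance the exploration of human-like translation strategies in LLMs.
Propose the MAPS framework, which extracts translation-related knowledge from the source before translating.
Show how knowledge mining, integration, and quality-based selection improve translation quality.
Evaluate MAPS with automatic metrics and human evaluation across multiple language directions.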

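Proposed Method

MAPS (Multi-Aspect Prompting and Selection) works in three steps: knowledge mining, knowledge integration, and knowledge selection.
Knowledge mining prompts the LLM to generate keywords, topics, and relevant demonstrations for the source sentence.
Knowledge integration uses the extracted knowledge to generate multiple translation candidates.
Knowledge selection uses reference-free quality estimation (QE) to pick the best candidate. Both a trained QE model and LLM-based QE are effective.
MAPS is evaluated on 11 translation directions and 3 LLMs, using COMET and BLEURT as automatic metrics together with human MQM and preference studies.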
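
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `LLM` and `QEScorer` callables, all function names, and the prompt wording are hypothetical stand-ins, and the sketch assumes one candidate is generated per knowledge type.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces, assumed for this sketch (not taken from the paper's code):
LLM = Callable[[str], str]              # prompt in, generated text out
QEScorer = Callable[[str, str], float]  # (source, translation) -> quality score


def mine_knowledge(llm: LLM, source: str, src_lang: str, tgt_lang: str) -> Dict[str, str]:
    """Step 1: ask the LLM for three kinds of translation-related knowledge."""
    return {
        "keywords": llm(
            f"List the keywords in this {src_lang} sentence with their "
            f"{tgt_lang} translations.\n{source}"
        ),
        "topics": llm(
            f"Describe the topics of this {src_lang} sentence in a few words.\n{source}"
        ),
        "demonstration": llm(
            f"Write a {src_lang} sentence related to the one below, followed by "
            f"its {tgt_lang} translation.\n{source}"
        ),
    }


def integrate_knowledge(
    llm: LLM, source: str, knowledge: Dict[str, str], src_lang: str, tgt_lang: str
) -> List[str]:
    """Step 2: produce one translation candidate per knowledge type."""
    instruction = f"Translate this {src_lang} sentence into {tgt_lang}.\n{source}"
    return [llm(f"{kind}: {text}\n{instruction}") for kind, text in knowledge.items()]


def select_translation(qe: QEScorer, source: str, candidates: List[str]) -> str:
    """Step 3: keep the candidate with the highest reference-free QE score."""
    return max(candidates, key=lambda cand: qe(source, cand))


def maps_translate(llm: LLM, qe: QEScorer, source: str, src_lang: str, tgt_lang: str) -> str:
    """Run mining, integration, and selection in sequence."""
    knowledge = mine_knowledge(llm, source, src_lang, tgt_lang)
    candidates = integrate_knowledge(llm, source, knowledge, src_lang, tgt_lang)
    return select_translation(qe, source, candidates)
```

In practice `llm` would wrap a call to a model API and `qe` would wrap a trained QE model or an LLM-based scorer. Both are passed in as plain callables so the sketch stays independent of any particular library.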

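Experimental Results

Research Questions

RQ1: Can LLMs mimic the preparatory steps of human translation by extracting keywords, topics, and demonstrations from the source text?
RQ2: Does incorporating these three types of knowledge improve translation quality compared with the baseline and reranking methods?
RQ3: How does quality-estimation-based selection affect final translation quality and error types (hallucination, ambiguity, mistranslation, and so on)?
RQ4: How do different knowledge selection methods (LLM-SCQ, Comet-QE, Comet) affect MAPS performance?
RQ5: Does a three-in-one prompt (integrating all knowledge types) bring benefits across language pairs?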
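
To make RQ5 concrete, the following sketch shows what a three-in-one prompt could look like: all mined knowledge is placed in a single prompt instead of producing one candidate per knowledge type. The function name, the dictionary keys, and the prompt wording are assumptions made for illustration and are not the paper's actual template.

```python
from typing import Dict


def build_three_in_one_prompt(
    source: str, knowledge: Dict[str, str], src_lang: str, tgt_lang: str
) -> str:
    """Combine every available knowledge type into one translation prompt."""
    # Hypothetical keys and display labels; missing or empty entries are skipped.
    labels = (
        ("keywords", "Keywords"),
        ("topics", "Topics"),
        ("demonstration", "Related example"),
    )
    lines = [f"{label}: {knowledge[key]}" for key, label in labels if knowledge.get(key)]
    lines.append(
        f"Using the information above where it helps, translate this "
        f"{src_lang} sentence into {tgt_lang}."
    )
    lines.append(source)
    return "\n".join(lines)
```

A single combined prompt needs only one translation call and no selection step, but it gives the model no way to discard a noisy piece of knowledge, which is the trade-off RQ5 examines.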

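Key Findings

MAPS consistently outperforms Baseline and Rerank on automatic metrics across 11 language directions and 3 LLMs.
In several directions, MAPS with Comet-QE matches or exceeds the best WMT22 submission, suggesting that LLMs can improve translation quality by mimicking human preparatory strategies.
Using all three knowledge types (keywords, topics, demonstrations) gives the best results, and ablation experiments show that each type makes a significant contribution.
MQM and the human preference study show that MAPS translations are generally preferred and contain fewer mistranslation, awkward style, untranslated text, and omission errors.
Compared with the baseline and reranking methods, MAPS reduces token-level hallucination and helps on disambiguation tasks.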

This review was generated by AI and checked by a human editor.