QUICK REVIEW

[Paper Review] Reflective Translation: Improving Low-Resource Machine Translation via Structured Self-Reflection

Nicholas Cheng|arXiv (Cornell University)|Jan 27, 2026

Natural Language Processing Techniques | Citations: 0

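One-line summary

The paper proposes Reflective Translation, a prompting framework in which an LLM produces a structured self-critique of its own translation and then revises it, improving low-resource English-isiZulu and English-isiXhosa MT without fine-tuning. It releases a reflection-augmented dataset for reproducibility and shows consistent second-pass gains across prompting strategies.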

ABSTRACT

Low-resource languages such as isiZulu and isiXhosa face persistent challenges in machine translation due to limited parallel data and linguistic resources. Recent advances in large language models suggest that self-reflection, prompting a model to critique and revise its own outputs, can improve reasoning quality and factual consistency. Building on this idea, this paper introduces Reflective Translation, a prompt-based framework in which a model generates an initial translation, produces a structured self-critique, and then uses this reflection to generate a refined translation. The approach is evaluated on English-isiZulu and English-isiXhosa translation using OPUS-100 and NTREX-African, across multiple prompting strategies and confidence thresholds. Results show consistent improvements in both BLEU and COMET scores between first- and second-pass translations, with average gains of up to +0.22 BLEU and +0.18 COMET. Statistical significance testing using paired nonparametric tests confirms that these improvements are robust. The proposed method is model-agnostic, requires no fine-tuning, and introduces a reflection-augmented dataset that can support future supervised or analysis-driven work. These findings demonstrate that structured self-reflection is a practical and effective mechanism for improving translation quality in low-resource settings.

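Motivation and objectives

Improve MT for low-resource languages that have only limited parallel data.
Investigate whether inference-time self-reflection can make translations more reliable without fine-tuning.
Build a structured reflection framework for isiZulu and isiXhosa and evaluate it on public MT datasets.
Release a reflection-augmented dataset of source, draft, critique, and revision tuples to support reproducibility.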

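Proposed method

Generate an initial translation with an LLM.
Produce a structured reflection that identifies errors, corrections, and key content.
Mask salient content using RAKE-based tokens to force semantic revision.
Produce a second-pass translation guided by the critique.
Evaluate translations with BLEU and COMET on OPUS-100 and NTREX-African.
Compare Baseline, Chain-of-Thought, and Few-shot prompting strategies.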
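The translate, critique, mask, and revise steps listed above can be sketched as a small loop. This is a minimal illustration rather than the paper's implementation: `llm` and `extract_phrases` are hypothetical callables supplied by the caller (any text-in, text-out model, and any key-phrase extractor such as a RAKE implementation), the prompt wording and the `<MASK>` token are invented for the example, and applying the mask to the critique text is an assumption, since this review does not say which text is masked.

```python
import re
from typing import Callable, Dict, List

MASK = "<MASK>"  # hypothetical mask token, not taken from the paper


def mask_phrases(text: str, phrases: List[str], mask: str = MASK) -> str:
    """Replace every occurrence of each key phrase with the mask token."""
    # Longest phrases first, so "big cat" wins over "cat" at the same position.
    ordered = sorted((p for p in phrases if p), key=len, reverse=True)
    if not ordered:
        return text
    pattern = "|".join(re.escape(p) for p in ordered)
    return re.sub(pattern, lambda _m: mask, text, flags=re.IGNORECASE)


def reflective_translate(
    source: str,
    target_lang: str,
    llm: Callable[[str], str],
    extract_phrases: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Run one draft -> critique -> masked critique -> revision cycle."""
    # First pass: initial translation.
    draft = llm(f"Translate this English text into {target_lang}:\n{source}")

    # Structured reflection: errors, corrections, key content.
    critique = llm(
        "Critique the draft translation. List the errors, the corrections, "
        "and the key content that must be preserved.\n"
        f"Source: {source}\nDraft: {draft}"
    )

    # Assumption: salient phrases are masked in the critique so the second
    # pass cannot simply copy them and has to revise from the source.
    masked_critique = mask_phrases(critique, extract_phrases(critique))

    # Second pass: revision guided by the (masked) critique.
    revision = llm(
        f"Revise the draft into better {target_lang} using the critique.\n"
        f"Source: {source}\nDraft: {draft}\nCritique: {masked_critique}"
    )

    return {
        "source": source,
        "draft": draft,
        "critique": critique,
        "masked_critique": masked_critique,
        "revision": revision,
    }
```

The returned dictionary mirrors the source, draft, critique, and revision tuple described for the released dataset, so each call yields one record that can be scored (first pass against second pass) or stored.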

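Experimental results

Research questions

RQ1: Can structured self-reflection at inference time improve translation reliability for low-resource languages without fine-tuning?
RQ2: For English-isiZulu and English-isiXhosa, do second-pass translations outperform first-pass translations across prompting strategies?
RQ3: Does masking RAKE-extracted phrases help reduce copying and encourage semantic revision?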

Key findings

Metric	N	Median Gain	p-value	Effect Size (r)
BLEU	324	+0.0788	1.45e-44	0.95
COMET	457	+0.1753	1.10e-65	0.96

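Second-pass translations consistently outperform first-pass translations, regardless of prompting strategy.
COMET gains are typically larger and more stable than BLEU gains, indicating improved semantic adequacy.
When confidence thresholds are applied, the average improvement on refined samples trades off against coverage.
Statistical tests show significant improvements: median BLEU gain +0.0788 (p=1.45e-44, r=0.95); median COMET gain +0.1753 (p=1.10e-65, r=0.96).
Few-shot prompting with reflection produced the most stable gains across strategies.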
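To make the reported quantities (median gain, p-value, effect size r) concrete, the following sketch computes them for paired first-pass and second-pass scores. The abstract only says "paired nonparametric tests", so the choice of the Wilcoxon signed-rank test is an assumption, as is the use of the matched-pairs rank-biserial correlation as the effect size. The p-value uses a plain normal approximation with no tie or continuity correction, and the function name `paired_signed_rank` is hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import Dict, Sequence


def paired_signed_rank(first: Sequence[float], second: Sequence[float]) -> Dict[str, float]:
    """Wilcoxon signed-rank test on paired scores (second minus first)."""
    if len(first) != len(second):
        raise ValueError("score lists must have the same length")

    gains = [b - a for a, b in zip(first, second)]
    diffs = [g for g in gains if g != 0]  # zero differences carry no sign
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of W+ (no corrections).
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "n": n,
        "median_gain": median(gains),
        "w_plus": w_plus,
        "w_minus": w_minus,
        "z": z,
        "p_value": p_value,
        # Matched-pairs rank-biserial correlation, in [-1, 1].
        "rank_biserial": (w_plus - w_minus) / (w_plus + w_minus),
    }
```

Calling `paired_signed_rank(first_pass_scores, second_pass_scores)` on per-sentence BLEU or COMET lists returns one row of the kind shown in the table above. A rank-biserial value near 1 means almost all of the rank mass lies with pairs where the second pass scored higher.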


This review was generated by AI and checked by a human editor.