QUICK REVIEW

[Paper Review] KD4MT: A Survey of Knowledge Distillation for Machine Translation

Ona de Gibert, Joseph Attieh|arXiv (Cornell University)|Jan 22, 2026

Natural Language Processing Techniques|Citations: 0

One-Line Summary

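This survey synthesizes knowledge distillation (KD) methods and their application to machine translation (MT) across 105 papers, highlighting techniques, use cases, research gaps, and the role of LLMs.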

ABSTRACT

Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a much more nuanced take on this narrative: in MT, KD also functions as a general-purpose knowledge transfer mechanism that shapes supervision and translation quality as well as efficiency. This survey synthesizes KD for MT (KD4MT) across 105 papers (through October 1, 2025). We begin by introducing both MT and KD for non-experts, followed by an overview of the standard KD approaches relevant to MT applications. Subsequently, we categorize advances in the KD4MT literature based on (i) their methodological contributions and (ii) their practical applications. Our qualitative and quantitative analyses identify common trends in the field and highlight key research gaps as well as the absence of unified evaluation practice for KD methods in MT. We further provide practical guidelines for selecting a KD method in concrete settings and highlight potential risks associated with the application of KD to MT such as increased hallucination and bias amplification. Finally, we discuss the role of LLMs in re-shaping the KD4MT field. To support further research, we complement our survey with a publicly available database summarizing the main characteristics of the surveyed KD methods and a glossary of key terms.

Motivation and Objectives

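Explain how KD is used in machine translation and why it matters beyond model compression.
Categorize KD methods by supervision level and by MT application context.
Identify common trends, gaps, and evaluation practices in KD4MT.
Provide practical guidelines for selecting a KD method and discuss risks such as increased hallucination and bias amplification.
Discuss the evolving role of large language models in KD4MT and provide a public database of methods.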

Approach

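Describes Word-Level KD and Sequence-Level KD as response-based approaches.
Describes Feature-Based KD as an approach that transfers intermediate representations.
Outlines three algorithmic dimensions: selecting supervision, expanding supervision, and redefining supervision.
Summarizes techniques for filtering and diversifying supervision, including multi-teacher, proxy-task, and LLM-based approaches.
Examines KD variants inspired by on-policy learning and imitation learning that address exposure bias and the capacity gap.
Provides a public KD4MT database and glossary to support further research.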
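
To make the response-based category above concrete, here is a minimal sketch of a word-level KD loss, assuming the common formulation in which the student matches the teacher's per-token output distribution. The function names (`softmax`, `word_level_kd_loss`), the `temperature` parameter, and the toy logits are illustrative assumptions, not taken from the survey. Sequence-level KD, by contrast, is usually described as training the student on translations decoded by the teacher, so it needs no per-token distribution matching.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn a list of logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_level_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-token KL(teacher || student) over one target sequence.

    Both arguments are lists of per-position logit lists over the same
    vocabulary (hypothetical toy inputs, not a real model's output).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)  # teacher soft targets
        q = softmax(s_row, temperature)  # student predictions
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0.0)
    return total / len(teacher_logits)


# Toy example: two target positions, vocabulary of three tokens.
teacher = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
student = [[1.5, 1.2, 0.3], [0.2, 0.9, 2.5]]
loss = word_level_kd_loss(teacher, student, temperature=2.0)
```

The KL divergence differs from the cross-entropy against the teacher's soft targets only by the teacher's entropy, which is constant with respect to the student, so either form gives the same gradients for the student.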

Figure 1: Teacher versus student parameter counts. Dot size indicates the frequency of a specific configuration; color marks the compression ratio $\frac{\mathrm{Teacher\ size}}{\mathrm{Student\ size}}$. Roughly half of the works use a ratio of 1. Best viewed in color.

Results

Research Questions

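RQ1: What knowledge should KD transfer in MT, and at what granularity?
RQ2: How can expanding supervision beyond a single teacher improve KD for MT?
RQ3: How should KD resolve distributional and behavioral mismatches between teacher and student?
RQ4: What are the practical application areas of KD in MT, and what challenges are specific to them?
RQ5: How will LLMs shape the future of KD4MT, and at what cost?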

Key Findings

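In many studies the teacher and student are of similar size, which challenges the view of KD in MT as pure compression.
Word-KD and Seq-KD are the most common KD methods, while feature-based KD achieves state-of-the-art results in some cases.
Filtering and selective supervision at the token, sentence, and representation levels improve the effectiveness of KD.
Expanding supervision with multiple teachers, proxy tasks, and LLM guidance enriches the training signal.
Redefining supervision through RL-style on-policy training helps mitigate exposure bias and the capacity gap.
The public KD4MT database and glossary support reproducibility and future research.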
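
As an illustration of the sentence-level filtering mentioned in the findings above, the sketch below drops teacher-generated translations before they are used as student training data. The review does not state which criteria the surveyed papers use, so the quality score, the thresholds, and the helper name `filter_distilled_pairs` are all hypothetical choices made for this example.

```python
def filter_distilled_pairs(pairs, min_score=0.5, max_length_ratio=2.0):
    """Keep (source, teacher_translation) pairs that pass two simple checks.

    pairs: iterable of (source, teacher_translation, score) tuples, where
    score is an assumed quality estimate in [0, 1] (for example a teacher
    confidence); both thresholds are illustrative defaults.
    """
    kept = []
    for source, hypothesis, score in pairs:
        src_len = len(source.split())
        hyp_len = len(hypothesis.split())
        if src_len == 0 or hyp_len == 0:
            continue  # drop pairs with an empty side
        ratio = max(src_len, hyp_len) / min(src_len, hyp_len)
        if score >= min_score and ratio <= max_length_ratio:
            kept.append((source, hypothesis))
    return kept


# Toy data: only the first pair passes both checks.
examples = [
    ("the cat sleeps", "die Katze schläft", 0.9),
    ("good morning", "guten Morgen", 0.2),       # score too low
    ("hello", "hallo hallo hallo hallo", 0.8),   # length ratio too high
    ("thank you", "", 0.95),                     # empty translation
]
student_training_data = filter_distilled_pairs(examples)
```

Token-level and representation-level selection follow the same pattern in principle: a criterion decides which parts of the teacher's supervision reach the student, and only the unit being scored changes.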

This review was generated by AI and checked by a human editor.