QUICK REVIEW

[Paper Review] Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Minki Kang, Seanie Lee | arXiv (Cornell University) | May 28, 2023

Topic Modeling | Cited by 21

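One-sentence summary

KARD distills reasoning from large LLMs into small LMs augmented with external knowledge, uses a neural reranker to retrieve passages relevant to rationale generation, and achieves strong performance on knowledge-intensive QA benchmarks.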

ABSTRACT

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs in memorizing the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs with augmented knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method makes the 250M T5 models achieve superior performance against the fine-tuned 3B models, having 12 times larger parameters, on both MedQA-USMLE and StrategyQA benchmarks.

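Research motivation and objectives

Motivate the need for small LMs in knowledge-intensive tasks, given data-privacy and compute constraints.
Propose a framework that distills LLM reasoning into small LMs while augmenting them with passages from an external knowledge base (KB).
Introduce a neural reranker that retrieves passages relevant to rationale generation at inference time.
Show that KARD improves over baselines on MedQA-USMLE, StrategyQA, and OpenbookQA.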

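Proposed method

Prompt LLMs with chain-of-thought prompting to generate rationales for the training data.
Fine-tune the small LM to generate both the rationale and the answer, conditioned on the question.
During training, use the LLM-generated rationale as the retrieval query and augment the small LM's input with the retrieved KB passages (LKB).
Introduce a neural reranker that reorders the retrieved passages so that those most relevant to rationale generation rank highest at inference time, when no rationale is yet available to serve as the query.
Train the reranker with a KL-divergence objective so that it mimics the retriever's ranking obtained when the rationale is used as the query.
At inference time, retrieve passages, rerank them, generate a rationale, and output the final answer.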
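
The reranker steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (softmax, kl_divergence, reranker_loss, rerank_top_k) are hypothetical, the scores are plain Python lists rather than model outputs, and the KL direction (retriever distribution as the target) is one plausible reading of the description. The sketch turns retriever scores computed with the rationale as query into a target distribution, measures how far the reranker's question-only scores are from it, and then picks the top-k passages at inference time.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw relevance scores into a probability distribution over passages."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def reranker_loss(retriever_scores_with_rationale, reranker_scores_with_question):
    """Hypothetical training objective: the reranker, which only sees the
    question, should reproduce the ranking the retriever gives when the
    rationale is the query. The KL direction here is an assumption."""
    target = softmax(retriever_scores_with_rationale)
    predicted = softmax(reranker_scores_with_question)
    return kl_divergence(target, predicted)


def rerank_top_k(passages, reranker_scores, k):
    """Inference-time step: keep the k passages the reranker scores highest."""
    order = sorted(
        range(len(passages)), key=lambda i: reranker_scores[i], reverse=True
    )
    return [passages[i] for i in order[:k]]
```

In a real system the two score lists would come from a retriever such as BM25 and from a learned scoring model, and the loss would be minimized with gradient descent; the sketch only shows the shape of the objective and of the top-k selection that precedes rationale generation.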

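Experimental results

Research questions

RQ1: Can knowledge-augmented distillation effectively transfer LLM reasoning to small LMs in knowledge-intensive tasks?
RQ2: Does adding external knowledge and a reranker improve small-LM performance beyond standard reasoning distillation?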
RQ3: How does KARD compare with baselines (few-shot, fine-tuning, standard reasoning distillation) on the medical and multi-step reasoning benchmarks?

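Key findings

KARD consistently outperforms the baselines on MedQA-USMLE, StrategyQA, and OpenbookQA across model sizes.
Knowledge augmentation reduces how much the small LM must memorize, enabling gains even with fewer parameters.
The neural reranker improves the relevance of passages for rationale generation and yields better downstream answers than BM25 retrieval alone.
KARD yields strong gains for small models: the 250M-parameter T5 model outperforms the fine-tuned 3B model on MedQA-USMLE and StrategyQA.
DAPT (domain-adaptive pre-training) has only a limited effect compared with KARD, highlighting the distinct value of knowledge augmentation in reasoning distillation.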

This review was generated by AI and checked by a human editor.