QUICK REVIEW

[Paper Review] Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Zorik Gekhman, Gal Yona | arXiv (Cornell University) | May 9, 2024

Big Data and Digital Economy | Cited by 11

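ONE-LINE SUMMARY

This paper shows that when an LLM is fine-tuned on new knowledge, the examples that introduce that knowledge are learned much more slowly than examples consistent with what the model already knows, and that once they are learned, they increase hallucinations about the model's pre-existing knowledge. Early stopping and filtering out Unknown examples can mitigate this risk.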

ABSTRACT

When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that large language models mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.

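MOTIVATION AND OBJECTIVES

Motivation: understand how fine-tuning on new factual knowledge, beyond what was acquired in pre-training, affects LLM behavior.
Propose SliCK, a taxonomy that sorts the model's knowledge of each fine-tuning example into four categories.
Empirically examine how varying the proportion of Unknown (new) knowledge in the fine-tuning data affects performance and hallucination.
Investigate the training dynamics and the relative value of the different Known categories for robust knowledge utilization.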
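The controlled setup behind the proportion experiments can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each (question, answer) pair has already been labeled Known or Unknown, that the total size of the fine-tuning set is held fixed while only the Unknown share changes, and the function name build_variant is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

QA = Tuple[str, str]  # a (question, answer) pair


def build_variant(known: Sequence[QA], unknown: Sequence[QA],
                  size: int, unknown_ratio: float, seed: int = 0) -> List[QA]:
    """Hypothetical helper: sample `size` examples, `unknown_ratio` of them Unknown.

    Assumes the total size stays fixed across variants, so only the
    Known/Unknown mix changes between runs.
    """
    if not 0.0 <= unknown_ratio <= 1.0:
        raise ValueError("unknown_ratio must be in [0, 1]")
    n_unknown = round(size * unknown_ratio)
    n_known = size - n_unknown
    if n_unknown > len(unknown) or n_known > len(known):
        raise ValueError("not enough labeled examples for the requested mix")
    rng = random.Random(seed)  # fixed seed -> reproducible variant
    mix = rng.sample(list(known), n_known) + rng.sample(list(unknown), n_unknown)
    rng.shuffle(mix)  # avoid all Known examples preceding all Unknown ones
    return mix
```

Calling it with several unknown_ratio values (for example 0.0, 0.25, 0.5, 0.75, 1.0) yields a family of equally sized D variants that differ only in their Unknown share, which is what lets differences in outcome be attributed to the proportion of new knowledge.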

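PROPOSED METHOD

To categorize knowledge, define P_Correct, a continuous measure estimated from samples of the model's answers.
Using 12 relations from EntityQuestions, plus 7 out-of-distribution (OOD) relations, construct variants of the fine-tuning dataset D that differ in their proportion of Unknown and Known examples.
Fine-tune the PaLM 2-M base model on each variant of D and measure exact-match accuracy, using in-context prompts and temperature sampling.
Examine overfitting by comparing early stopping on the development set against training to full convergence.
Annotate each (q, a) pair with one of the SliCK categories (Unknown, HighlyKnown, MaybeKnown, WeaklyKnown) based on the model's greedy and sampled outputs.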
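A sketch of how the SliCK categories could be derived from P_Correct, following our reading of the description above: P_Correct is estimated once from greedy (T = 0) answers and once from temperature-sampled (T > 0) answers, and the two estimates decide the category. The answer normalization used for exact match, the helper names, and the precise mapping are assumptions rather than the paper's implementation; prompting and sampling are taken to have happened already, so the functions receive plain answer strings.

```python
import string
from typing import Sequence


def normalize(text: str) -> str:
    """Assumed EM normalization: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)


def p_correct(preds: Sequence[str], gold: str) -> float:
    """Fraction of the model's answers that exactly match the gold answer."""
    if not preds:
        raise ValueError("need at least one prediction")
    return sum(exact_match(p, gold) for p in preds) / len(preds)


def slick_category(greedy_preds: Sequence[str],
                   sampled_preds: Sequence[str], gold: str) -> str:
    """Map greedy and sampled P_Correct estimates to a SliCK category."""
    p_greedy = p_correct(greedy_preds, gold)    # estimate at T = 0
    p_sampled = p_correct(sampled_preds, gold)  # estimate at T > 0
    if p_greedy == 1.0:
        return "HighlyKnown"   # greedy decoding is always correct
    if p_greedy > 0.0:
        return "MaybeKnown"    # greedy decoding is sometimes correct
    if p_sampled > 0.0:
        return "WeaklyKnown"   # only sampling ever recovers the answer
    return "Unknown"           # never correct, greedy or sampled
```

Because Unknown requires both estimates to be zero, an example is treated as new knowledge only when no decoding strategy in the sketch ever produces the gold answer.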

Figure 1: Train and development accuracies as a function of the fine-tuning duration, when fine-tuning on 50% Known and 50% Unknown examples. Unknown examples are fitted substantially slower than Known. The best development performance is obtained w
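The early-stopping setting contrasted with full convergence can be approximated by keeping the checkpoint with the best development accuracy. This is a minimal sketch assuming one dev-accuracy value per epoch; the patience parameter and the function name are hypothetical and not taken from the paper.

```python
from typing import Sequence


def early_stopping_epoch(dev_acc: Sequence[float], patience: int = 2) -> int:
    """Return the index of the best epoch, stopping after `patience` epochs
    in a row without a new best dev accuracy (hypothetical rule)."""
    if not dev_acc:
        raise ValueError("dev_acc is empty")
    if patience < 1:
        raise ValueError("patience must be at least 1")
    best_epoch, best_acc, waited = 0, dev_acc[0], 0
    for epoch in range(1, len(dev_acc)):
        if dev_acc[epoch] > best_acc:
            best_epoch, best_acc, waited = epoch, dev_acc[epoch], 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; later epochs are never evaluated
    return best_epoch
```

For a made-up curve such as [0.30, 0.40, 0.42, 0.41, 0.39, 0.38], the sketch keeps epoch 2, whereas the convergence setting would keep the last epoch. Stopping before the slowly fitted Unknown examples are learned is the intuition behind the early-stopping comparison.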

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

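RQ1: Does exposure to new knowledge during fine-tuning increase the model's tendency to hallucinate with respect to its pre-existing knowledge?
RQ2: How do the different Known categories (HighlyKnown, MaybeKnown, WeaklyKnown) in the fine-tuning data affect the use of pre-existing knowledge and test performance?
RQ3: Can filtering out Unknown examples or early stopping mitigate the hallucination risk introduced by fine-tuning?
RQ4: Do the observed effects generalize to out-of-distribution relations beyond those in the fine-tuning set?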

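KEY FINDINGS

A higher proportion of Unknown examples in the fine-tuning data degrades performance across settings.
Unknown examples are learned much more slowly than Known ones, and the negative effect grows with longer training.
As Unknown examples are learned, the model's tendency to hallucinate about its pre-existing knowledge increases linearly.
Filtering out Unknown examples or early stopping reduces the hallucination risk without sacrificing performance.
MaybeKnown fine-tuning examples balance knowledge utilization with retention of pre-existing knowledge and contribute most to the overall improvement.
The effects carry over to out-of-distribution relations, which show the same dynamics linking Unknown examples to hallucination.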
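One simple way to check a claim of linear growth is an ordinary least-squares line of test accuracy against the fraction of Unknown training examples the model has fitted: a negative slope means accuracy on pre-existing knowledge drops, that is, hallucinations rise, as more Unknown examples are learned. This is a plain-Python sketch, not the paper's regression, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept.

    x: fraction of Unknown training examples fitted (0..1)
    y: test accuracy measured at the same checkpoints
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With toy values x = [0.0, 0.25, 0.5, 0.75, 1.0] and y = [0.40, 0.35, 0.30, 0.25, 0.20] (illustrative numbers, not reported results), the fit gives a slope of about -0.2 and an intercept of about 0.40.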

This review was generated by AI and checked by a human editor.