QUICK REVIEW

[Paper Review] INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation

Alexandra Bazarova, Andrei Volodichev|arXiv (Cornell University)|Mar 23, 2026

Topic Modeling | Citations: 0

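One-Line Summary

INTRYGUE is a mechanistically grounded method that gates predictive entropy by induction-head activity to improve uncertainty estimation in RAG. It matches or outperforms baselines across multiple benchmarks and models.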

ABSTRACT

While retrieval-augmented generation (RAG) significantly improves the factual reliability of LLMs, it does not eliminate hallucinations, so robust uncertainty quantification (UQ) remains essential. In this paper, we reveal that standard entropy-based UQ methods often fail in RAG settings due to a mechanistic paradox. An internal "tug-of-war" inherent to context utilization appears: while induction heads promote grounded responses by copying the correct answer, they collaterally trigger the previously established "entropy neurons". This interaction inflates predictive entropy, causing the model to signal false uncertainty on accurate outputs. To address this, we propose INTRYGUE (Induction-Aware Entropy Gating for Uncertainty Estimation), a mechanistically grounded method that gates predictive entropy based on the activation patterns of induction heads. Evaluated across four RAG benchmarks and six open-source LLMs (4B to 13B parameters), INTRYGUE consistently matches or outperforms a wide range of UQ baselines. Our findings demonstrate that hallucination detection in RAG benefits from combining predictive uncertainty with interpretable, internal signals of context utilization.

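Motivation and Objectives

Demonstrate that entropy-based uncertainty estimates fail in RAG because of the interaction between induction heads and entropy neurons.
Show that induction heads actively contribute both to correct grounding and to entropy inflation.
Introduce INTRYGUE, which gates entropy by induction-head activity to improve hallucination detection in RAG.
Validate INTRYGUE on four RAG benchmarks and six open-source LLMs.
Discuss the limitations and implications of UQ based on mechanistic interpretability.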

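Proposed Method

Identify induction heads via InductionScore (induction-head activity) and quantify their influence using SinkRate measurements.
Characterize entropy neurons and their role in inflating predictive entropy.
Establish causality through mean-ablation experiments that target induction heads and entropy neurons.
Define INTRYGUE as the product of induction activity and predictive uncertainty: INTRYGUE(P, R) = f(SinkRates) * g(entropy over predictions).
Compare INTRYGUE against a diverse set of baselines, including information-based, sampling-based, and mechanistic-interpretability-based methods.
Evaluate robustness to the choice of aggregation (min-max vs. mean) and analyze computational efficiency.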
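
The product form in the method outline can be illustrated with a minimal sketch. The review does not spell out f or g, so everything below is an assumption rather than the authors' implementation: `token_entropy` and `gated_entropy` are hypothetical names, the gate is taken to be a per-token value in [0, 1] already derived from SinkRates (how it is computed from attention is not shown), and both factors are aggregated with a plain mean. The min-max variant is not defined in this review, so it is left out.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in nats) of each row of a (tokens, vocab) probability array."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def gated_entropy(probs, gate):
    """Assumed product form: mean(gate) * mean(token entropy).

    `gate` stands in for f(SinkRates); its definition is an assumption.
    """
    gate = np.asarray(gate, dtype=float)
    return float(gate.mean() * token_entropy(probs).mean())

# Toy example: two generated tokens over a 4-word vocabulary.
probs = np.array([[0.25, 0.25, 0.25, 0.25],   # maximally uncertain token
                  [0.97, 0.01, 0.01, 0.01]])  # confident token
plain = token_entropy(probs).mean()                 # ungated entropy score
low_gate = gated_entropy(probs, gate=[0.1, 0.1])    # gate mostly closed
high_gate = gated_entropy(probs, gate=[0.9, 0.9])   # gate mostly open
```

With the same token distributions, a nearly closed gate shrinks the score and a nearly open gate leaves it close to the plain entropy. That is the intended effect of gating: the entropy term only counts when the internal signal says it should.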

Figure 1: The mechanistic tug-of-war in RAG uncertainty quantification. ➀ During generation, Induction Heads exhibit intense attention activations, locking onto specific patterns in the retrieved context to promote relevant tokens into the output. This mechanism sharpens the logit distribution towar

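Experimental Results

Research Questions

RQ1: How does entropy-based uncertainty estimation perform in RAG compared with baselines?
RQ2: How does the mechanistic interaction between induction heads and entropy neurons affect uncertainty signals?
RQ3: Can entropy be reliably gated by induction-head activity to improve hallucination detection in RAG?
RQ4: Does INTRYGUE consistently outperform existing baselines across tasks, models, and output lengths?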

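Key Findings

Because of the dual role of induction heads and entropy neurons, entropy-based UQ inflates the entropy of correct, context-grounded outputs, which degrades its performance in RAG.
Induction heads causally contribute to generating correct answers: targeted ablation was shown to increase both loss and entropy.
Entropy neurons are causally driven by induction heads, and ablating the entropy neurons reduces the effect on entropy.
INTRYGUE gates entropy by induction-head activity through aggregated SinkRates, and it matches or outperforms a wide range of baselines on four RAG benchmarks and six models.
The best aggregation variant depends on output length: INTRYGUE_min-max is better for long outputs, while INTRYGUE_mean suits short outputs.
INTRYGUE keeps a runtime comparable to LN-Entropy and is faster than sampling-based baselines.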
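
The causal findings above rest on mean ablation, which replaces a unit's activation with its average over a reference set instead of zeroing it. The sketch below shows only that replacement step on plain arrays. `mean_ablate`, the array shapes, and the random data are hypothetical. In an actual experiment the substitution would happen inside the model's forward pass, with loss and entropy re-measured afterwards, and that part is omitted here.

```python
import numpy as np

def mean_ablate(activations, unit_idx, reference):
    """Return a copy of `activations` with the chosen units set to their reference mean.

    activations: (tokens, units) array for the input under study.
    reference:   (samples, units) array used to estimate each unit's mean.
    unit_idx:    list of unit indices to ablate (hypothetical selection).
    """
    ablated = np.array(activations, dtype=float, copy=True)
    ref_mean = np.asarray(reference, dtype=float).mean(axis=0)
    ablated[:, unit_idx] = ref_mean[unit_idx]  # same mean value at every token
    return ablated

# Toy data standing in for recorded activations (an assumption, not real model data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 8))
acts = rng.normal(size=(5, 8))
ablated = mean_ablate(acts, unit_idx=[2, 5], reference=reference)
```

Using the mean rather than zero keeps the ablated units within their typical activation range, so a change in the model's output is easier to attribute to the lost information than to an out-of-distribution input.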

Figure 2: Maximum Entropy score distributions for hallucinated (pink) and grounded (blue) samples heavily overlap, demonstrating that Maximum Entropy fails to reliably separate hallucinated from grounded LLM responses.
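
One common way to put a number on the overlap described in the Figure 2 caption is AUROC: the probability that a randomly chosen hallucinated sample receives a higher uncertainty score than a randomly chosen grounded one. This review does not state which metric the paper reports, so the choice of AUROC, the function name `auroc`, and the toy scores below are all assumptions made for illustration.

```python
import numpy as np

def auroc(scores, labels):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: 1 for hallucinated, 0 for grounded (an assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one sample of each class")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

labels = [1, 1, 1, 0, 0, 0]
separated = auroc([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], labels)    # clean separation
overlapping = auroc([0.9, 0.2, 0.5, 0.8, 0.1, 0.5], labels)  # heavy overlap
```

A score near 0.5 means the uncertainty measure barely distinguishes the two groups, which is the situation the caption describes for Maximum Entropy. A score near 1.0 means it ranks hallucinated samples above grounded ones almost every time.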


This review was generated by AI and checked by a human editor.