QUICK REVIEW

[Paper Review] How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding

Wei Chen, Guoyang Ju|arXiv (Cornell University)|Feb 23, 2026

Topic Modeling|Citations: 0

One-Line Summary

The paper proposes Log-Scale Focal Uncertainty (LSFU) and UCPOF, which calibrate first-token uncertainty for prompt optimization, improving few-shot accuracy and reducing unnecessary retrieval in RAG.

ABSTRACT

With the widespread adoption of large language models (LLMs) in natural language processing, prompt engineering and retrieval-augmented generation (RAG) have become mainstream to enhance LLMs' performance on complex tasks. However, LLMs generate outputs autoregressively, leading to inevitable output uncertainty. Since model performance is highly sensitive to prompt design, precise uncertainty measurement is crucial for reliable prompt optimization. For multi-class multiple-choice (understanding) tasks, conventional uncertainty measures (e.g., entropy) based on output probabilities treat all classes equally and ignore class prior differences in pretraining corpora. This failure to distinguish spurious confidence (from priors) from true certainty (from contextual understanding) results in poor confidence calibration. To address this, we propose Log-Scale Focal Uncertainty (LSFU), a first-token-based metric inspired by focal loss. LSFU incorporates label prior probabilities as a risk-modulation factor to suppress noise from high-frequency classes and emphasize risk for low-frequency long-tail classes, with a dynamic weighting mechanism unifying the measurement scale. Based on LSFU, we further propose the uncertainty-calibrated prompt optimization framework (UCPOF), which leverages the first token of model outputs to select high-quality exemplars and dynamically optimize prompts. Comprehensive evaluations show UCPOF improves average accuracy by 6.03% over few-shot baselines, surpasses always-on full RAG by 5.75% in overall average accuracy, and reduces the average retrieval trigger rate by 50.66%. By adaptively triggering RAG only for high-uncertainty samples, our framework significantly lowers computational costs while maintaining state-of-the-art performance.

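Motivation and Objectives

Motivate the need for a reliable uncertainty signal to guide prompt design in in-context learning (ICL) for classification tasks.
Propose LSFU, a first-token uncertainty measure that accounts for class priors, to improve calibration.
Develop LSFU-based static prompt selection (Gold Shot) and an uncertainty-aware dynamic prompting framework (UCPOF).
Show that UCPOF improves accuracy while reducing retrieval triggers, enabling cost-effective adaptive prompting.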

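Proposed Method

Introduce LSFU, which combines the top-K entropy of the first token with a label-prior factor (1 - P_prior(y))^2 and a log transform.
Use LSFU for Gold Shot selection to obtain a high-quality static prompt.
Define UCPOF as a two-stage framework: offline preparation (static prompt + gating threshold) and online inference (static prompt → gate → reflective prompting when needed).
Implement a gating threshold τ that, for high-risk samples, retrieves semantically similar references and triggers reflective prompting.
Provide a formal decision rule: P_UCPOF(y|x) uses ICL when the uncertainty score S(x) < τ, and otherwise applies RAG with the retrieved context.
Build an offline vector knowledge base by encoding all samples, to support retrieval for reflective prompts.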
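
The scoring and gating steps above can be sketched in a few lines of Python. This review does not give the exact LSFU formula, so the way the three ingredients are combined here (log(1 + focal factor × top-K entropy)) is an assumption made for illustration, not the authors' definition. The function names `top_k_entropy`, `lsfu_sketch`, and `route`, the default `k=5`, and the input dictionaries are likewise hypothetical.

```python
import math


def top_k_entropy(probs, k=5):
    """Shannon entropy (in nats) of the k largest probabilities, renormalized."""
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    return -sum((p / total) * math.log(p / total) for p in top if p > 0)


def lsfu_sketch(first_token_probs, label_priors, k=5):
    """Hypothetical LSFU-style score; NOT the paper's exact formula.

    first_token_probs: label -> probability of that label as the first token.
    label_priors:      label -> assumed prior probability of the label.
    """
    predicted = max(first_token_probs, key=first_token_probs.get)
    entropy = top_k_entropy(list(first_token_probs.values()), k)
    # Focal-style factor from the review: (1 - P_prior(y))^2.
    focal = (1.0 - label_priors[predicted]) ** 2
    # Assumed log-scale combination of the two terms.
    return math.log1p(focal * entropy)


def route(score, tau):
    """Decision rule from the review: ICL if S(x) < tau, otherwise RAG."""
    return "icl" if score < tau else "rag"


# Same first-token distribution, different priors for the predicted label "A".
probs = {"A": 0.9, "B": 0.1}
score_common = lsfu_sketch(probs, {"A": 0.9, "B": 0.1})  # "A" is a frequent label
score_rare = lsfu_sketch(probs, {"A": 0.1, "B": 0.9})    # "A" is a long-tail label
```

Under this assumed combination, the same output distribution yields a lower score when the predicted label is frequent and a higher score when it is rare, which matches the behavior the review attributes to LSFU. With a threshold between the two scores, only the rare-label case would be routed to retrieval.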

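Experimental Results

Research Questions

RQ1: Can first-token uncertainty reliably indicate overall task understanding for prompting and exemplar selection?
RQ2: Does incorporating label priors into uncertainty via LSFU improve calibration over entropy-based measures?
RQ3: Does UCPOF maintain or improve accuracy while reducing retrieval usage compared with always-on full RAG?
RQ4: How does dynamic retrieval triggering affect inference cost and performance on classification tasks?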

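Key Findings

LSFU calibrates uncertainty by down-weighting high-frequency classes and emphasizing long-tail risk, which improves the signal used for prompt optimization.
Gold Shot exemplar selection with LSFU yields more stable and focused static prompts than random or similarity-based selection.
UCPOF improves average accuracy by 6.03% over few-shot baselines and surpasses always-on full RAG by 5.75% in overall average accuracy.
UCPOF reduces the average retrieval trigger rate by 50.66%, performing adaptive retrieval only for high-uncertainty samples and cutting computation.
The framework combines offline static prompting with online dynamic correction, lowering inference cost while maintaining state-of-the-art performance.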
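
To make the retrieval trigger rate concrete, the sketch below computes it as the fraction of samples whose uncertainty score reaches the gating threshold, following the decision rule that retrieval is used when S(x) is not below τ. The helper name `retrieval_trigger_rate`, the scores, and the threshold are toy values chosen for illustration and are not results from the paper.

```python
def retrieval_trigger_rate(scores, tau):
    """Fraction of samples with score >= tau, i.e. samples routed to retrieval."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s >= tau) / len(scores)


# Toy uncertainty scores (made up for this example).
toy_scores = [0.02, 0.05, 0.31, 0.08, 0.44, 0.01, 0.27, 0.03]
rate = retrieval_trigger_rate(toy_scores, tau=0.25)
print(f"trigger rate: {rate:.3f}")  # 3 of 8 samples reach the threshold -> 0.375
```

An always-on RAG pipeline retrieves for every sample, so its trigger rate is 1.0 by construction. A gated pipeline saves retrieval cost in proportion to how far its rate falls below that.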

This review was generated by AI and checked by a human editor.