QUICK REVIEW

[Paper Review] Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals

Pengyue Yang, Jiawen Wen | arXiv (Cornell University) | 2026-02-01

Topic Modeling | Citations: 0

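One-Line Summary

This paper introduces Structural Confidence, a single-pass, model-agnostic confidence estimator for LLM outputs based on the structure of hidden-state trajectories, and evaluates it on FEVER, SciFact, WikiBio, and TruthfulQA.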

ABSTRACT

Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, semantic similarity and multi-sample consistency, remain brittle under distribution shift, domain-specialised text, and compute limits. In this work, we present Structural Confidence, a single-pass, model-agnostic framework that enhances output correctness prediction based on multi-scale structural signals derived from a model's final-layer hidden-state trajectory. By combining spectral, local-variation, and global shape descriptors, our method captures internal stability patterns that are missed by probabilities and sentence embeddings. We conduct extensive, cross-domain evaluation across four heterogeneous benchmarks-FEVER (fact verification), SciFact (scientific claims), WikiBio-hallucination (biographical consistency), and TruthfulQA (truthfulness-oriented QA). Our Structural Confidence framework demonstrates strong performance compared with established baselines in terms of AUROC and AUPR. More importantly, unlike sampling-based consistency methods which require multiple stochastic generations and an auxiliary model, our approach uses a single deterministic forward pass, offering a practical basis for efficient, robust post-hoc confidence estimation in socially impactful, resource-constrained LLM applications.

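Motivation and Objectives

Enable robust post-hoc confidence estimation for LLMs under distribution shift and resource constraints.
Develop a new confidence modality based on the structure of hidden-state trajectories (spectral stability, local variation, shape coherence).
Provide a model-agnostic, single-pass estimator that needs no access to logits, gradients, or multiple samples.
Demonstrate cross-domain effectiveness and efficiency against probability-based, embedding-based, and sampling-based baselines.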

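Proposed Method

Define Structural Confidence in terms of structural signals from the final-layer hidden-state trajectory.
Extract a proxy hidden-state trajectory by passing each (context, answer) pair through a frozen encoder (bert-base-uncased); a fixed-length feature vector is then derived from that trajectory.
Compute three families of structural descriptors: spectral stability (frequency-domain and graph-Laplacian spectra), local variation (short-range instability metrics), and shape coherence (global trajectory dispersion).
Concatenate the descriptors into a single 70-dimensional structural feature vector; in the Struct+Sent variant, this vector can additionally be fused with sentence embeddings.
Train a lightweight gradient-boosted tree estimator (LightGBM) with a binary logistic objective on the structural features (and, optionally, the semantic features).
Evaluate in a single-pass, model-agnostic setting, using deterministic GPT-4o outputs and a fixed proxy encoder.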
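The descriptor families listed above can be illustrated with a small sketch. This is not the paper's feature set: the review does not enumerate the 70 descriptors, so the five features below, and every function name, are hypothetical stand-ins chosen for illustration. The trajectory is assumed to be a list of per-token hidden-state vectors (in the paper's setup these would come from the final layer of the frozen bert-base-uncased encoder); plain Python lists are used so the example runs without dependencies. The graph-Laplacian spectrum mentioned in the method is not implemented here.

```python
import cmath
import math
from typing import List, Sequence

Trajectory = Sequence[Sequence[float]]  # one hidden-state vector per token


def step_norms(traj: Trajectory) -> List[float]:
    """Euclidean distance between each pair of consecutive hidden states."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]


def local_variation(traj: Trajectory) -> List[float]:
    """Short-range instability (illustrative): mean and max step size."""
    steps = step_norms(traj)
    return [sum(steps) / len(steps), max(steps)]


def shape_coherence(traj: Trajectory) -> List[float]:
    """Global shape (illustrative): spread around the centroid and
    straightness (end-to-end distance divided by total path length)."""
    dim = len(traj[0])
    centroid = [sum(h[d] for h in traj) / len(traj) for d in range(dim)]
    spread = sum(math.dist(h, centroid) ** 2 for h in traj) / len(traj)
    path = sum(step_norms(traj))
    straightness = math.dist(traj[0], traj[-1]) / path if path > 0 else 0.0
    return [spread, straightness]


def spectral_stability(traj: Trajectory) -> List[float]:
    """Frequency-domain signal (illustrative): share of the non-DC spectral
    energy of the step-size series that falls in the upper half of the
    frequency bins. Uses a naive O(n^2) DFT to stay dependency-free."""
    steps = step_norms(traj)
    n = len(steps)
    mean = sum(steps) / n
    centered = [s - mean for s in steps]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return [0.0]
    return [sum(power[len(power) // 2:]) / total]


def structural_features(traj: Trajectory) -> List[float]:
    """Concatenate the three descriptor families into one fixed-length vector."""
    if len(traj) < 2:
        raise ValueError("need at least two hidden states")
    return spectral_stability(traj) + local_variation(traj) + shape_coherence(traj)


if __name__ == "__main__":
    smooth = [[float(t), 0.0] for t in range(8)]            # evenly spaced line
    jittery = [[float(t), float(t % 2)] for t in range(8)]  # zigzag path
    print(structural_features(smooth))
    print(structural_features(jittery))
```

The output length does not depend on the number of tokens, which is what allows a tabular model such as the LightGBM estimator described above to consume trajectories of different lengths.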

Figure 1. Overall Structural Confidence pipeline. An LLM produces a single deterministic answer; an encoder maps the (context, answer) to a hidden-state trajectory from which multi-scale structural descriptors are extracted and scored by a lightweight confidence model.

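Experimental Results

Research Questions

RQ1: Can structural stability signals from hidden-state trajectories deliver competitive confidence estimation even under a single-pass constraint?
RQ2: How do structural signals compare with probability-based, embedding-based, and sampling-based baselines under domain shift and mixed-domain training?
RQ3: Which design choices (signal family, granularity, semantic augmentation) are essential for strong confidence estimation?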

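Key Results

Structural Confidence achieves strong discrimination (AUROC and AUPR) across FEVER, SciFact, and WikiBio compared with probability-based and semantic baselines.
Structural signals degrade proportionally under domain shift and retain meaningful performance even on SciFact, where embedding-based methods fail more severely.
The Struct+Sent configuration often matches or exceeds sampling-based baselines such as SelfCheckGPT while requiring substantially lower latency and fewer FLOPs.
The proxy-encoder approach (BERT-based) yields robust, model-agnostic trajectory signals that transfer well to the out-of-domain TruthfulQA benchmark.
The method is deterministic, needs only a single forward pass, and costs less compute than sampling-based consistency approaches.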

Figure 2. Cross-domain AUROC for Structure-feature and Semantic-feature trained on mix_train and evaluated on four domains.
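For readers less familiar with the two metrics reported above, the following is a minimal, dependency-free sketch of how AUROC and AUPR can be computed from confidence scores. It assumes label 1 means the LLM answer was correct and label 0 means it was incorrect, and that a higher score means higher predicted confidence; AUPR is approximated here by average precision. The function names and the toy data are illustrative and do not come from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen correct answer receives a higher
    score than a randomly chosen incorrect one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values measured at the rank
    of each positive example. Assumes scores are distinct; tied scores keep
    their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]            # toy correctness labels
    scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # toy confidence scores
    print(round(auroc(labels, scores), 4))
    print(round(average_precision(labels, scores), 4))
```

AUROC is insensitive to the ratio of correct to incorrect answers, whereas average precision shifts with that ratio, which is one reason both are commonly reported together.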


This review was generated by AI and reviewed by a human editor.