QUICK REVIEW

[Paper Review] Similarity-as-Evidence: Calibrating Overconfident VLMs for Interpretable and Label-Efficient Medical Active Learning

Zhuofan Xie, Zishan Lin | arXiv (Cornell University) | 2026-02-21

COVID-19 diagnosis using AI | Citations: 0

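One-Line Summary

SaE recasts VLM text-image similarities as Dirichlet evidence to calibrate uncertainty, enabling interpretable and label-efficient active learning for medical imaging. With a 20% label budget it reaches state-of-the-art macro-averaged accuracy (82.57%) across ten datasets, and on BTMRI it shows strong calibration (NLL 0.425).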

ABSTRACT

Active Learning (AL) reduces annotation costs in medical imaging by selecting only the most informative samples for labeling, but suffers from cold-start when labeled data are scarce. Vision-Language Models (VLMs) address the cold-start problem via zero-shot predictions, yet their temperature-scaled softmax outputs treat text-image similarities as deterministic scores while ignoring inherent uncertainty, leading to overconfidence. This overconfidence misleads sample selection, wasting annotation budgets on uninformative cases. To overcome these limitations, the Similarity-as-Evidence (SaE) framework calibrates text-image similarities by introducing a Similarity Evidence Head (SEH), which reinterprets the similarity vector as evidence and parameterizes a Dirichlet distribution over labels. In contrast to a standard softmax that enforces confident predictions even under weak signals, the Dirichlet formulation explicitly quantifies lack of evidence (vacuity) and conflicting evidence (dissonance), thereby mitigating overconfidence caused by rigid softmax normalization. Building on this, SaE employs a dual-factor acquisition strategy: high-vacuity samples (e.g., rare diseases) are prioritized in early rounds to ensure coverage, while high-dissonance samples (e.g., ambiguous diagnoses) are prioritized later to refine boundaries, providing clinically interpretable selection rationales. Experiments on ten public medical imaging datasets with a 20% label budget show that SaE attains state-of-the-art macro-averaged accuracy of 82.57%. On the representative BTMRI dataset, SaE also achieves superior calibration, with a negative log-likelihood (NLL) of 0.425.
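The overconfidence described in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style logit scale of 100 and three made-up cosine similarities that differ by only 0.01; neither value comes from the paper, and `temperature_softmax` is a hypothetical helper name.

```python
import numpy as np

def temperature_softmax(similarities, logit_scale=100.0):
    """Softmax over scaled similarities (logit_scale is an assumed value)."""
    z = logit_scale * np.asarray(similarities, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical weak signal: the three classes are almost equally similar.
weak_similarities = [0.26, 0.25, 0.24]
probs = temperature_softmax(weak_similarities)
print(probs)  # the top class receives most of the probability mass
```

Because softmax must distribute a total mass of 1 across the classes, it has no way to express "little evidence for anything"; this is the gap that the Dirichlet formulation with vacuity is meant to fill.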

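Motivation and Objectives

Address the cold-start and overconfidence problems in VLM-driven medical active learning.
Provide calibrated, interpretable uncertainty signals through evidential reasoning.
Develop a dual-factor acquisition strategy that uses Vacuity and Dissonance for sample selection.
Enrich the medical semantic space with PubMed-augmented prompts.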

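Proposed Method

Introduce a Similarity Evidence Head (SEH) that maps the similarity vector to Dirichlet evidence with a positive strength lambda.
Use PubMed-augmented prompts to generate semantically rich class prototypes for the VLM similarity computation.
Train the SEH with a dual-objective loss that balances classification performance and evidence calibration (Eq. 3).
Convert the similarity-based evidence into Dirichlet parameters alpha_k via alpha_k(x) = lambda * p_k + 1 (Eq. 4).
Decompose the evidence into Vacuity and Dissonance to guide acquisition (Eqs. 5–6).
Apply a dual-factor active learning score (Eq. 7) with a linear schedule (Eq. 8) that prioritizes high-Vacuity samples in early rounds and high-Dissonance samples in later rounds.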
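The steps above can be sketched roughly as follows. Only alpha_k = lambda * p_k + 1 is taken from this review; everything else is an assumption, because the review does not reproduce Eqs. 5–8. The sketch treats p as a class-probability vector, uses the standard subjective-logic definitions of vacuity and dissonance, and uses a linear weight that moves from vacuity to dissonance over the rounds. All function names are hypothetical.

```python
import numpy as np

def dirichlet_params(p, lam):
    """Eq. 4 as stated in this review: alpha_k = lam * p_k + 1."""
    return lam * np.asarray(p, dtype=float) + 1.0

def vacuity(alpha):
    """Assumed form: K / sum(alpha); large when total evidence is small."""
    return float(len(alpha) / alpha.sum())

def dissonance(alpha):
    """Assumed subjective-logic dissonance; large when strong beliefs conflict."""
    belief = (alpha - 1.0) / alpha.sum()
    total = 0.0
    for k in range(len(belief)):
        others = np.delete(belief, k)
        if others.sum() <= 0:
            continue
        pair_sum = others + belief[k]
        safe_sum = np.where(pair_sum > 0, pair_sum, 1.0)  # avoid dividing by zero
        # Balance is 1 for equal beliefs and 0 when one belief dominates.
        balance = np.where(pair_sum > 0,
                           1.0 - np.abs(others - belief[k]) / safe_sum, 0.0)
        total += belief[k] * (others * balance).sum() / others.sum()
    return float(total)

def vacuity_weight(round_idx, num_rounds):
    """Assumed linear schedule: 1 in the first round, 0 in the last."""
    if num_rounds <= 1:
        return 1.0
    return 1.0 - round_idx / (num_rounds - 1)

def acquisition_score(alpha, round_idx, num_rounds):
    """Assumed dual-factor score: weighted sum of vacuity and dissonance."""
    w = vacuity_weight(round_idx, num_rounds)
    return w * vacuity(alpha) + (1.0 - w) * dissonance(alpha)

# Hypothetical candidates: little evidence, conflicting evidence, clear evidence.
candidates = {
    "rare": dirichlet_params([0.34, 0.33, 0.33], lam=0.5),
    "ambiguous": dirichlet_params([0.5, 0.5, 0.0], lam=50.0),
    "confident": dirichlet_params([0.98, 0.01, 0.01], lam=50.0),
}

def top_pick(round_idx, num_rounds):
    """Return the candidate name with the highest acquisition score."""
    return max(candidates,
               key=lambda name: acquisition_score(candidates[name], round_idx, num_rounds))

first_pick = top_pick(0, 5)  # first of five rounds: vacuity only
last_pick = top_pick(4, 5)   # last of five rounds: dissonance only
print(first_pick, last_pick)
```

In this toy setting the low-evidence candidate is selected first and the conflicting-evidence candidate last, which matches the coverage-then-refinement behavior the review describes. The paper's actual equations may differ in form and weighting.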

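Experimental Results

Research Questions

RQ1: To what extent can similarity-based evidence from a frozen VLM be calibrated into a Dirichlet distribution that reflects uncertainty in medical active learning?
RQ2: Do Vacuity and Dissonance provide clinically meaningful and interpretable signals for sample selection?
RQ3: Does the dual-factor acquisition strategy improve label efficiency over existing AL baselines in medical imaging?
RQ4: Does PubMed-augmented prompting improve the VLM's semantic alignment with medical concepts in AL?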

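Key Results

SaE achieves a macro-averaged accuracy of 82.57% across ten datasets at a 20% labeling budget, outperforming the baselines.
At a 20% budget, SaE shows superior calibration, with an NLL of 0.425 (reported on BTMRI) and an ECE of 0.021, indicating well-calibrated uncertainty signals.
Ablations indicate that the SEH is critical to performance, and that the dual-factor score and the VLM similarities each contribute substantial gains.
SaE converges quickly in early rounds, which improves sample efficiency and mitigates the cold-start problem.
Experiments show consistent improvements over Random, PCB, MedCoOp-based methods, and BiomedCoOp across diverse institutions and datasets.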
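For readers unfamiliar with the calibration metrics cited above, the following is a minimal sketch of how NLL and ECE are commonly computed from predicted class probabilities. The review does not state the paper's binning scheme, so the 10 equal-width confidence bins are an assumption, and the probabilities and labels are made-up toy values. For a Dirichlet model, the predictive probabilities would be the Dirichlet mean, alpha / sum(alpha).

```python
import numpy as np

def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean negative log probability assigned to the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs + eps)))

def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-weighted gap between accuracy and confidence (n_bins is assumed)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

# Toy binary example: four predictions, one of them wrong.
toy_probs = [[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.75, 0.25]]
toy_labels = [0, 1, 1, 0]
toy_nll = negative_log_likelihood(toy_probs, toy_labels)
toy_ece = expected_calibration_error(toy_probs, toy_labels)
print(toy_nll, toy_ece)
```

Lower values are better for both metrics: NLL penalizes confident mistakes heavily, while ECE measures how far stated confidence is from observed accuracy.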


This review was generated by AI and reviewed by a human editor.