QUICK REVIEW

[Paper Review] SoC: Semantic Orthogonal Calibration for Test-Time Prompt Tuning

Leo Fillioux, Omprakash Chakraborty | arXiv (Cornell University) | 2026-01-13

Multimodal Machine Learning Applications | Citations: 0

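One-Line Summary

SoC introduces a Huber-based regularizer for test-time prompt tuning in vision-language models. Compared with full-orthogonality methods, it separates prompt prototypes more smoothly and in a semantically aware way, improving calibration while retaining strong discriminative performance.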

ABSTRACT

With the increasing adoption of vision-language models (VLMs) in critical decision-making systems such as healthcare or autonomous driving, the calibration of their uncertainty estimates becomes paramount. Yet, this dimension has been largely underexplored in the VLM test-time prompt-tuning (TPT) literature, which has predominantly focused on improving their discriminative performance. Recent state-of-the-art advocates for enforcing full orthogonality over pairs of text prompt embeddings to enhance separability, and therefore calibration. Nevertheless, as we theoretically show in this work, the inherent gradients from fully orthogonal constraints will strongly push semantically related classes away, ultimately making the model overconfident. Based on our findings, we propose Semantic Orthogonal Calibration (SoC), a Huber-based regularizer that enforces smooth prototype separation while preserving semantic proximity, thereby improving calibration compared to prior orthogonality-based approaches. Across a comprehensive empirical validation, we demonstrate that SoC consistently improves calibration performance, while also maintaining competitive discriminative capabilities.

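Motivation and Objectives

Motivates the need for calibrated uncertainty in test-time prompt tuning (TPT) of VLMs.
Identifies the limitations of the full orthogonality constraint (O-TPT) for semantically related classes.
Proposes SoC, a Huber-based regularizer that enforces smooth prototype separation while preserving semantic proximity.
Analyzes theoretically how prototype similarity controls confidence and calibration.
Validates SoC empirically across diverse benchmarks and backbones, showing improved calibration with competitive accuracy.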

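Proposed Method

Formulates SoC as a Huber-based regularizer added to the TPT loss, penalizing pairwise prototype similarity with bounded gradients.
Defines the cosine similarity $s_{ij}$ between class prototypes $t_i$ and $t_j$, and applies a Huber loss with margin $\delta$ to that similarity.
Derives theoretical bounds on how the cosine coherence $\mu$ controls softmax confidence, and on how SoC mitigates excessive confidence inflation.
Compares the first-order gradient dynamics of SoC with those of full orthogonality (O-TPT) to explain the difference in calibration.
Evaluates on 11 datasets with ViT backbones, using standard TPT prompts and evaluation metrics (accuracy and ECE).
Analyzes sensitivity to prompt templates and robustness across different backbones and distribution shifts.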
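
The bounded-gradient idea in the method bullets above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the textbook Huber function applied to the off-diagonal cosine similarities, averaged over pairs, and it uses a plain squared-similarity penalty as a stand-in for a full-orthogonality objective. The function names and the default `delta=0.5` are hypothetical choices made for illustration only.

```python
import numpy as np

def pairwise_cosine(prototypes):
    """Cosine similarity matrix between row-vector class prototypes."""
    normed = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return normed @ normed.T

def huber(s, delta):
    """Textbook Huber function: quadratic for |s| <= delta, linear beyond."""
    abs_s = np.abs(s)
    return np.where(abs_s <= delta, 0.5 * s**2, delta * (abs_s - 0.5 * delta))

def huber_grad(s, delta):
    """Derivative of huber() w.r.t. s; its magnitude never exceeds delta."""
    return np.clip(s, -delta, delta)

def squared_grad(s):
    """Derivative of s**2 w.r.t. s; grows linearly with the similarity."""
    return 2.0 * s

def huber_similarity_penalty(prototypes, delta=0.5):
    """Mean Huber penalty over all prototype pairs i != j (assumed reduction)."""
    sim = pairwise_cosine(np.asarray(prototypes, dtype=float))
    off_diag = ~np.eye(sim.shape[0], dtype=bool)
    return float(huber(sim[off_diag], delta).mean())

if __name__ == "__main__":
    # Compare how hard each penalty pushes on weakly vs. strongly similar pairs.
    sims = np.array([0.2, 0.9])
    print("Huber gradient:  ", huber_grad(sims, delta=0.5))
    print("Squared gradient:", squared_grad(sims))
```

Under these assumptions, the gradient on a highly similar pair is capped at `delta` for the Huber penalty, whereas the squared penalty pushes hardest on exactly the pairs that are most semantically related.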

Figure 1: Motivation for SoC. With O-TPT, ambiguity inherent to the class semantics is lost due to the aggressive orthogonality constraint, leading to artificially high confidence, even when predictions are incorrect. Let us take this image as an example, whose correct class is “annual crop land”

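Experimental Results

Research Questions

RQ1: Does a Huber-based regularizer improve calibration over full orthogonality in test-time prompt tuning?
RQ2: How does semantic proximity affect confidence and calibration under different regularizers?
RQ3: Can SoC maintain competitive discriminative performance while improving calibration across diverse datasets and backbones?
RQ4: How sensitive is calibration to the prompt template under SoC compared with O-TPT?
RQ5: Does SoC remain robust under distribution shift and multi-step prompt updates?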

Main Results

Model	ImageNet	DTD	Flowers	Food101	SUN397	Aircraft	Pets	Caltech	UCF101	EuroSAT	Cars	Average
Zero-shot	73.5	52.4	76.2	88.6	67.7	29.9	93.1	95.1	73.8	55.0	76.8	71.1
TPT (NeurIPS'22)	75.6	55.3	76.3	89.0	70.2	31.8	93.6	95.5	74.9	51.9	77.8	72.0
C-TPT (ICLR'24)	75.0	55.1	76.5	88.9	70.1	30.9	94.1	95.5	75.2	54.0	77.5	72.1
O-TPT (CVPR'25)	73.2	54.6	76.4	88.6	68.9	30.0	93.8	95.3	74.5	53.6	76.7	71.4
SoC (proposed)	74.5	54.4	77.0	88.9	69.5	30.9	93.9	95.6	74.9	58.3	77.0	72.3

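SoC consistently improves calibration (lower ECE) across the 11 datasets compared with TPT, C-TPT, and O-TPT.
Relative to O-TPT, SoC achieves the best ECE on most datasets and in many cases approaches zero-shot calibration.
SoC remains competitive in accuracy, matching or improving on it across diverse datasets and backbones.
In the two-gradient-step experiment, SoC loses less calibration than O-TPT even under repeated updates.
The backbone ablations (ViT-L/14 and ViT-B/16) show that SoC outperforms O-TPT in both accuracy and ECE and also improves over zero-shot.
Reliability diagrams show flatter SoC curves that lie closer to the diagonal, suggesting better calibration than O-TPT.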
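
For readers unfamiliar with the calibration metric used in the results above, here is a minimal sketch of the commonly used binned expected calibration error (ECE). The review does not state the binning the paper uses, so the equal-width bins and the default `n_bins=15` are assumptions, and the function name is hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: max softmax probability of each prediction, in (0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width bins (assumed; not taken from the paper).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0 falls in no bin.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

if __name__ == "__main__":
    # Four predictions made at 90% confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

An overconfident model, such as one that reports 90% confidence while being right only 75% of the time, yields a positive gap in that bin, and a reliability diagram plots the same per-bin accuracy against confidence.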

Figure 2: ECE per class pair as a function of the zero-shot cosine similarity. We compute the ECE for the wrong predictions across each class pair (i.e., the model predicted class $i$ when the label was class $j$) and analyze the relation with the zero-shot similarity between both classes on EuroS

This review was generated by AI and reviewed by a human editor.