QUICK REVIEW

[Paper Review] SoC: Semantic Orthogonal Calibration for Test-Time Prompt Tuning

Leo Fillioux, Omprakash Chakraborty|arXiv (Cornell University)|Jan 13, 2026

Multimodal Machine Learning Applications | Cited by: 0

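One-Sentence Summary

SoC introduces a Huber-based regularizer for test-time prompt tuning of vision-language models. It separates prompt prototypes smoothly and in a semantics-aware way, improving calibration over full-orthogonality methods while preserving strong discriminative performance.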

ABSTRACT

With the increasing adoption of vision-language models (VLMs) in critical decision-making systems such as healthcare or autonomous driving, the calibration of their uncertainty estimates becomes paramount. Yet, this dimension has been largely underexplored in the VLM test-time prompt-tuning (TPT) literature, which has predominantly focused on improving their discriminative performance. Recent state-of-the-art advocates for enforcing full orthogonality over pairs of text prompt embeddings to enhance separability, and therefore calibration. Nevertheless, as we theoretically show in this work, the inherent gradients from fully orthogonal constraints will strongly push semantically related classes away, ultimately making the model overconfident. Based on our findings, we propose Semantic Orthogonal Calibration (SoC), a Huber-based regularizer that enforces smooth prototype separation while preserving semantic proximity, thereby improving calibration compared to prior orthogonality-based approaches. Across a comprehensive empirical validation, we demonstrate that SoC consistently improves calibration performance, while also maintaining competitive discriminative capabilities.

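Motivation and Objectives

Explain why uncertainty calibration matters in test-time prompt tuning (TPT) for VLMs.
Identify the limitations of O-TPT's full orthogonality constraint when classes are semantically related.
Propose SoC, a Huber-based regularizer that separates prototypes smoothly while preserving semantic proximity.
Analyze theoretically how prototype similarity governs softmax confidence and calibration.
Validate SoC empirically across diverse datasets and backbones, showing improved calibration at competitive accuracy.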

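Proposed Method

Add SoC to the TPT loss as a Huber-based regularization term that penalizes pairwise prototype similarity and bounds the gradient.
Let $s_{ij}$ denote the cosine similarity between class prototypes $t_i$ and $t_j$, and apply a Huber loss with threshold $\delta$ to these similarities.
Derive theoretical bounds showing how the cosine coherence $\mu$ governs softmax confidence and how SoC mitigates excessive confidence inflation.
Compare SoC's first-order gradient dynamics with those of full orthogonality (O-TPT) to explain the difference in calibration.
Evaluate on 11 datasets with ViT backbones, using the standard TPT prompts and evaluation metrics (accuracy and ECE).
Analyze sensitivity to prompt templates, as well as robustness across backbones and distribution shifts.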
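
The regularizer described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the textbook Huber function applied directly to the off-diagonal cosine similarities, averaged over pairs, and it uses a plain squared penalty on the similarity as a hypothetical stand-in for a full-orthogonality term. The function names, the toy prototypes, and the threshold value 0.3 are all illustrative assumptions.

```python
import numpy as np

def cosine_similarity_matrix(prototypes):
    """Pairwise cosine similarities s_ij between row-vector prototypes t_i."""
    unit = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return unit @ unit.T

def huber(x, delta):
    """Textbook Huber function: quadratic for |x| <= delta, linear beyond."""
    abs_x = np.abs(x)
    return np.where(abs_x <= delta, 0.5 * x**2, delta * (abs_x - 0.5 * delta))

def huber_grad(x, delta):
    """Derivative of the Huber function; its magnitude never exceeds delta."""
    return np.clip(x, -delta, delta)

def huber_similarity_penalty(prototypes, delta=0.3):
    """Mean Huber penalty over off-diagonal similarities (i != j).

    Assumption: averaging over pairs is an illustrative choice, not taken
    from the paper.
    """
    s = cosine_similarity_matrix(prototypes)
    off_diag = ~np.eye(len(s), dtype=bool)
    return huber(s[off_diag], delta).mean()

# Three toy prototypes: the first two are semantically close, the third is not.
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.9, 0.436, 0.0],
                       [0.0, 0.0, 1.0]])
s = cosine_similarity_matrix(prototypes)
penalty = huber_similarity_penalty(prototypes, delta=0.3)

# Gradient with respect to the similarity of the close pair (s_01 is about 0.9).
grad_huber = huber_grad(s[0, 1], 0.3)  # capped at delta
grad_squared = 2.0 * s[0, 1]           # derivative of s**2, grows with s

print(f"penalty={penalty:.4f} grad_huber={grad_huber:.2f} grad_squared={grad_squared:.2f}")
```

In this toy setting the squared penalty pushes hardest on exactly the pair that is most similar, while the Huber gradient saturates at the threshold, which is the qualitative contrast the method list draws between full orthogonality and SoC.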

Figure 1: Motivation for SoC. With O-TPT, ambiguity inherent to the class semantics is lost due to the aggressive orthogonality constraint, leading to artificially high confidence, even when predictions are incorrect. Let us take this image as an example, whose correct class is “annual crop land”

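Experimental Results

Research Questions

RQ1: Does Huber-based regularization improve calibration over full orthogonality (O-TPT) in test-time prompt tuning?
RQ2: How does semantic proximity affect confidence and calibration under different regularizers?
RQ3: Can SoC maintain competitive discriminative performance while improving calibration across datasets and backbones?
RQ4: How sensitive is calibration to the prompt template under SoC compared with O-TPT?
RQ5: How robust is SoC under distribution shift and multi-step prompt updates?

Key Findings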

Method	ImageNet	DTD	Flowers	Food101	SUN397	Aircraft	Pets	Caltech	UCF101	EuroSAT	Cars	Average
Zero-Shot	73.5	52.4	76.2	88.6	67.7	29.9	93.1	95.1	73.8	55.0	76.8	71.1
TPT (NeurIPS'22)	75.6	55.3	76.3	89.0	70.2	31.8	93.6	95.5	74.9	51.9	77.8	72.0
C-TPT (ICLR'24)	75.0	55.1	76.5	88.9	70.1	30.9	94.1	95.5	75.2	54.0	77.5	72.1
O-TPT (CVPR'25)	73.2	54.6	76.4	88.6	68.9	30.0	93.8	95.3	74.5	53.6	76.7	71.4
SoC (Ours)	74.5	54.4	77.0	88.9	69.5	30.9	93.9	95.6	74.9	58.3	77.0	72.3

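SoC consistently improves calibration (lower ECE) over TPT, C-TPT, and O-TPT across the 11 datasets.
SoC achieves a better ECE than O-TPT on all but one dataset, and in many cases approaches zero-shot calibration.
SoC maintains competitive accuracy, matching or improving accuracy across multiple datasets and backbones.
Two-step gradient experiments show that, under repeated updates, SoC's calibration degrades less than O-TPT's.
Backbone ablations (ViT-L/14 and ViT-B/16) show SoC outperforming O-TPT in both accuracy and ECE, and improving on the zero-shot baseline.
Reliability diagrams show that SoC yields flatter curves that track the diagonal more closely, indicating better calibration than O-TPT.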
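
Since the findings above are stated in terms of ECE, a short sketch of the common binned estimator may help in reading them. It assumes equal-width confidence bins and the usual definition (sample-weighted mean of the gap between accuracy and average confidence per bin). The bin count of 10 and the toy predictions are illustrative assumptions, not the paper's evaluation settings.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0 would fall outside,
        # which cannot happen for the top-1 probability of a softmax.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin's sample share
    return ece

# Toy predictions: four at 95% confidence (three correct), two at 55% (one correct).
conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hit = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(conf, hit)
print(f"ECE = {ece:.3f}")
```

In the toy data the high-confidence bin is right only 75% of the time while claiming 95%, so most of the error comes from overconfidence, the failure mode the review attributes to aggressive orthogonality constraints.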

Figure 2 : ECE per class pair as a function of the zero-shot cosine similarity. We compute the ECE for the wrong predictions across each class pair (i.e., the model predicted class $i$ when the label was class $j$ ) and analyze the relation with the zero-shot similarity between both classes on EuroS

This review was generated by AI and checked by human editors.