QUICK REVIEW

[Paper Review] Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Hantao Yao, Rui Zhang|arXiv (Cornell University)|Mar 23, 2023

Multimodal Machine Learning Applications|Cited by 17

One-Sentence Summary

KgCoOp regularizes learnable prompts to stay close to hand-crafted prompts, which improves CLIP's generalization to unseen classes while keeping training fast.

ABSTRACT

Prompt tuning is an effective way to adapt a pre-trained visual-language model (VLM) to downstream tasks using task-related textual tokens. Representative CoOp-based work combines learnable textual tokens with class tokens to obtain specific textual knowledge. However, this specific textual knowledge generalizes poorly to unseen classes because it forgets the essential general textual knowledge, which has strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. The key insight of KgCoOp is that forgetting of essential knowledge can be alleviated by reducing the discrepancy between the learnable prompt and the hand-crafted prompt. Specifically, KgCoOp minimizes the discrepancy between the textual embeddings generated by learned prompts and those generated by hand-crafted prompts. Finally, adding KgCoOp on top of the contrastive loss yields a discriminative prompt for both seen and unseen tasks. Extensive evaluation on several benchmarks demonstrates that the proposed Knowledge-guided Context Optimization is an efficient method for prompt tuning, i.e., it achieves better performance with less training time.

Research Motivation and Goals

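Motivate the need for better generalization when prompt-tuning pre-trained visual-language models.
Introduce a regularization term that aligns learned prompts with general, hand-crafted prompts so that general knowledge is retained.
Show that minimizing the prompt discrepancy improves unseen-class performance without sacrificing seen-class accuracy.
Evaluate KgCoOp in base-to-new, few-shot, and domain generalization settings across 11 datasets and multiple backbones.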

Proposed Method

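Add a knowledge-guided context optimization term on top of CoOp.
Define general textual knowledge as the knowledge encoded by hand-crafted prompts (e.g., CLIP's "a photo of a [Class]").
Define specific textual knowledge as the knowledge encoded by learnable prompts trained on few-shot data.
Introduce L_kg = (1/N_c) Σ_{i=1}^{N_c} ||w_i − w_i^clip||², where N_c is the number of seen classes, w_i is the text embedding of class i produced by the learnable prompt, and w_i^clip is the embedding produced by the hand-crafted prompt; this term minimizes the discrepancy between the learned and general embeddings.
Optimize the total loss L = L_ce + lambda * L_kg, where L_ce is the contrastive (cross-entropy) loss used by CoOp and lambda balances the two terms.
Show that KgCoOp achieves higher unseen-class performance than CoOp while keeping comparable base-class performance and training time.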
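
The objective described above can be sketched in a few lines. This is a minimal, dependency-free illustration of the formulas as written in this review, not the authors' implementation: the helper names (kg_loss, clip_logits, cross_entropy, total_loss), the toy 2-D embeddings, and the logit scale of 100.0 are all assumptions. In a real training loop the embeddings would come from CLIP's text encoder, with the hand-crafted ones treated as fixed targets.

```python
import math

# Hypothetical helpers illustrating L = L_ce + lambda * L_kg.
# Embeddings are plain lists of floats, one vector per class.

def kg_loss(learned, handcrafted):
    """L_kg = (1/N_c) * sum_i ||w_i - w_i^clip||^2 over the N_c classes."""
    if len(learned) != len(handcrafted) or not learned:
        raise ValueError("need one learned and one hand-crafted embedding per class")
    total = 0.0
    for w, w_clip in zip(learned, handcrafted):
        total += sum((a - b) ** 2 for a, b in zip(w, w_clip))
    return total / len(learned)

def _normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def clip_logits(image_feat, text_feats, scale=100.0):
    """CLIP-style logits: scaled cosine similarity between one image feature
    and each class text embedding. scale=100.0 is an assumed temperature."""
    img = _normalize(image_feat)
    return [scale * sum(a * b for a, b in zip(img, _normalize(t))) for t in text_feats]

def cross_entropy(batch_logits, labels):
    """Mean softmax cross-entropy over a batch, using log-sum-exp for stability."""
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        m = max(logits)
        lse = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += lse - logits[y]
    return loss / len(labels)

def total_loss(batch_logits, labels, learned, handcrafted, lam=8.0):
    """L = L_ce + lambda * L_kg; lam=8.0 mirrors the example value in the findings."""
    return cross_entropy(batch_logits, labels) + lam * kg_loss(learned, handcrafted)

if __name__ == "__main__":
    # Made-up 2-D embeddings for two classes.
    learned = [[1.0, 0.0], [0.0, 1.0]]      # from the learnable prompt
    handcrafted = [[1.0, 0.0], [0.6, 0.8]]  # from "a photo of a [Class]"
    logits = [clip_logits([0.9, 0.1], learned)]
    print(total_loss(logits, [0], learned, handcrafted))
```

A note on the distance choice: when both embeddings have unit L2 norm, ||a − b||² = 2(1 − cos(a, b)), so a penalty written with cosine similarity on normalized CLIP features behaves the same as the squared Euclidean form up to a constant factor.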

Experimental Results

Research Questions

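RQ1: Does enforcing closeness between learned and general prompts improve unseen-class generalization without hurting seen-class performance?
RQ2: How does KgCoOp perform in base-to-new, few-shot, and domain generalization scenarios across diverse datasets?
RQ3: How does the regularization weight lambda affect generalization and harmonic-mean performance?
RQ4: Can KgCoOp be applied on top of existing CoOp-based methods to improve unseen-class generalization?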

Key Findings

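KgCoOp achieves a higher overall harmonic mean (H) on base-to-new generalization than CoOp, CoCoOp, and ProGrad.
KgCoOp achieves higher New-class accuracy than competing CoOp-based methods while keeping Base-class performance close to that of CoCoOp.
KgCoOp's training time is comparable to CoOp's and shorter than that of CoCoOp and ProGrad.
KgCoOp improves domain generalization, reaching a higher average target performance than CoCoOp on the ImageNet-derived target datasets.
In the few-shot setting, KgCoOp's average over all datasets exceeds that of the baselines, with gains on unseen classes.
The regularization weight lambda controls the trade-off between the two losses; in the 4-shot and 16-shot settings, a well-chosen lambda (e.g., 8.0) gives the best harmonic mean.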
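
The base-to-new findings above are summarized by the harmonic mean H of Base and New accuracy, H = 2 · Base · New / (Base + New). A minimal sketch of that computation follows; the function name harmonic_mean and the accuracy values are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc, new_acc):
    """H = 2 * Base * New / (Base + New), with accuracies in percent."""
    if base_acc + new_acc == 0:
        return 0.0  # both accuracies zero: define H as zero
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)

if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to illustrate the metric.
    print(round(harmonic_mean(80.0, 60.0), 2))  # 68.57
    print(round(harmonic_mean(70.0, 70.0), 2))  # 70.0
```

Both example pairs have the same arithmetic mean (70), but the unbalanced pair scores lower. This is why H rewards a method that improves New-class accuracy without giving up too much Base-class accuracy, the trade-off that lambda controls.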


This summary was generated by AI and reviewed by human editors.