QUICK REVIEW

[Paper Review] Unsupervised Prompt Learning for Vision-Language Models

Tony Jun Huang, Jack O. Chu|arXiv (Cornell University)|Apr 7, 2022

Multimodal Machine Learning Applications|Cited by 54

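One-Sentence Summary

UPL learns a prompt representation for CLIP without supervision: it generates pseudo-labels on unlabeled target data and self-trains the prompt on them, improving transfer performance without any target annotations.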

ABSTRACT

Contrastive vision-language models like CLIP have shown great progress in transfer learning. In the inference stage, the proper text description, also known as prompt, needs to be carefully designed to correctly classify the given images. In order to avoid laborious prompt engineering, recent works such as CoOp, CLIP-Adapter and Tip-Adapter propose to adapt vision-language models for downstream image recognition tasks on a small set of labeled data. Though promising improvements are achieved, requiring labeled data from the target datasets may restrict the scalability. In this paper, we explore a different scenario, in which the labels of the target datasets are unprovided, and we present an unsupervised prompt learning (UPL) approach to avoid prompt engineering while simultaneously improving transfer performance of CLIP-like vision-language models. As far as we know, UPL is the first work to introduce unsupervised learning into prompt learning. Experimentally, our UPL outperforms original CLIP with prompt engineering on ImageNet as well as other 10 datasets. An enhanced version of UPL is even competitive with the 8-shot CoOp and the 8-shot TIP-Adapter on most datasets. Code and models are available at https://github.com/tonyhuang2022/UPL.

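Motivation and Goals

Enable prompt learning for CLIP-style models when no labeled target data is available.
Eliminate manual prompt design by learning a continuous prompt representation in an unsupervised manner.
Analyze how pseudo-labeling and prompt optimization affect transfer performance across diverse datasets.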

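Proposed Method

Use a pre-trained vision-language model (such as CLIP) to generate pseudo-labels for the unlabeled target data.
Select the top-K most confident samples per class to form the pseudo-labeled set, which mitigates class imbalance.
Define a learnable prompt representation shared across all classes and optimize it with a cross-entropy loss on the pseudo-labeled samples.
At inference, replace the hand-crafted prompt with the learned prompt representation.
Optionally use a pseudo-label ensemble (across CLIP models) and a prompt representation ensemble (multiple learned prompts) to improve robustness.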
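The first two steps above (pseudo-labeling, then per-class top-K selection) can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the similarity scores are made-up stand-ins for CLIP image-text cosine similarities, the function names `softmax`, `pseudo_label`, and `select_top_k` are hypothetical, and the temperature of 0.01 is an assumption.

```python
import math

def softmax(logits, temperature=0.01):
    """Convert similarity scores into class probabilities.

    The temperature value is an assumption made for this sketch.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(similarities, temperature=0.01):
    """Return one (predicted_class, confidence) pair per sample."""
    labels = []
    for row in similarities:
        probs = softmax(row, temperature)
        confidence = max(probs)
        labels.append((probs.index(confidence), confidence))
    return labels

def select_top_k(labels, k):
    """Keep the indices of the k most confident samples of each predicted class."""
    by_class = {}
    for idx, (cls, confidence) in enumerate(labels):
        by_class.setdefault(cls, []).append((confidence, idx))
    selected = {}
    for cls, items in by_class.items():
        items.sort(key=lambda pair: pair[0], reverse=True)
        selected[cls] = [idx for _, idx in items[:k]]
    return selected

# Hypothetical image-text similarities: 5 unlabeled images, 2 classes.
sims = [
    [0.30, 0.20],
    [0.28, 0.25],
    [0.26, 0.25],
    [0.20, 0.31],
    [0.24, 0.25],
]
labels = pseudo_label(sims)
selected = select_top_k(labels, k=2)
print(selected)  # {0: [0, 1], 1: [3, 4]}
```

Sample 2 is dropped because class 0 already has two more confident samples, while the low-confidence sample 4 is kept because class 1 would otherwise fall short of K. The selected samples would then serve as the training set for the shared prompt representation.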

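Experimental Results

Research Questions

RQ1: Can unsupervised prompt learning improve the transfer performance of vision-language models without target-domain labels?
RQ2: How do the pseudo-labeling strategy (top-K) and the ensemble methods affect transfer accuracy across datasets?
RQ3: Is a single shared learnable prompt representation sufficient to cover many classes, or are multiple prompts more beneficial?
RQ4: How robust is UPL to noise in the pseudo-labels and to their inherent class imbalance?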

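Key Findings

On ImageNet and 10 other datasets, UPL outperforms the original CLIP with prompt engineering.
The enhanced version, UPL*, uses multiple CLIP models for pseudo-labeling and is competitive with 8-shot CoOp and 8-shot Tip-Adapter on most datasets.
Top-K pseudo-labeling avoids the class imbalance caused by confidence thresholding and sidesteps the weak correlation between confidence and label quality, which improves stability.
The prompt representation ensemble yields additional transfer gains by exploiting the class-specific biases of the individual learned prompts.
Because optimization shares one common prompt representation across all classes, UPL is robust to noisy pseudo-labels.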
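The finding about threshold-induced class imbalance can be made concrete with a toy comparison. The predictions below are invented for illustration (they are not results from the paper), and the helper names and the 0.9 threshold are assumptions chosen for this sketch.

```python
from collections import Counter

# Hypothetical (predicted_class, confidence) pairs. The model is
# systematically more confident on class 0 than on class 1.
preds = [
    (0, 0.95), (0, 0.92), (0, 0.91), (0, 0.60),
    (1, 0.55), (1, 0.48), (1, 0.40),
]

def select_by_threshold(preds, threshold):
    """Keep every sample whose confidence reaches the threshold."""
    return [(cls, p) for cls, p in preds if p >= threshold]

def select_top_k(preds, k):
    """Keep the k most confident samples of each predicted class."""
    kept = []
    for cls in sorted({c for c, _ in preds}):
        ranked = sorted((p for c, p in preds if c == cls), reverse=True)
        kept.extend((cls, p) for p in ranked[:k])
    return kept

threshold_counts = Counter(cls for cls, _ in select_by_threshold(preds, 0.9))
top_k_counts = Counter(cls for cls, _ in select_top_k(preds, 2))

print(dict(threshold_counts))  # {0: 3}
print(dict(top_k_counts))      # {0: 2, 1: 2}
```

With the threshold, class 1 contributes no training samples at all, whereas top-K selection gives every class the same number of pseudo-labeled samples regardless of how confident the model is on that class.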


This review was generated by AI and reviewed by a human editor.