QUICK REVIEW

[Paper Summary] CLIPood: Generalizing CLIP to Out-of-Distributions

Yang Shu, Xingzhuo Guo|arXiv (Cornell University)|Feb 2, 2023

Natural Language Processing Techniques|Cited by 11

One-Sentence Summary

CLIPood fine-tunes CLIP with Margin Metric Softmax (MMS) and Beta Moving Average (BMA) to improve OOD generalization under both domain shift and open-class scenarios.

ABSTRACT

Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances. This paper aims at generalizing CLIP to out-of-distribution test data on downstream tasks. We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on the unseen test data. To exploit the semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class adaptive margins for fine-tuning. To incorporate both pre-trained zero-shot model and fine-tuned task-adaptive model, CLIPood leverages a new optimization strategy, Beta moving average (BMA), to maintain a temporal ensemble weighted by Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
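The sketch below illustrates the idea of class-adaptive margins by computing an MMS-style loss for a single image in plain Python. It is not the authors' implementation. It assumes that the margin added to class j's logit is λ·(1 − cos(t_y, t_j)). Under that assumption, the true class y gets no margin, and classes whose text embeddings lie farther from t_y get larger margins. The names `mms_loss`, `lam`, and `tau` are hypothetical, and so are the defaults λ = 0.1 and τ = 0.01.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mms_loss(
    image_emb: Sequence[float],
    text_embs: List[Sequence[float]],
    label: int,
    tau: float = 0.01,
    lam: float = 0.1,
) -> float:
    """MMS-style cross-entropy for one image (hypothetical sketch).

    Assumed margin for class j: lam * (1 - cos(t_label, t_j)),
    which is (numerically) zero for the true class.
    """
    sims = [cosine(image_emb, t) for t in text_embs]
    margins = [lam * (1.0 - cosine(text_embs[label], t)) for t in text_embs]
    logits = [(s + m) / tau for s, m in zip(sims, margins)]
    # Numerically stable log-sum-exp over all classes.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


# Toy 3-class example: class 1 is semantically close to class 0, class 2 is far.
image = [1.0, 0.0, 0.1]
texts = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(mms_loss(image, texts, label=0, lam=0.0))  # plain cosine-softmax loss
print(mms_loss(image, texts, label=0, lam=0.1))  # larger: negatives get margins
```

With `lam=0.0` the function reduces to the ordinary temperature-scaled cosine softmax. A positive `lam` asks the image embedding to beat distant classes by a wider gap than semantically close ones.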

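Research Motivation and Goals

Study how to generalize CLIP models to OOD data in downstream tasks that involve both domain shift and open classes.
Design a fine-tuning method that improves OOD generalization while preserving cross-modal image-text alignment.
Use semantic relations from the text modality to guide fine-tuning.
Propose an optimization strategy that retains pre-trained zero-shot knowledge while still adapting to the task.
Evaluate CLIPood on diverse OOD benchmarks to show consistent improvements over existing methods.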

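Proposed Method

Fine-tune CLIP by predicting image-text similarities against class text embeddings built from task prompts.
Introduce Margin Metric Softmax (MMS), which adds class-adaptive margins based on the distances between class text embeddings.
Freeze the text encoder to preserve general semantic relations, and fine-tune only the image encoder.
During fine-tuning, maintain a Beta Moving Average (BMA) over model checkpoints to ensemble pre-trained and task-specific knowledge.
Compute the temporal-ensemble weights from a Beta(β, β) distribution, and update the moving-average model on the fly during training.
Use cosine similarity with temperature τ for cross-modal prediction, following CLIP's training protocol.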
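The running BMA update described above can be sketched as follows. This is a minimal sketch, not the authors' code, and it rests on several assumptions. Parameters are plain Python float lists, and β = 0.5 is an assumed default. Step t of T is mapped to x = (t + 0.5)/(T + 1) so that the Beta(β, β) density stays finite at both ends. The update keeps a running sum of weights, so the average always equals Σ α_t θ_t / Σ α_t without storing old checkpoints.

```python
from typing import List


def beta_weight(t: int, total_steps: int, beta: float = 0.5) -> float:
    """Unnormalized Beta(beta, beta) density for checkpoint t in [0, total_steps].

    Assumption: t is mapped to x = (t + 0.5) / (total_steps + 1) so the density
    stays finite at both ends when beta < 1. The Beta normalizing constant
    cancels in the weighted average, so it is omitted.
    """
    x = (t + 0.5) / (total_steps + 1)
    return x ** (beta - 1) * (1 - x) ** (beta - 1)


class BetaMovingAverage:
    """Running Beta-weighted average of flat parameter vectors (sketch)."""

    def __init__(self, init_params: List[float], total_steps: int, beta: float = 0.5):
        self.total_steps = total_steps
        self.beta = beta
        # Checkpoint 0 is the pre-trained zero-shot model.
        self.weight_sum = beta_weight(0, total_steps, beta)
        self.params = list(init_params)

    def update(self, t: int, params: List[float]) -> None:
        """Fold the checkpoint from step t into the running average."""
        w = beta_weight(t, self.total_steps, self.beta)
        new_sum = self.weight_sum + w
        keep, add = self.weight_sum / new_sum, w / new_sum
        self.params = [keep * a + add * p for a, p in zip(self.params, params)]
        self.weight_sum = new_sum


# Toy run: five 1-D "checkpoints" with values 0..4.
T = 4
bma = BetaMovingAverage([0.0], total_steps=T)
for step in range(1, T + 1):
    bma.update(step, [float(step)])
print(bma.params)
```

Both the weights and the toy checkpoints are symmetric, so the result lands at the midpoint, about 2.0. With β < 1 the weights are U-shaped, which puts the most mass on the pre-trained start and the fine-tuned end.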

Figure 1: We adapt pre-trained CLIP models on downstream tasks with training data, while maintaining OOD generalization ability to overcome both domain shift and open class.

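Experimental Results

Research Questions

RQ1: How can CLIP be fine-tuned so that it keeps its OOD generalization when adapted to downstream tasks with domain shift and open classes?
RQ2: Does exploiting semantic relations in the text space through MMS improve cross-modal alignment and downstream OOD performance?
RQ3: Can a temporal ensemble such as BMA balance pre-trained zero-shot knowledge against task-specific fine-tuning and thereby improve OOD robustness?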

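Key Findings

CLIPood outperforms existing generalization techniques on domain-shift benchmarks from DomainBed and on ImageNet variants with distribution shift.
Compared with zero-shot CLIP and prior fine-tuning methods, CLIPood achieves better open-class generalization across 11 downstream datasets.
When domain shift and open classes occur together, CLIPood consistently outperforms the zero-shot CLIP and CoOp baselines on OfficeHome and DomainNet.
Ablation studies confirm that MMS and BMA both contribute to better OOD generalization: MMS preserves semantic relations, and BMA balances the two knowledge sources.
Compared with EMA, BMA better preserves both pre-trained and fine-tuned knowledge, which yields better open-class and domain-shift performance.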
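One way to see why BMA can retain more pre-trained knowledge than EMA is to compare the effective weight each scheme assigns to every checkpoint. This is an illustrative sketch, not the paper's experimental setup. The EMA decay of 0.999, the run length of T = 10,000 steps, β = 0.5, and the (t + 0.5)/(T + 1) mapping are all assumptions.

```python
def ema_weights(total_steps: int, decay: float = 0.999) -> list:
    """Effective weight of checkpoints 0..T in an EMA initialized at checkpoint 0.

    EMA update: avg <- decay * avg + (1 - decay) * theta_t for t = 1..T.
    """
    weights = [decay ** total_steps]  # what survives of the initial model
    weights += [(1 - decay) * decay ** (total_steps - k) for k in range(1, total_steps + 1)]
    return weights


def bma_weights(total_steps: int, beta: float = 0.5) -> list:
    """Normalized Beta(beta, beta) weights, using the assumed (t + 0.5)/(T + 1) mapping."""
    xs = [(t + 0.5) / (total_steps + 1) for t in range(total_steps + 1)]
    raw = [x ** (beta - 1) * (1 - x) ** (beta - 1) for x in xs]
    total = sum(raw)
    return [r / total for r in raw]


T = 10_000
ema, bma = ema_weights(T), bma_weights(T)
print(f"weight on pre-trained checkpoint: EMA={ema[0]:.2e}, BMA={bma[0]:.2e}")
print(f"weight on final checkpoint:       EMA={ema[-1]:.2e}, BMA={bma[-1]:.2e}")
```

With these settings, the EMA keeps only about 4.5e-5 of the initial model. The U-shaped Beta(0.5, 0.5) weighting keeps about two orders of magnitude more on it. The gap depends on the decay and the run length. With T = 1,000, the same EMA would still keep about 37% of the initial model.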

Figure 2: Overview of the proposed CLIPood method. CLIPood compares image embeddings with class text embeddings. Margin Metric Softmax is introduced to exploit semantic relationships between classes. Moreover, a Beta Moving Average model is maintained for prediction, which incorporates both the pre
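Comparing image embeddings with class text embeddings, as in the overview figure, amounts to a temperature-scaled softmax over cosine similarities. The minimal sketch below uses toy 3-D vectors in place of real CLIP encoder outputs. The class names and embedding values are illustrative assumptions. The value τ = 0.01 is assumed to match roughly the inverse of CLIP's learned logit scale.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def predict_probs(
    image_emb: Sequence[float],
    class_text_embs: List[Sequence[float]],
    tau: float = 0.01,
) -> List[float]:
    """Softmax over cosine similarities divided by temperature tau."""
    img = l2_normalize(image_emb)
    logits = [sum(a * b for a, b in zip(img, l2_normalize(t))) / tau for t in class_text_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy text embeddings for three class prompts (hypothetical values).
classes = ["dog", "cat", "car"]
text_embs = [[1.0, 0.1, 0.0], [0.8, 0.6, 0.0], [0.0, 0.1, 1.0]]
image_emb = [0.9, 0.2, 0.1]
probs = predict_probs(image_emb, text_embs)
print(classes[max(range(len(probs)), key=probs.__getitem__)])  # prints dog
```

Classes enter only through their text embeddings. A new class can therefore be scored at test time by appending its prompt embedding to `class_text_embs`, with no change on the image side. This property is what makes the open-class setting possible.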

This summary was generated by AI and reviewed by human editors.