QUICK REVIEW

[Paper Review] CLIPood: Generalizing CLIP to Out-of-Distributions

Yang Shu, Xingzhuo Guo|arXiv (Cornell University)|Feb 2, 2023

Natural Language Processing Techniques|Cited by 11

One-Line Summary

CLIPood fine-tunes CLIP with Margin Metric Softmax and Beta Moving Average, improving OOD generalization under both domain shift and open classes.

ABSTRACT

Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances. This paper aims at generalizing CLIP to out-of-distribution test data on downstream tasks. We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on the unseen test data. To exploit the semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class adaptive margins for fine-tuning. To incorporate both pre-trained zero-shot model and fine-tuned task-adaptive model, CLIPood leverages a new optimization strategy, Beta moving average (BMA), to maintain a temporal ensemble weighted by Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.

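Research Motivation and Objectives

Investigate how CLIP models can generalize to OOD data on downstream tasks, including domain shift and open classes.
Design a fine-tuning method that improves OOD generalization while preserving cross-modal alignment between images and text.
Use semantic relations from the text modality to guide fine-tuning.
Propose an optimization strategy that retains both pre-trained zero-shot knowledge and task-specific adaptation.
Evaluate CLIPood on diverse OOD benchmarks and show consistent improvements over existing methods.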

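Proposed Method

Fine-tune CLIP by predicting image-text similarity against class text embeddings generated from task prompts.
Introduce Margin Metric Softmax (MMS), which adds adaptive margins based on the text-embedding distance between classes.
Freeze the text encoder to preserve broad semantic relations, and fine-tune only the image encoder.
Maintain a Beta Moving Average (BMA) of model checkpoints during fine-tuning to ensemble pre-trained and task-specific knowledge.
Compute the temporal-ensemble weights from a Beta(β, β) distribution and update the moving-average model on the fly.
Use cosine similarity with temperature τ for cross-modal prediction, following CLIP's training protocol.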
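
The prediction and MMS steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `cosine` and `mms_loss`, the default values of `tau` and `lam`, the choice of 1 minus cosine similarity as the text-embedding distance, and applying the margin only to non-target classes are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mms_loss(image_emb, text_embs, label, tau=0.01, lam=0.3):
    """Softmax cross-entropy with class-adaptive margins (hypothetical sketch).

    Each non-target class k gets its logit raised by lam * distance(text_label, text_k),
    so classes that are semantically far from the target are penalized more.
    Assumption: distance = 1 - cosine similarity of the class text embeddings.
    """
    logits = []
    for k, text_emb in enumerate(text_embs):
        sim = cosine(image_emb, text_emb)
        margin = 0.0 if k == label else lam * (1.0 - cosine(text_embs[label], text_emb))
        logits.append((sim + margin) / tau)
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_z - logits[label]

# Toy embeddings: class 1 is close to class 0 in text space, class 2 is far.
image = [1.0, 0.2, 0.0]
texts = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
plain = mms_loss(image, texts, label=0, tau=1.0, lam=0.0)        # ordinary softmax loss
with_margin = mms_loss(image, texts, label=0, tau=1.0, lam=0.3)  # margins switched on
```

With `lam=0.0` the function reduces to ordinary temperature-scaled softmax cross-entropy over cosine similarities, which makes the effect of the margins easy to isolate.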

Figure 1: We adapt pre-trained CLIP models on downstream tasks with training data, while maintaining OOD generalization ability to overcome both domain shift and open class.

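Experimental Results

Research Questions

RQ1: How should CLIP be fine-tuned so that it keeps its OOD generalization when adapting to downstream tasks with domain shift and open classes?
RQ2: Does exploiting semantic relations in the text space through MMS improve cross-modal alignment and downstream OOD performance?
RQ3: Can a temporal ensemble such as BMA balance pre-trained zero-shot knowledge against task-specific fine-tuning and improve OOD robustness?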

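Key Findings

CLIPood outperforms existing generalization methods on domain-shift benchmarks (DomainBed variants) and on ImageNet variants with distribution shift.
Across 11 downstream datasets, CLIPood achieves better open-class generalization than zero-shot CLIP and prior fine-tuning methods.
When domain shift and open classes occur together, CLIPood consistently outperforms the zero-shot and CoOp baselines across OfficeHome and DomainNet.
Ablations confirm that MMS and BMA jointly contribute to better OOD generalization: MMS preserves semantic relations, while BMA balances the knowledge sources.
Compared with EMA, BMA better retains both pre-trained and fine-tuned knowledge, yielding better open-class and domain-shift performance.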
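
To make the EMA versus BMA contrast concrete, the sketch below weights checkpoints with a Beta(β, β) density and keeps a running weighted average. It is a hedged illustration only: the helper names, the choice β = 0.5, the evaluation points (t + 0.5) / (T + 1), and the one-parameter toy trajectory are assumptions for this example, not details confirmed by the review. For β < 1 the density is U-shaped, so checkpoints near the start (close to the zero-shot model) and near the end (task-adapted) receive the largest weights, whereas EMA lets the contribution of early checkpoints decay geometrically.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def bma_weights(total_steps, beta=0.5):
    """Unnormalized weights for checkpoints t = 0..T (assumed evaluation points)."""
    return [beta_pdf((t + 0.5) / (total_steps + 1), beta, beta)
            for t in range(total_steps + 1)]

def bma_update(avg, params, weight_sum, weight):
    """Fold one checkpoint into the running weighted average."""
    new_sum = weight_sum + weight
    new_avg = [(weight_sum * a + weight * p) / new_sum for a, p in zip(avg, params)]
    return new_avg, new_sum

def ema_update(avg, params, decay):
    """Standard exponential moving average, shown for comparison."""
    return [decay * a + (1 - decay) * p for a, p in zip(avg, params)]

# Toy trajectory: one parameter moving linearly from 0.0 (pre-trained) to 1.0 (fine-tuned).
T = 10
checkpoints = [[t / T] for t in range(T + 1)]
weights = bma_weights(T)

bma, weight_sum = checkpoints[0], weights[0]   # start from the pre-trained model
ema = checkpoints[0]
for t in range(1, T + 1):
    bma, weight_sum = bma_update(bma, checkpoints[t], weight_sum, weights[t])
    ema = ema_update(ema, checkpoints[t], decay=0.9)
```

Because only the running average and the accumulated weight are stored, the update needs no history of past checkpoints, which is what makes updating the moving-average model on the fly possible.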

Figure 2: Overview of the proposed CLIPood method. CLIPood compares image embeddings with class text embeddings. Margin Metric Softmax is introduced to exploit semantic relationships between classes. Moreover, a Beta Moving Average model is maintained for prediction, which incorporates both the pre


This review was generated by AI and checked by a human editor.