QUICK REVIEW

[Paper Review] S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning

Yabin Wang, Zhiwu Huang|arXiv (Cornell University)|Jul 26, 2022

Domain Adaptation and Few-Shot Learning|Cited by 60

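One-line summary

S-Prompts uses pre-trained transformers and learns an independent, domain-specific prompt for each domain to tackle exemplar-free domain-incremental learning, achieving strong domain separation and reduced forgetting. It comes in two implementations: image-based prompts (ViT) and language-image prompts (CLIP).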

ABSTRACT

State-of-the-art deep neural networks are still struggling to address the catastrophic forgetting problem in continual learning. In this paper, we propose one simple paradigm (named as S-Prompting) and two concrete approaches to highly reduce the forgetting degree in one of the most typical continual learning scenarios, i.e., domain increment learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained transformers, avoiding the use of exemplars that commonly appear in conventional methods. This results in a win-win game where the prompting can achieve the best for each domain. The independent prompting across domains only requests one single cross-entropy loss for training and one simple K-NN operation as a domain identifier for inference. The learning paradigm derives an image prompt learning approach and a novel language-image prompt learning approach. Owning an excellent scalability (0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (an average of about 30%) over the best of the state-of-the-art exemplar-free methods for three standard DIL tasks, and even surpasses the best of them relatively by about 6% in average when they use exemplars. Source code is available at https://github.com/iamwangyabin/S-Prompts.

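Motivation and objectives

Address catastrophic forgetting in domain-incremental learning (DIL) without storing exemplars.
Propose a simple paradigm (S-Prompts) that learns prompts independently for each domain to maximize domain-specific performance.
Demonstrate two implementations, S-iPrompts on ViT and S-liPrompts on CLIP, each with an expandable prompt pool.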

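Proposed method

Freeze the pre-trained transformer, learn a domain-specific prompt independently for each domain, and add it to a growing prompt pool.
Train each domain's prompt with a plain cross-entropy loss, and use a K-Means/K-NN domain identifier at inference.
For S-iPrompts, attach independent image prompts to the image tokens and train an FC classifier per domain.
For S-liPrompts, attach joint image and language prompts. It uses a CLIP-style text encoder driven by per-domain language prompts, together with a CLIP-based domain-specific classifier.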
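
The routing idea in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `PromptPool`, `kmeans`, `add_domain`, `identify_domain`, and `prompted_tokens` are hypothetical names, the 2-D vectors stand in for features that would come from the frozen backbone, and the prompts are placeholder token lists rather than learned embeddings. Prompt training with the cross-entropy loss is omitted.

```python
import math
import random


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's K-Means over a list of equal-length float vectors."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: math.dist(f, centroids[c]))
            clusters[nearest].append(f)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class PromptPool:
    """Hypothetical growing pool: one entry per domain seen so far."""

    def __init__(self):
        self.entries = []

    def add_domain(self, prompt_tokens, features, k=2):
        # Assumption: `features` are backbone features of this domain's
        # training images; earlier entries are never modified.
        self.entries.append(
            {"prompt": prompt_tokens, "centroids": kmeans(features, k)}
        )

    def identify_domain(self, feature):
        """Nearest-centroid (1-NN) search over the centroids of all domains."""
        candidates = [
            (math.dist(feature, centroid), idx)
            for idx, entry in enumerate(self.entries)
            for centroid in entry["centroids"]
        ]
        return min(candidates)[1]

    def prompted_tokens(self, feature, image_tokens):
        """Prepend the selected domain's prompt tokens to the image tokens."""
        domain = self.identify_domain(feature)
        return self.entries[domain]["prompt"] + image_tokens


pool = PromptPool()
domain_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0]]
domain_b = [[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
pool.add_domain(prompt_tokens=["<A1>", "<A2>"], features=domain_a)
pool.add_domain(prompt_tokens=["<B1>", "<B2>"], features=domain_b)

print(pool.identify_domain([0.1, 0.1]))  # 0
print(pool.prompted_tokens([5.0, 5.0], ["img_1", "img_2"]))
# ['<B1>', '<B2>', 'img_1', 'img_2']
```

Because each domain only appends an entry and never touches earlier ones, adding a new domain cannot overwrite what was learned for previous domains. In this sketch, the only thing that can go wrong at inference is routing a sample to the wrong entry.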

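Experimental results

Research questions

RQ1: Can exemplar-free DIL reach competitive or better performance by learning prompts independently per domain instead of sharing prompts across domains?
RQ2: How do the image-only and language-image prompting strategies compare in accuracy, forgetting, and scalability across multiple domains?
RQ3: Is a simple domain identifier (K-Means/K-NN) sufficient for effective domain routing at inference?
RQ4: What are the memory and compute costs of S-Prompts as the number of domains grows?
RQ5: How well does S-Prompts generalize to unseen domains and out-of-domain data?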

Key findings

Method	Buffer size	Average Acc (↑)	Forgetting (↑)
LRCIL	100/class	76.39*	-4.39*
iCaRL	100/class	79.76*	-8.73*
LUCIR	100/class	82.53*	-5.34*
LRCIL	50/class	74.01*	-8.62*
iCaRL	50/class	73.98*	-14.50*
LUCIR	50/class	80.77*	-7.85*
DyTox	50/class	86.21	-1.55
EWC	0/class	50.59	-42.62
LwF	0/class	60.94	-13.53
DyTox	0/class	51.27	-45.85
L2P	0/class	61.28	-9.23
S-iPrompts (ours)	0/class	74.51	-1.30
S-liPrompts (ours)	0/class	88.65	-0.69
Upper-bound (S-iPrompts)	-	85.50	-
Upper-bound (S-liPrompts)	-	91.91	-
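
To make the comparison in the table concrete, the short Python sketch below computes relative accuracy gains from the values listed above. The grouping into "exemplar-free" and "with exemplars" follows the buffer-size column, and `relative_improvement` is a hypothetical helper name. Note that this covers a single table, so the results need not match the roughly 30% and 6% averages quoted in the abstract, which are taken over three benchmarks.

```python
# Average accuracy (%) copied from the table above.
exemplar_free = {"EWC": 50.59, "LwF": 60.94, "DyTox": 51.27, "L2P": 61.28}
# Best reported setting per method among the rows that use a buffer.
with_exemplars = {"LRCIL": 76.39, "iCaRL": 79.76, "LUCIR": 82.53, "DyTox": 86.21}
ours = {"S-iPrompts": 74.51, "S-liPrompts": 88.65}


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


best_free = max(exemplar_free.values())       # L2P
best_exemplar = max(with_exemplars.values())  # DyTox with a buffer

for name, acc in ours.items():
    print(
        f"{name}: {relative_improvement(acc, best_free):+.1f}% vs best exemplar-free, "
        f"{relative_improvement(acc, best_exemplar):+.1f}% vs best with exemplars"
    )
```

On these numbers, S-liPrompts comes out about 44.7% above the best exemplar-free baseline and about 2.8% above the best exemplar-based one, while S-iPrompts beats the exemplar-free baselines but stays below the exemplar-based DyTox result.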

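S-Prompts substantially outperforms exemplar-free baselines on three standard DIL benchmarks (an average relative improvement of about 30% in forward accuracy).
S-Prompts substantially reduces forgetting compared with competing exemplar-free methods (average forgetting improves by about 13–41 points).
S-liPrompts, which uses CLIP-based prompts, even surpasses exemplar-based methods on DomainNet and generalizes strongly to unseen domains.
The CLIP language-image prompting scheme (S-liPrompts) scales to new domains with a parameter increase of about 0.03% per domain.
Even when domain identification at inference is imperfect, S-Prompts maintains competitive or better performance.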
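
As a rough illustration of the scalability claim, the sketch below extrapolates the reported per-domain figure of about 0.03%. It assumes that every domain adds the same number of parameters and that the percentage is relative to a fixed base model. The domain counts are hypothetical, and `cumulative_increase_pct` is an illustrative name.

```python
PER_DOMAIN_INCREASE_PCT = 0.03  # reported figure: about 0.03% per domain


def cumulative_increase_pct(num_domains):
    """Total parameter growth in percent after `num_domains` domains,
    assuming each domain adds the same amount to a fixed base model."""
    return num_domains * PER_DOMAIN_INCREASE_PCT


for n in (1, 10, 100):  # hypothetical domain counts
    print(f"{n:>3} domains -> +{cumulative_increase_pct(n):.2f}% parameters")
```

Under these assumptions growth is linear, so even 100 domains would add only about 3% to the parameter count.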


This review was generated by AI and checked by a human editor.