QUICK REVIEW

[Paper Review] Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

Rui Gong, Martin Danelljan|arXiv (Cornell University)|Jul 5, 2023

Domain Adaptation and Few-Shot Learning|Cited by 8

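One-Line Summary

The paper shows that diffusion-pretrained representations generalize exceptionally well across domains in semantic segmentation, and it introduces prompt-based strategies (scene prompts, category prompts, and prompt randomization) together with test-time prompt tuning to further improve domain generalization (DG) and test-time domain adaptation (TTDA) performance.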

ABSTRACT

While originally designed for image generation, diffusion models have recently shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a crucial ability for any representation. We find that diffusion-pretraining achieves extraordinary domain generalization results for semantic segmentation, outperforming both supervised and self-supervised backbone networks. Motivated by this, we investigate how to utilize the model's unique ability of taking an input prompt, in order to further enhance its cross-domain performance. We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head. Moreover, we propose a simple but highly effective approach for test-time domain adaptation, based on learning a scene prompt on the target domain in an unsupervised manner. Extensive experiments conducted on four synthetic-to-real and clear-to-adverse weather benchmarks demonstrate the effectiveness of our approaches. Without resorting to any complex techniques, such as image translation, augmentation, or rare-class sampling, we set a new state-of-the-art on all benchmarks. Our implementation will be publicly available at https://github.com/ETHRuiGong/PTDiffSeg.

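Motivation and Objectives

Evaluate how well a diffusion-pretrained backbone generalizes to semantic segmentation in unseen domains.
Investigate whether prompt conditioning can separate domain-invariant features from domain-variant cues.
Propose scene prompts, category prompts, and prompt randomization to improve domain generalization.
Develop a prompt-tuning approach for test-time domain adaptation that uses unlabeled target data.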

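Proposed Method

Freeze the diffusion-pretrained backbone (Stable Diffusion) and train a semantic projection head.
Introduce category prompts (class tokens) and scene prompts (domain/style cues) as conditioning inputs that help disentangle features.
Apply prompt randomization with a KL-divergence-based loss that enforces consistent predictions across multiple scene prompts.
Train with multiple prompts using a combined loss: the semantic segmentation loss plus the consistency loss.
For TTDA, fine-tune only the scene prompt with a pseudo-label-based objective to adapt to the target domain.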
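The two objectives listed above can be made concrete with a small sketch. It is not the authors' implementation: the function names, the choice of KL direction (from the mean prediction over prompts to each prompt's prediction), the `consistency_weight` parameter, and the confidence threshold used to select pseudo-labels are all assumptions made for illustration. Plain Python is used so the arithmetic is visible; a real setup would use an autodiff framework on full-resolution logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the given class for one pixel."""
    return -math.log(max(probs[label], 1e-12))

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

def training_loss(logits_per_prompt, labels, consistency_weight=1.0):
    """Segmentation loss plus a consistency loss across scene prompts.

    logits_per_prompt[k][i] holds the class logits of pixel i when the
    backbone is conditioned on (hypothetical) scene prompt k.
    Returns (total, segmentation_term, consistency_term).
    """
    probs = [[softmax(px) for px in pixels] for pixels in logits_per_prompt]
    num_prompts, num_pixels = len(probs), len(labels)
    num_classes = len(probs[0][0])

    # Supervised term: average cross-entropy over all prompts and pixels.
    seg = sum(cross_entropy(probs[k][i], labels[i])
              for k in range(num_prompts) for i in range(num_pixels))
    seg /= num_prompts * num_pixels

    # Assumed consistency term: KL from the mean prediction to each prompt's prediction.
    cons = 0.0
    for i in range(num_pixels):
        mean = [sum(probs[k][i][c] for k in range(num_prompts)) / num_prompts
                for c in range(num_classes)]
        cons += sum(kl_divergence(mean, probs[k][i]) for k in range(num_prompts)) / num_prompts
    cons /= num_pixels

    return seg + consistency_weight * cons, seg, cons

def pseudo_label_loss(logits, confidence_threshold=0.9):
    """Test-time objective: cross-entropy against the model's own argmax
    labels, keeping only pixels whose confidence reaches the threshold."""
    losses = []
    for px in logits:
        p = softmax(px)
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= confidence_threshold:
            losses.append(cross_entropy(p, label))
    return sum(losses) / len(losses) if losses else 0.0
```

During training, the gradient of `training_loss` would update the segmentation head while the backbone stays frozen. At test time, the gradient of `pseudo_label_loss` would update only the scene prompt embedding, which is what keeps the adaptation parameter-efficient.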

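Experimental Results

Research Questions

RQ1: How does diffusion pretraining compare with supervised and self-supervised backbones on out-of-domain semantic segmentation?
RQ2: Can prompt conditioning (category and scene prompts) improve domain generalization?
RQ3: Does prompt randomization further improve the disentanglement and robustness of domain-invariant representations?
RQ4: Does test-time prompt tuning enable efficient adaptation to an unlabeled target domain?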

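Key Findings

The diffusion-pretrained backbone outperforms ImageNet-supervised, self-supervised, and CLIP backbones on domain generalization settings such as GTA→Cityscapes.
Category and scene prompts help the model separate domain-invariant semantics from domain-variant style, which improves generalization.
Prompt randomization yields consistent predictions across different scene prompts and outperforms baselines on the synthetic-to-real and clear-to-adverse-weather benchmarks.
Tuning the scene prompt at test time delivers parameter-efficient TTDA gains and outperforms several TTDA baselines.
The prompt-based DG method achieves state-of-the-art results on multiple benchmarks, including Cityscapes→ACDC, and in some cases even surpasses UDA methods, despite using no target data.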

This review was generated by AI and checked by a human editor.