QUICK REVIEW

[Paper Digest] Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

Rui Gong, Martin Danelljan|arXiv (Cornell University)|Jul 5, 2023

Domain Adaptation and Few-Shot Learning | Cited by 8

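One-Sentence Summary

This paper shows that diffusion-pretrained representations generalize exceptionally well across domains for semantic segmentation, and it introduces prompt-based methods (scene prompts, category prompts, and prompt randomization) together with test-time prompt tuning to further improve domain generalization (DG) and test-time domain adaptation (TTDA) performance.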

ABSTRACT

While originally designed for image generation, diffusion models have recently shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a crucial ability for any representation. We find that diffusion-pretraining achieves extraordinary domain generalization results for semantic segmentation, outperforming both supervised and self-supervised backbone networks. Motivated by this, we investigate how to utilize the model's unique ability of taking an input prompt, in order to further enhance its cross-domain performance. We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head. Moreover, we propose a simple but highly effective approach for test-time domain adaptation, based on learning a scene prompt on the target domain in an unsupervised manner. Extensive experiments conducted on four synthetic-to-real and clear-to-adverse weather benchmarks demonstrate the effectiveness of our approaches. Without resorting to any complex techniques, such as image translation, augmentation, or rare-class sampling, we set a new state-of-the-art on all benchmarks. Our implementation will be publicly available at https://github.com/ETHRuiGong/PTDiffSeg.

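Motivation and Objectives

Evaluate how well diffusion-pretrained backbones generalize to unseen domains for semantic segmentation.
Investigate whether prompt conditioning can disentangle domain-invariant features from domain-specific cues.
Propose scene prompts and category prompts, combined with prompt randomization, to improve domain generalization.
Develop a test-time domain adaptation method based on prompt tuning that uses unlabeled target data.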

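Proposed Method

Freeze a diffusion-pretrained backbone (Stable Diffusion) and train a semantic projection head.
Introduce category prompts (class tokens) and scene prompts (domain/style cues) as conditioning inputs to disentangle features.
Apply prompt randomization by enforcing prediction consistency across multiple scene prompts with a KL-divergence-based loss.
Train with a joint loss under multiple prompts: a semantic segmentation loss plus a consistency loss.
For TTDA, fine-tune only the scene prompt, using a pseudo-label objective to adapt to the target domain.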
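
The joint objective described above can be illustrated with a small single-pixel sketch. This is a hedged illustration, not the authors' implementation: the function names (joint_loss, kl_divergence), the choice of the first sampled prompt as the KL reference, and the weight parameter are all assumptions made for this example, and the exact loss in the paper may differ.

```python
import math

def softmax(logits):
    # Numerically stable softmax over per-class logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Segmentation loss for one pixel with ground-truth class `label`.
    return -math.log(probs[label])

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two class distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def joint_loss(logits_per_prompt, label, weight=1.0):
    """Hypothetical joint loss for one pixel.

    logits_per_prompt: one list of class logits per sampled scene prompt.
    Returns (total, segmentation_term, consistency_term).
    """
    probs = [softmax(l) for l in logits_per_prompt]
    # Segmentation term: average cross-entropy over all sampled prompts.
    seg = sum(cross_entropy(p, label) for p in probs) / len(probs)
    # Consistency term (assumed form): KL of the first prompt's prediction
    # against each of the other prompts' predictions, averaged.
    ref, others = probs[0], probs[1:]
    cons = sum(kl_divergence(ref, p) for p in others) / len(others) if others else 0.0
    return seg + weight * cons, seg, cons
```

When every scene prompt produces the same prediction, the consistency term is zero and only the segmentation term remains; the more the predictions disagree across prompts, the larger the penalty.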

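Experimental Results

Research Questions

RQ1: How do diffusion-pretrained backbones compare with supervised and self-supervised backbones on cross-domain semantic segmentation?
RQ2: Does prompt conditioning (category prompts and scene prompts) improve domain generalization?
RQ3: Does prompt randomization further disentangle domain-invariant representations and improve robustness?
RQ4: Does test-time prompt tuning enable efficient adaptation to an unlabeled target domain?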

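Key Findings

Diffusion-pretrained backbones show stronger domain generalization than ImageNet-supervised, self-supervised, and CLIP backbones on settings such as GTA→Cityscapes.
Category prompts and scene prompts let the model disentangle domain-invariant semantics from domain-specific style, which improves generalization.
Prompt randomization yields consistent predictions across different scene prompts and outperforms baselines on synthetic-to-real and clear-to-adverse-weather benchmarks.
Tuning the scene prompt at test time delivers parameter-efficient TTDA gains and surpasses several TTDA baselines.
The prompt-based domain generalization method reaches state-of-the-art results on multiple benchmarks, including Cityscapes→ACDC, and even outperforms some domain adaptation methods despite using no target data.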
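
The test-time prompt tuning idea in the findings above can be sketched on a toy problem. Everything here is an assumption for illustration: the frozen model is reduced to a fixed linear head, the scene prompt is a vector added to the features, and pseudo-labels are the model's own initial argmax predictions. The paper's actual architecture and objective are not reproduced.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def predict(weights, feature, prompt):
    # Frozen linear head applied to the prompt-conditioned feature.
    conditioned = [f + p for f, p in zip(feature, prompt)]
    return softmax([sum(w * c for w, c in zip(row, conditioned)) for row in weights])

def pseudo_label_loss(weights, features, prompt, pseudo):
    # Mean cross-entropy against the pseudo-labels.
    losses = [-math.log(predict(weights, x, prompt)[y]) for x, y in zip(features, pseudo)]
    return sum(losses) / len(losses)

def tune_scene_prompt(weights, features, prompt, steps=20, lr=0.1):
    """Update only the prompt; `weights` is read but never modified."""
    prompt = list(prompt)
    # Pseudo-labels come from the initial predictions on unlabeled target data.
    pseudo = [argmax(predict(weights, x, prompt)) for x in features]
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for x, y in zip(features, pseudo):
            probs = predict(weights, x, prompt)
            for k, row in enumerate(weights):
                # d(cross-entropy)/d(logit_k) = p_k - 1[k == y]
                err = probs[k] - (1.0 if k == y else 0.0)
                for d in range(len(prompt)):
                    grad[d] += err * row[d] / len(features)
        # Plain gradient descent step on the prompt only.
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt, pseudo
```

Because only the prompt vector is updated, the number of adapted parameters stays tiny compared with the frozen model, which is the sense in which the approach is parameter-efficient.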


This digest was generated by AI and reviewed by a human editor.