QUICK REVIEW

[Paper Review] DifFSS: Diffusion Model for Few-Shot Semantic Segmentation

Weimin Tan, Siyuan Chen|arXiv (Cornell University)|Jul 3, 2023

Domain Adaptation and Few-Shot Learning|Citations: 13

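One-line summary

DifFSS introduces a diffusion-model-based paradigm that generates diverse auxiliary images for few-shot semantic segmentation (FSS), improving the performance of existing FSS models without changing their architecture.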

ABSTRACT

Diffusion models have demonstrated excellent performance in image generation. Although various few-shot semantic segmentation (FSS) models with different network structures have been proposed, performance improvement has reached a bottleneck. This paper presents the first work to leverage the diffusion model for FSS task, called DifFSS. DifFSS, a novel FSS paradigm, can further improve the performance of the state-of-the-art FSS models by a large margin without modifying their network structure. Specifically, we utilize the powerful generation ability of diffusion models to generate diverse auxiliary support images by using the semantic mask, scribble or soft HED boundary of the support image as control conditions. This generation process simulates the variety within the class of the query image, such as color, texture variation, lighting, etc. As a result, FSS models can refer to more diverse support images, yielding more robust representations, thereby achieving a consistent improvement in segmentation performance. Extensive experiments on three publicly available datasets based on existing advanced FSS models demonstrate the effectiveness of the diffusion model for FSS task. Furthermore, we explore in detail the impact of different input settings of the diffusion model on segmentation performance. Hopefully, this completely new paradigm will bring inspiration to the study of FSS task integrated with AI-generated content. Code is available at https://github.com/TrinitialChan/DifFSS

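Motivation and objectives

Overcome the limitation of having only one or a few support images in few-shot segmentation by enriching the support set with diverse generated images.
Use a conditional diffusion model to capture intra-class variation (color, texture, lighting, pose) and achieve robust query segmentation.
Investigate how different diffusion input conditions affect segmentation performance.
Demonstrate that DifFSS is compatible with existing FSS architectures and improves their performance when integrated.
Examine the extension to X-shot and discuss generation drift and sensitivity to support quality.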

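Proposed method

Generate auxiliary support images with a diffusion model (ControlNet with Stable Diffusion), conditioned on the support image and its segmentation mask.
Derive the control conditions from the support image: edge/boundary maps and scribbles (obtained from HED edge detection).
Guide image generation with a prompt such as "a real shot photo of {class name}"; each generated image shares the same segmentation mask as the support image.
Feed the generated auxiliary images I^G, together with the original support image I^s and its mask M^s, into a standard FSS model f_seg to predict the query mask M^q_hat.
Train f_seg with a cross-entropy loss while keeping the diffusion model's parameters frozen.
Examine the extension to X-shot by increasing the number of auxiliary samples, and address generation drift, in which object positions can become misaligned between I^G and M^s.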
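The support-set construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `SupportPair`, `augment_support`, and the `generate` callable are hypothetical names, and `dummy_generate` merely stands in for a frozen, condition-controlled diffusion model so that the example runs without any model weights. Only the prompt template comes from the review.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # hypothetical stand-in for an H x W image
Mask = List[List[int]]     # binary H x W segmentation mask


@dataclass
class SupportPair:
    image: Image
    mask: Mask
    generated: bool  # True for an auxiliary image I^G, False for the original I^s


def build_prompt(class_name: str) -> str:
    # Prompt template quoted in the review.
    return f"a real shot photo of {class_name}"


def augment_support(
    image: Image,
    mask: Mask,
    class_name: str,
    generate: Callable[[str, Mask], Image],
    num_aux: int,
) -> List[SupportPair]:
    """Return the original support pair followed by num_aux generated pairs.

    `generate` is an assumed interface (prompt, control condition) -> image.
    Here the control condition is the support mask itself; a boundary map or
    scribble derived from the support image could be passed instead.
    """
    prompt = build_prompt(class_name)
    pairs = [SupportPair(image, mask, generated=False)]
    for _ in range(num_aux):
        aux_image = generate(prompt, mask)
        # Every generated image reuses the support mask M^s as its label.
        pairs.append(SupportPair(aux_image, mask, generated=True))
    return pairs


def dummy_generate(prompt: str, control: Mask) -> Image:
    # Placeholder generator: copies the control map into a float image.
    return [[float(v) for v in row] for row in control]


support_image = [[0.2, 0.8], [0.5, 0.1]]
support_mask = [[0, 1], [1, 0]]
pairs = augment_support(support_image, support_mask, "dog", dummy_generate, num_aux=3)
print(len(pairs), [p.generated for p in pairs])
```

Because each auxiliary image is paired with M^s unchanged, any shift of the object during generation leaves image and mask misaligned, which is the generation drift the review mentions.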

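Experimental results

Research questions

RQ1: Does controlling the diffusion model with the support segmentation data yield diverse, semantically consistent auxiliary images that improve FSS accuracy?
RQ2: How do different diffusion input conditions (segmentation map, boundary map, scribble) affect segmentation performance?
RQ3: Does the DifFSS approach extend naturally from K-shot to X-shot, and what limits does generation drift impose?
RQ4: How do the generated auxiliary samples affect the robustness of FSS models and their representation of intra-class variation?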

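Key findings

Combined with existing models, DifFSS consistently improves state-of-the-art FSS methods across the PASCAL-5i, FSS-1000, and MiniCOCO-20i benchmarks.
Using diffusion-generated auxiliary images yields notable mIoU gains for baseline methods (e.g., BAM, HDMNet) across the datasets.
Performance generally improves as the number of generated auxiliary images increases, but when support quality is low, generation drift can cancel out the benefit.
All control conditions (segmentation map, boundary, scribble) improve performance, and in some cases diffusion-based augmentation yields additional gains beyond the true 5-shot setting.
The prototype distribution of the generated images clusters around the original images, indicating that the generated images preserve semantic consistency while expanding intra-class diversity.
Extending to X-shot (more auxiliary images) improves mIoU, demonstrating the scalability of DifFSS.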
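Since the findings above are reported in mIoU, a small sketch of how that metric is commonly computed may help. It assumes the usual FSS convention of accumulating intersection and union per class over all test episodes and then averaging the per-class IoU; the review does not spell out the paper's exact evaluation protocol, and `mean_iou` and the episode tuples below are illustrative names and toy data, not reported results.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Mask = List[List[int]]  # binary H x W mask


def mean_iou(episodes: Iterable[Tuple[str, Mask, Mask]]) -> float:
    """Average per-class IoU over (class_name, predicted_mask, target_mask) episodes."""
    inter = defaultdict(int)
    union = defaultdict(int)
    for cls, pred, target in episodes:
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                inter[cls] += int(bool(p) and bool(t))
                union[cls] += int(bool(p) or bool(t))
    # Classes whose union is empty are skipped to avoid division by zero.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


toy_episodes = [
    ("dog", [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # IoU = 1 / 2
    ("cat", [[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # IoU = 2 / 2
]
print(mean_iou(toy_episodes))  # 0.75
```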


This review was generated by AI and checked by a human editor.