QUICK REVIEW

[Paper Review] IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

Zhenchao Jin, Xiaowei Hu|arXiv (Cornell University)|Oct 16, 2023

Multimodal Machine Learning Applications | Cited by 14

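One-sentence summary

IDRNet introduces an intervention-driven paradigm: it builds semantic-level relations through deletion diagnostics and uses them to augment pixel representations, improving segmentation on multiple benchmarks with a lightweight, compatible module.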

ABSTRACT

Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks, which inspires the development of numerous context modeling paradigms, e.g., multi-scale-driven and similarity-driven context schemes. Despite the impressive results, these existing paradigms often suffer from inadequate or ineffective contextual information aggregation due to reliance on large amounts of predetermined priors. To alleviate the issues, we propose a novel Intervention-Driven Relation Network (IDRNet), which leverages a deletion diagnostics procedure to guide the modeling of contextual relations among different pixels. Specifically, we first group pixel-level representations into semantic-level representations with the guidance of pseudo labels and further improve the distinguishability of the grouped representations with a feature enhancement module. Next, a deletion diagnostics procedure is conducted to model relations of these semantic-level representations via perceiving the network outputs and the extracted relations are utilized to guide the semantic-level representations to interact with each other. Finally, the interacted representations are utilized to augment original pixel-level representations for final predictions. Extensive experiments are conducted to validate the effectiveness of IDRNet quantitatively and qualitatively. Notably, our intervention-driven context scheme brings consistent performance improvements to state-of-the-art segmentation frameworks and achieves competitive results on popular benchmark datasets, including ADE20K, COCO-Stuff, PASCAL-Context, LIP, and Cityscapes. Code is available at https://github.com/SegmentationBLWX/sssegmentation.

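Motivation and objectives

Identify and address the limitations of existing semantic segmentation context modules that rely on predetermined priors.
Propose an intervention-driven paradigm that models semantic-level relations to guide pixel interactions.
Develop a deletion diagnostics mechanism that updates the semantic relation matrix and improves segmentation.
Demonstrate compatibility and performance gains when integrated with common segmentation backbones and frameworks.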

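Proposed method

Group pixel-level features into semantic-level representations using pseudo labels.
Strengthen the semantic-level features with a discriminative feature enhancement module.
Build and update a semantic-level relation matrix via deletion diagnostics, enabling interaction between classes.
Let the semantic-level representations interact to produce features that augment the pixel representations.
Combine the augmented features with the original pixel representations and apply self-attention before the final prediction.
Train with a joint objective that combines cross-entropy losses on the pseudo labels and the final predictions.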
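
The grouping, interaction, and augmentation steps listed above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the use of a plain mean for grouping, the row-normalized weighted sum for interaction, and concatenation for augmentation are all assumptions made for illustration, and the feature enhancement module and self-attention are left out.

```python
# Toy sketch of grouping -> interaction -> augmentation.
# All names and the exact arithmetic are illustrative assumptions,
# not the IDRNet code.

def group_by_pseudo_label(pixel_feats, pseudo_labels, num_classes):
    """Average the features of all pixels that share a pseudo label."""
    dim = len(pixel_feats[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(pixel_feats, pseudo_labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += feat[d]
    # Classes that do not appear in the image get no representation (None).
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]

def interact(semantic_reps, relation):
    """Mix each present class with the other present classes,
    weighted by relation[i][j] and normalized per row."""
    present = [c for c, rep in enumerate(semantic_reps) if rep is not None]
    out = list(semantic_reps)
    for i in present:
        total = sum(relation[i][j] for j in present)
        if total == 0:
            continue  # no usable relations: keep the representation as is
        dim = len(semantic_reps[i])
        out[i] = [
            sum(relation[i][j] * semantic_reps[j][d] for j in present) / total
            for d in range(dim)
        ]
    return out

def augment(pixel_feats, pseudo_labels, interacted):
    """Concatenate each pixel feature with its class's interacted representation."""
    return [feat + interacted[label] for feat, label in zip(pixel_feats, pseudo_labels)]

# Three pixels, two feature channels, three classes (class 2 is absent).
pixel_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
pseudo_labels = [0, 0, 1]
relation = [[1.0, 1.0, 0.0],   # class 0 draws on itself and class 1
            [0.0, 1.0, 0.0],   # class 1 draws only on itself
            [0.0, 0.0, 1.0]]

grouped = group_by_pseudo_label(pixel_feats, pseudo_labels, num_classes=3)
interacted = interact(grouped, relation)
augmented = augment(pixel_feats, pseudo_labels, interacted)
```

In this toy run, class 0 ends up as an equal mix of the class 0 and class 1 representations, and each pixel feature doubles in width after concatenation. A real model would operate on tensors and pass the result through further layers before prediction.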

Figure 1: Diagram of our intervention-driven relation network. Deletion diagnostics is leveraged to build relations between semantic-level representations. With the built relation matrix and semantic-level representations, pixel representations can be augmented for pixel prediction.
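
To make the idea of deletion diagnostics concrete, here is a minimal sketch under stated assumptions: one semantic-level representation is removed at a time, and the change in each remaining class's output is recorded as a relation strength. The class names, the scalar representations, the toy `toy_score_fn`, and the sign convention (baseline minus post-deletion score) are hypothetical; this review does not give the paper's actual update rule for the relation matrix.

```python
# Generic deletion-diagnostics sketch. The scorer and class names are
# hypothetical stand-ins for "perceiving the network outputs".

def deletion_relations(semantic_reps, score_fn):
    """relation[i][j] = drop in class i's score when class j is deleted."""
    classes = sorted(semantic_reps)
    baseline = score_fn(semantic_reps)
    relation = {i: {j: 0.0 for j in classes} for i in classes}
    for j in classes:
        # Intervention: remove class j's representation and re-score.
        remaining = {c: rep for c, rep in semantic_reps.items() if c != j}
        scores = score_fn(remaining)
        for i in classes:
            if i != j:
                relation[i][j] = baseline[i] - scores[i]
    return relation

# Hypothetical toy scorer: a class's score is its own value plus a
# contribution from any supporting class that is still present.
SUPPORT = {"boat": {"water": 0.5}, "water": {}, "car": {}}

def toy_score_fn(reps):
    return {
        c: value + sum(w * reps[o] for o, w in SUPPORT[c].items() if o in reps)
        for c, value in reps.items()
    }

reps = {"boat": 1.0, "water": 2.0, "car": 1.0}
relation = deletion_relations(reps, toy_score_fn)
```

Here deleting "water" lowers the "boat" score while deleting "car" does not, so only the boat/water entry is nonzero. The resulting matrix is asymmetric, because "water" does not depend on "boat" in the toy scorer.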

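Experimental results

Research questions

RQ1: Can deletion diagnostics effectively guide the construction of pixel relations by focusing on semantic-level interactions?
RQ2: Does the intervention-driven context scheme consistently improve segmentation accuracy across diverse datasets and backbones?
RQ3: How does IDRNet perform in terms of accuracy and efficiency when integrated into existing frameworks such as FCN, PSPNet, DeeplabV3, and UPerNet?
RQ4: Is the semantic-level relation approach robust on cross-domain segmentation tasks?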

Key findings

Context Module	Parameters	FLOPs	Time	GPU Memory	mIoU (%) on ADE20K (train/val)
OCR	15.12M	242.48G	16.58ms	617.24M	42.47
ASPP	--	674.47G	41.98ms	976.06M	43.19
PPM	23.07M	309.45G	21.45ms	960.63M	42.64
UPerNet	34.75M	500.76G	36.51ms	1429.18M	43.02
ANN	22.42M	369.62G	26.58ms	1445.75M	41.75
CCNet	23.92M	397.38G	30.92ms	986.28M	42.48
DNL	24.12M	395.25G	51.38ms	2381.04M	43.50
IDRNet	10.79M	155.89G	20.52ms	365.66M	43.61
PPM+IDRNet	23.65M	349.23G	32.64ms	1034.28M	44.02
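
The table can be read column by column with a short script. The sketch below copies the values from the table above (the `TABLE` and `best` names are illustrative, and `None` stands for the "--" entry) and reports which module is best on each axis.

```python
# Values copied from the table: (parameters in M, FLOPs in G, time in ms,
# GPU memory as printed, mIoU in %). None marks the "--" entry for ASPP.
TABLE = {
    "OCR":        (15.12, 242.48, 16.58, 617.24, 42.47),
    "ASPP":       (None, 674.47, 41.98, 976.06, 43.19),
    "PPM":        (23.07, 309.45, 21.45, 960.63, 42.64),
    "UPerNet":    (34.75, 500.76, 36.51, 1429.18, 43.02),
    "ANN":        (22.42, 369.62, 26.58, 1445.75, 41.75),
    "CCNet":      (23.92, 397.38, 30.92, 986.28, 42.48),
    "DNL":        (24.12, 395.25, 51.38, 2381.04, 43.50),
    "IDRNet":     (10.79, 155.89, 20.52, 365.66, 43.61),
    "PPM+IDRNet": (23.65, 349.23, 32.64, 1034.28, 44.02),
}
COLUMNS = ("params", "flops", "time", "memory", "miou")

def best(column, lowest=True):
    """Return the module with the lowest (or highest) known value in a column."""
    idx = COLUMNS.index(column)
    known = {name: row[idx] for name, row in TABLE.items() if row[idx] is not None}
    pick = min if lowest else max
    return pick(known, key=known.get)

lowest_cost = {column: best(column) for column in COLUMNS[:4]}
top_miou = best("miou", lowest=False)

# FLOPs saved by IDRNet relative to DNL, the next-best standalone module by mIoU.
flops_saving_vs_dnl = 1 - TABLE["IDRNet"][1] / TABLE["DNL"][1]
```

On these numbers IDRNet has the fewest parameters, the fewest FLOPs, and the smallest GPU memory footprint, while OCR has the lowest inference time and PPM+IDRNet the highest mIoU.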

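IDRNet and its variant IDRNet+ achieve consistent performance gains on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, LIP, and PASCAL-Context.
On ADE20K, IDRNet with a baseline backbone improves mIoU over several context schemes (for example, IDRNet alone reaches 43.61% mIoU in the ablation, the highest among the standalone context modules compared, and IDRNet+ combined with frameworks such as UPerNet is reported to show substantial improvement).
The method delivers competitive or better results with a comparatively lightweight context module: IDRNet uses fewer parameters and FLOPs than many competing modules (10.79M parameters, 155.89G FLOPs, 20.52 ms, 365.66M GPU memory, and 43.61% mIoU on ADE20K).
Deletion diagnostics (DD) outperforms backpropagation (BP) for updating the relation matrix M_r; for example, the DD-driven M_r improves ADE20K mIoU by 3.26% over the BP-driven one.
Balanced deletion increases the sampling of rare categories and improves performance on datasets such as ADE20K, PASCAL-Context, and COCO-Stuff.
Cross-domain gains are also observed: for example, transferring DeeplabV3+IDRNet trained on Cityscapes to Dark Zurich and Nighttime Driving increases mIoU by 3.63 and 1.94, respectively.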

Figure 2: Illustration of our intervention-driven relation network (IDRNet). We first extract pixel representations $R_{p}$ using a backbone network $\mathcal{F}_{B}$, e.g., ResNet [30] or SwinTransformer [15]. Then, $R_{p}$ is grouped into semantic-level representations $R_{sl}$ based on a c

This review was generated by AI and checked by a human editor.