QUICK REVIEW

[Paper Review] Learning Equivariant Segmentation with Instance-Unique Querying

Wenguan Wang, James Liang|arXiv (Cornell University)|Oct 3, 2022

Colorectal Cancer Screening and Detection | Cited by 22

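One-Sentence Summary

This paper proposes a training framework that strengthens query-based instance segmentation by enforcing dataset-level instance uniqueness and by learning transformation-equivariant query embeddings and features, achieving significant AP gains without changing inference.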

ABSTRACT

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings. In this work, we devise a new training framework that boosts query-based models through discriminative query embedding learning. It explores two essential properties, namely dataset-level uniqueness and transformation equivariance, of the relation between queries and instances. First, our algorithm uses the queries to retrieve the corresponding instances from the whole training dataset, instead of only searching within individual scenes. As querying instances across scenes is more challenging, the segmenters are forced to learn more discriminative queries for effective instance separation. Second, our algorithm encourages both image (instance) representations and queries to be equivariant against geometric transformations, leading to more robust, instance-query matching. On top of four famous, query-based models (i.e., CondInst, SOLOv2, SOTR, and Mask2Former), our training algorithm provides significant performance gains (e.g., +1.6 - 3.2 AP) on COCO dataset. In addition, our algorithm promotes the performance of SOLOv2 by 2.7 AP, on LVISv1 dataset.

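Motivation and Objectives

Improve the discriminative power of instance queries beyond what training within individual scenes provides.
Promote instance discrimination across the whole dataset, so that a query can single out its instance among all instances in the training set.
Enforce transformation equivariance so that queries and features are robust to geometric transformations.
Demonstrate that the equivariance regularization yields gains without architectural changes or added inference latency.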

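Proposed Method

Define a dense feature extractor f that produces the image embedding I, and use a query generator h to produce N instance-aware queries {q_n}.
Train with the intra-scene mask loss L_intra_mask, and introduce an inter-scene mask loss L_inter_mask that, using an external memory with sparse, instance-balanced sampling, forces each query to produce no match on other images.
Add an equivariance loss L_equi that enforces f(g(I)) ≈ g(f(I)), so that {q_n^g, I^g} match the transformed ground-truth mask g(M_sigma(n)).
Combine L_intra_mask, L_inter_mask, and L_equi into a cross-scene, transformation-equivariant training objective that can be plugged into existing query-based methods.
Use focal loss for L_inter_mask to handle the many easy negatives, and a combination of dice/focal loss for L_equi, depending on the base method.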
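The two added losses can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`query_masks`, `inter_scene_loss`, `equivariance_loss`) are hypothetical, querying is simplified to a dot product followed by a sigmoid, the external memory is a plain list of feature maps rather than sparse, instance-balanced samples, the transformation g is assumed to be a horizontal flip, and only a focal term is used for L_equi.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def query_masks(queries, features):
    """queries: (N, C), features: (C, H, W) -> mask probabilities (N, H, W).

    Assumption: querying is a per-pixel dot product; real models may use
    dynamic convolutions or attention instead.
    """
    logits = np.einsum("nc,chw->nhw", queries, features)
    return sigmoid(logits)

def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-8):
    """Binary focal loss averaged over all pixels."""
    p_t = np.where(target == 1, prob, 1.0 - prob)
    a_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))

def inter_scene_loss(queries, memory_features):
    """Queries of one image should yield empty masks on other images."""
    losses = []
    for feats in memory_features:  # hypothetical memory: list of (C, H, W)
        probs = query_masks(queries, feats)
        losses.append(focal_loss(probs, np.zeros_like(probs)))
    return float(np.mean(losses))

def hflip(x):
    """Assumed geometric transformation g: flip along the width axis."""
    return x[..., ::-1]

def equivariance_loss(queries_g, features_g, gt_masks, transform=hflip):
    """Masks from transformed inputs should match the transformed GT g(M)."""
    probs = query_masks(queries_g, features_g)
    return focal_loss(probs, transform(gt_masks))

# Toy usage with random data (shapes are illustrative only).
rng = np.random.default_rng(0)
N, C, H, W = 3, 8, 4, 6
queries = rng.normal(size=(N, C))
features = rng.normal(size=(C, H, W))
gt = (query_masks(queries, features) > 0.5).astype(np.float64)
memory = [rng.normal(size=(C, H, W)) for _ in range(2)]

l_inter = inter_scene_loss(queries, memory)
# In this toy, features are the raw input, so flipping commutes exactly.
l_equi = equivariance_loss(queries, hflip(features), gt)
```

Because the toy querying step is per-pixel, masks predicted from flipped features equal the flipped masks exactly. In a real network this commutation holds only approximately, which is the gap L_equi is meant to penalize.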

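Experimental Results

Research Questions

RQ1: Can cross-scene (dataset-wide) querying improve the discriminability of instance queries beyond intra-scene training?
RQ2: Does enforcing transformation equivariance on features and queries yield more robust instance-query matching than standard data augmentation?
RQ3: How much AP gain does the proposed framework deliver on COCO and LVIS when applied to existing query-based models?
RQ4: Is the proposed training framework designed to be architecture-agnostic and free of inference-speed overhead for mainstream query-based segmenters?

Key Findings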

Method	Backbone	#Epoch	AP	AP50	AP75	AP_S	AP_M	AP_L
Mask R-CNN	ResNet-101	12	36.1	57.5	38.6	18.8	39.7	49.5
Cascade Mask R-CNN	ResNet-101	12	37.3	58.2	40.1	19.7	40.6	51.5
HTC	ResNet-101	20	39.6	61.0	42.8	21.3	42.9	55.0
PointRend	ResNet-50	12	36.3	56.9	38.7	19.8	39.4	48.5
QueryInst	ResNet-101	36	41.0	63.3	44.5	21.7	44.4	60.7
K-Net	ResNet-101	36	40.1	62.8	43.1	18.7	42.7	58.8
SOLQ	Swin-L	50	46.7	72.7	50.6	29.2	50.1	60.9
SparseInst	ResNet-50	36	37.9	59.2	40.2	15.7	39.4	56.9
CondInst	ResNet-50	12	35.5	55.8	37.7	16.8	39.2	50.6
Ours	ResNet-50	-	38.6	61.1	41.2	19.7	41.1	54.7
CondInst	ResNet-101	-	37.1	58.6	39.3	18.2	40.3	52.9
Ours	ResNet-101	-	39.9	62.7	42.4	20.8	42.3	55.7
SOTR	ResNet-50	24	42.2	61.9	43.9	11.0	60.5	73.5
SOTR	ResNet-101	-	42.6	64.1	45.8	11.2	61.2	75.3

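Applied to CondInst, SOLOv2, SOTR, and Mask2Former with ResNet and Swin backbones, the method improves AP by +1.6 to +3.2 on COCO and improves SOLOv2 by +2.7 AP on LVISv1.
Reported gains on COCO test-dev include up to +3.2 AP in certain settings, along with notable improvements in AP_S, AP_M, and AP_L across methods (e.g., the CondInst and SOTR variants).
For SOTR, the reported results are AP 42.2, AP50 61.9, AP75 43.9, AP_S 11.0, AP_M 60.5, AP_L 73.5 with ResNet-50, and AP 42.6, AP50 64.1, AP75 45.8, AP_S 11.2, AP_M 61.2, AP_L 75.3 with ResNet-101.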
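As a quick sanity check, the CondInst gains can be recomputed from the AP column of the table. The sketch below rests on two assumptions: each "Ours" row is paired with the CondInst row directly above it (inferred from row order), and 37.1 is read as the AP of CondInst with ResNet-101. The dictionary names are illustrative.

```python
# AP values copied from the COCO table in this review.
# Assumption: "Ours" rows correspond to the CondInst baseline listed just above.
baseline_ap = {"CondInst R50": 35.5, "CondInst R101": 37.1}
ours_ap = {"CondInst R50": 38.6, "CondInst R101": 39.9}

# Absolute AP gain per backbone, rounded to one decimal like the table.
gains = {name: round(ours_ap[name] - baseline_ap[name], 1) for name in baseline_ap}

# Check the gains against the +1.6 to +3.2 AP range quoted in the abstract.
within_reported_range = all(1.6 <= g <= 3.2 for g in gains.values())

for name, gain in gains.items():
    print(f"{name}: +{gain} AP")
```

Under these assumptions, both CondInst gains fall inside the +1.6 to +3.2 AP range stated in the abstract.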

This review was generated by AI and checked by a human editor.