QUICK REVIEW

[Paper Review] Learning Equivariant Segmentation with Instance-Unique Querying

Wenguan Wang, James Liang|arXiv (Cornell University)|Oct 3, 2022

Cited by 22

One-Sentence Summary

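This paper proposes a training framework that improves query-based instance segmentation by enforcing dataset-level instance uniqueness and transformation-equivariant learning of query embeddings and features, yielding significant AP gains without changing inference.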

ABSTRACT

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings. In this work, we devise a new training framework that boosts query-based models through discriminative query embedding learning. It explores two essential properties, namely dataset-level uniqueness and transformation equivariance, of the relation between queries and instances. First, our algorithm uses the queries to retrieve the corresponding instances from the whole training dataset, instead of only searching within individual scenes. As querying instances across scenes is more challenging, the segmenters are forced to learn more discriminative queries for effective instance separation. Second, our algorithm encourages both image (instance) representations and queries to be equivariant against geometric transformations, leading to more robust instance-query matching. On top of four well-known query-based models (i.e., CondInst, SOLOv2, SOTR, and Mask2Former), our training algorithm provides significant performance gains (e.g., +1.6 to +3.2 AP) on the COCO dataset. In addition, our algorithm improves the performance of SOLOv2 by 2.7 AP on the LVISv1 dataset.

Motivation and Goals

Improve the discriminative power of instance queries beyond what intra-scene training provides.
Promote cross-scene (dataset-wide) instance discrimination, so that each query separates its instance from all instances in the training set.
Enforce transformation equivariance to make queries and features robust to geometric changes.
Demonstrate that equivariance regularization yields gains without architectural changes or slower inference.
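As a toy illustration of the equivariance goal: a feature extractor f is equivariant to a transformation g when f(g(x)) = g(f(x)). The sketch below assumes g is a horizontal flip and uses two hypothetical stand-ins for f (neither is the paper's backbone). It reports the mean squared gap between the two sides, an assumed penalty form that an equivariance loss would push toward zero.

```python
from typing import Callable, List

Image = List[List[float]]  # H x W grid of scalars (one channel, for simplicity)


def hflip(x: Image) -> Image:
    """Geometric transformation g: mirror each row left-to-right."""
    return [list(reversed(row)) for row in x]


def pointwise_feature(x: Image) -> Image:
    """Hypothetical f acting on each pixel independently, so it commutes with a flip."""
    return [[2.0 * v + 1.0 for v in row] for row in x]


def left_context_feature(x: Image) -> Image:
    """Hypothetical f adding each pixel's left neighbour (zero-padded).

    Its receptive field is asymmetric, so it is NOT flip-equivariant.
    """
    return [[v + (row[j - 1] if j > 0 else 0.0) for j, v in enumerate(row)] for row in x]


def equivariance_gap(f: Callable[[Image], Image], g: Callable[[Image], Image], x: Image) -> float:
    """Mean squared difference between f(g(x)) and g(f(x)) (assumed penalty form)."""
    a, b = f(g(x)), g(f(x))
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(equivariance_gap(pointwise_feature, hflip, x))     # 0.0
print(equivariance_gap(left_context_feature, hflip, x))  # 11.0
```

The per-pixel map commutes with the flip by construction, while the map that only looks at the left neighbour does not. Learned convolution kernels are generally asymmetric in the same way, so flip equivariance has to be encouraged during training rather than assumed.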

Proposed Method

Define a dense feature extractor f to produce image embeddings I and a query creator h to generate N instance-aware queries {q_n}.
Train with the standard intra-scene mask loss L_intra_mask and add an inter-scene mask loss L_inter_mask that forces each query to produce no mask on other training images, whose features are drawn from an external memory through sparse, instance-balanced sampling.
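Add an equivariance loss L_equi that enforces f(g(x)) ≈ g(f(x)) for an input image x and a geometric transformation g, and requires the queries and embeddings computed from the transformed image, {q_n^g, I^g}, to predict the transformed ground-truth masks g(M_σ(n)), where σ(n) is the ground-truth instance matched to query n.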
Combine L_intra_mask, L_inter_mask, and L_equi to form a cross-scene, transformation-equivariant training objective that can be plugged into existing query-based methods.
Use focal loss for L_inter_mask to handle the many easy negatives, and a combination of dice and focal losses for L_equi, depending on the base method.
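A minimal sketch of the inter-scene term, under simplifying assumptions: a query's mask logit at a pixel is the dot product of the query with that pixel's embedding (close to how Mask2Former forms masks, but a simplification for CondInst or SOLOv2); the memory is a hypothetical dict mapping instance keys from other images to stored pixel embeddings; and "instance-balanced" is taken to mean drawing the same number of pixels per stored instance. Because the query's instance does not appear in those images, every target is 0, and a binary focal loss (α = 0.25, γ = 2, the common defaults, assumed here) down-weights the many easy negatives.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z)), which equals -log(1 - sigmoid(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def focal_loss_negative(logit: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for a pixel whose target is 0.

    Confident, correct negatives (small sigmoid(logit)) get a tiny p**gamma factor,
    so the many easy negatives contribute little.
    """
    p = sigmoid(logit)
    return (1.0 - alpha) * (p ** gamma) * softplus(logit)


def sample_balanced(memory: Dict[str, List[Vector]], per_instance: int,
                    rng: random.Random) -> List[Vector]:
    """Draw up to `per_instance` stored pixel embeddings from every instance,
    so large instances do not dominate the negatives (hypothetical scheme)."""
    picked: List[Vector] = []
    for pixels in memory.values():
        picked.extend(rng.sample(pixels, min(per_instance, len(pixels))))
    return picked


def inter_scene_mask_loss(query: Vector, negatives: List[Vector]) -> float:
    """Mean focal loss pushing the query's mask logits on other images toward 0."""
    logits = [sum(q * v for q, v in zip(query, px)) for px in negatives]
    return sum(focal_loss_negative(z) for z in logits) / len(logits)


# Usage: a toy memory with 2-D pixel embeddings from two other images.
rng = random.Random(0)
memory = {
    "img7/person#2": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [1.0, 0.1]],
    "img9/car#1": [[0.0, 1.0], [0.1, 0.9]],
}
negatives = sample_balanced(memory, per_instance=2, rng=rng)
query_confusable = [3.0, 0.0]   # fires on the stored "person" pixels
query_distinct = [-3.0, -3.0]   # stays off everywhere in the memory
print(inter_scene_mask_loss(query_confusable, negatives) >
      inter_scene_mask_loss(query_distinct, negatives))  # True
```

A query that also lights up on instances from other images is penalized heavily, while one that stays off contributes almost nothing; this is the pressure toward dataset-level uniqueness described above.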

Experimental Results

Research Questions

RQ1: Can cross-scene (dataset-level) querying improve the discriminability of instance queries beyond intra-scene training?
RQ2: Does enforcing transformation equivariance on features and queries lead to more robust instance-query matching than standard augmentation?
RQ3: What AP gains can be achieved on COCO and LVIS when applying the proposed framework to existing query-based models?
RQ4: Can the proposed training framework be applied to mainstream query-based segmenters without architectural changes or slower inference?

Key Findings

Method	Backbone	#Epoch	AP	AP50	AP75	AP_S	AP_M	AP_L
Mask R-CNN	ResNet-101	12	36.1	57.5	38.6	18.8	39.7	49.5
Cascade Mask R-CNN	ResNet-101	12	37.3	58.2	40.1	19.7	40.6	51.5
HTC	ResNet-101	20	39.6	61.0	42.8	21.3	42.9	55.0
PointRend	ResNet-50	12	36.3	56.9	38.7	19.8	39.4	48.5
QueryInst	ResNet-101	36	41.0	63.3	44.5	21.7	44.4	60.7
K-Net	ResNet-101	36	40.1	62.8	43.1	18.7	42.7	58.8
SOLQ	Swin-L	50	46.7	72.7	50.6	29.2	50.1	60.9
SparseInst	ResNet-50	36	37.9	59.2	40.2	15.7	39.4	56.9
CondInst	ResNet-50	12	35.5	55.8	37.7	16.8	39.2	50.6
Ours (CondInst)	ResNet-50	-	38.6	61.1	41.2	19.7	41.1	54.7
CondInst	ResNet-101	-	37.1	58.6	39.3	18.2	40.3	52.9
Ours (CondInst)	ResNet-101	-	39.9	62.7	42.4	20.8	42.3	55.7
Ours (SOTR)	ResNet-50	24	42.2	61.9	43.9	11.0	60.5	73.5
Ours (SOTR)	ResNet-101	-	42.6	64.1	45.8	11.2	61.2	75.3

Applied to CondInst, SOLOv2, SOTR, and Mask2Former across backbones (ResNet/Swin), the method yields AP gains of +1.6 to +3.2 on COCO and +2.7 AP on LVISv1 for SOLOv2.
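On COCO test-dev, the reported gains reach up to +3.2 AP for some setups, with improvements across AP_S, AP_M, and AP_L; for example, applying the framework to CondInst with ResNet-50 raises AP from 35.5 to 38.6, AP_S from 16.8 to 19.7, AP_M from 39.2 to 41.1, and AP_L from 50.6 to 54.7.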
With SOTR on ResNet-50, the method achieves AP 42.2, AP50 61.9, AP75 43.9, AP_S 11.0, AP_M 60.5, and AP_L 73.5; with ResNet-101, it achieves AP 42.6, AP50 64.1, AP75 45.8, AP_S 11.2, AP_M 61.2, and AP_L 75.3.
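The per-metric gains for the cleanest baseline/ours pair in the table (the CondInst and Ours rows with ResNet-50) can be re-derived with a few lines of arithmetic. The sketch below only subtracts values copied from the table; it adds no new results. Decimal is used so that differences such as 38.6 - 35.5 come out exactly as 3.1 rather than as a binary floating-point approximation.

```python
from decimal import Decimal
from typing import Dict

METRICS = ["AP", "AP50", "AP75", "AP_S", "AP_M", "AP_L"]

# Values copied from the CondInst / ResNet-50 rows of the COCO table.
condinst_r50 = dict(zip(METRICS, map(Decimal, ["35.5", "55.8", "37.7", "16.8", "39.2", "50.6"])))
ours_r50 = dict(zip(METRICS, map(Decimal, ["38.6", "61.1", "41.2", "19.7", "41.1", "54.7"])))


def gains(base: Dict[str, Decimal], new: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Absolute per-metric improvement (new - base), in AP points."""
    return {m: new[m] - base[m] for m in METRICS}


for metric, delta in gains(condinst_r50, ours_r50).items():
    print(f"{metric}: +{delta}")
# AP: +3.1, AP50: +5.3, AP75: +3.5, AP_S: +2.9, AP_M: +1.9, AP_L: +4.1
```

The +3.1 AP for this pair falls inside the +1.6 to +3.2 AP range stated for COCO.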


This review was generated by AI and reviewed by human editors.