QUICK REVIEW

[Paper Review] Learning Equivariant Segmentation with Instance-Unique Querying

Wenguan Wang, James Liang|arXiv (Cornell University)|2022. 10. 03.

Colorectal Cancer Screening and Detection|Citations: 22

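One-Line Summary

The paper presents a training framework that improves query-based instance segmentation by enforcing dataset-level instance uniqueness and transformation equivariance of query embeddings and features, yielding sizable AP gains (+1.6 to +3.2 AP on COCO) without changing inference cost.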

ABSTRACT

Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings. In this work, we devise a new training framework that boosts query-based models through discriminative query embedding learning. It explores two essential properties, namely dataset-level uniqueness and transformation equivariance, of the relation between queries and instances. First, our algorithm uses the queries to retrieve the corresponding instances from the whole training dataset, instead of only searching within individual scenes. As querying instances across scenes is more challenging, the segmenters are forced to learn more discriminative queries for effective instance separation. Second, our algorithm encourages both image (instance) representations and queries to be equivariant against geometric transformations, leading to more robust, instance-query matching. On top of four famous, query-based models (i.e., CondInst, SOLOv2, SOTR, and Mask2Former), our training algorithm provides significant performance gains (e.g., +1.6 - 3.2 AP) on COCO dataset. In addition, our algorithm promotes the performance of SOLOv2 by 2.7 AP, on LVISv1 dataset.
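
The query-based scheme described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible form of querying, a dot product between each query embedding and each pixel feature followed by a sigmoid; the four base models each use their own mask heads, so the function name `query_masks`, the threshold, and the toy values below are illustrative assumptions rather than the paper's implementation.

```python
import math

def query_masks(features, queries, threshold=0.5):
    """Derive one binary mask per query (hypothetical helper).

    features: H x W x C nested lists of pixel embeddings.
    queries:  N x C nested lists of instance-aware embeddings.
    """
    masks = []
    for q in queries:
        mask = []
        for row in features:
            out_row = []
            for pixel in row:
                # Dot product between the query and the pixel feature.
                logit = sum(qc * fc for qc, fc in zip(q, pixel))
                prob = 1.0 / (1.0 + math.exp(-logit))
                out_row.append(1 if prob > threshold else 0)
            mask.append(out_row)
        masks.append(mask)
    return masks

# Toy 2x2 feature map with 2 channels: the left column responds to
# channel 0 and the right column responds to channel 1.
features = [
    [[4.0, -4.0], [-4.0, 4.0]],
    [[4.0, -4.0], [-4.0, 4.0]],
]
queries = [[1.0, 0.0], [0.0, 1.0]]  # one query per toy instance
masks = query_masks(features, queries)
```

In this toy setup the first query selects the left column and the second selects the right column, which is the within-image separation that standard intra-scene training already targets.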

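Motivation and Objectives

Improve the discriminative power of instance queries beyond what intra-scene (within-image) training alone provides.
Promote cross-scene discrimination so that queries separate instances across the whole dataset, not just within one image.
Enforce transformation equivariance of queries and features so that instance-query matching is robust to geometric changes.
Show that the equivariance regularization brings gains without architectural changes or slower inference.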

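Proposed Method

Define a dense feature extractor f that produces the image embedding I, and a query generator h that produces N instance-aware queries {q_n}.
Train with the intra-scene mask loss L_intra_mask, and introduce an inter-scene mask loss L_inter_mask that uses an external memory with sparse, instance-balanced sampling to force each query to find no match in other images.
Add an equivariance loss L_equi that enforces f(g(I)) ≈ g(f(I)) for a geometric transformation g and aligns the transformed pair {q_n^g, I^g} with the transformed ground-truth mask g(M_sigma(n)).
Combine L_intra_mask, L_inter_mask, and L_equi into a cross-scene, equivariant training objective that plugs into existing query-based methods.
Use focal loss for L_inter_mask; L_equi uses a combination of dice and focal losses, chosen according to the base method.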
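
The two added training signals can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' code: the inter-scene term is modeled as a focal loss against an all-zero target when a query is applied to features from a different image, and equivariance is shown as the gap between f(g(I)) and g(f(I)) for a horizontal flip g. The helper names, the focal-loss defaults (gamma=2.0, alpha=0.25), and the toy extractors are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss(prob, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one pixel; gamma and alpha are assumed defaults."""
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-7), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def inter_scene_loss(query, other_features):
    """Apply a query to features of a different image (H x W x C).

    The queried instance does not appear there, so every pixel has target 0.
    """
    losses = [
        focal_loss(sigmoid(sum(q * f for q, f in zip(query, pixel))), target=0)
        for row in other_features
        for pixel in row
    ]
    return sum(losses) / len(losses)

def hflip(grid):
    """Horizontal flip of an H x W grid, standing in for the transformation g."""
    return [list(reversed(row)) for row in grid]

def equivariance_gap(f, image, g=hflip):
    """Mean absolute difference between f(g(I)) and g(f(I)); 0 means equivariant."""
    a, b = f(g(image)), g(f(image))
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy extractors: a pointwise map commutes with flipping, while a
# position-dependent map does not.
def pointwise(img):
    return [[2.0 * v for v in row] for row in img]

def positional(img):
    return [[v * (j + 1) for j, v in enumerate(row)] for row in img]

# A query that fires on another image is penalized far more than a silent one.
firing = inter_scene_loss([1.0, 0.0], [[[4.0, 0.0]]])
silent = inter_scene_loss([1.0, 0.0], [[[-4.0, 0.0]]])
```

Here `equivariance_gap(pointwise, [[1.0, 2.0, 3.0]])` is zero and the gap for `positional` is not, which is the kind of discrepancy an equivariance loss pushes down. In the method as summarized above, L_equi is computed on masks with dice and focal terms rather than on this raw feature gap.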

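Experimental Results

Research Questions

RQ1: Can cross-scene (dataset-level) querying improve the discriminative power of instance queries beyond intra-scene training?
RQ2: Does enforcing transformation equivariance on features and queries lead to more robust instance-query matching than standard augmentation?
RQ3: What AP gains appear on COCO and LVIS when the proposed framework is applied to existing query-based models?
RQ4: Is the proposed training framework architecture-agnostic, and does it leave inference speed unaffected, across mainstream query-based segmenters?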

Key Results

Method	Backbone	#Epoch	AP	AP50	AP75	AP_S	AP_M	AP_L
Mask R-CNN	ResNet-101	12	36.1	57.5	38.6	18.8	39.7	49.5
Cascade Mask R-CNN	ResNet-101	12	37.3	58.2	40.1	19.7	40.6	51.5
HTC	ResNet-101	20	39.6	61.0	42.8	21.3	42.9	55.0
PointRend	ResNet-50	12	36.3	56.9	38.7	19.8	39.4	48.5
QueryInst	ResNet-101	36	41.0	63.3	44.5	21.7	44.4	60.7
K-Net	ResNet-101	36	40.1	62.8	43.1	18.7	42.7	58.8
SOLQ	Swin-L	50	46.7	72.7	50.6	29.2	50.1	60.9
SparseInst	ResNet-50	36	37.9	59.2	40.2	15.7	39.4	56.9
CondInst	ResNet-50	12	35.5	55.8	37.7	16.8	39.2	50.6
Ours	ResNet-50	-	38.6	61.1	41.2	19.7	41.1	54.7
CondInst	ResNet-101	-	37.1	58.6	39.3	18.2	40.3	52.9
Ours	ResNet-101	-	39.9	62.7	42.4	20.8	42.3	55.7
SOTR	ResNet-50	24	42.2	61.9	43.9	11.0	60.5	73.5
SOTR	ResNet-101	-	42.6	64.1	45.8	11.2	61.2	75.3

Applied to CondInst, SOLOv2, SOTR, and Mask2Former across ResNet and Swin backbones, the method raises AP by +1.6 to +3.2 on COCO and by +2.7 for SOLOv2 on LVISv1.
On COCO test-dev, the reported gains reach up to +3.2 AP for specific settings, with noticeable improvements in AP_S, AP_M, and AP_L across methods such as the CondInst and SOTR variants.
With the framework applied, SOTR with ResNet-50 reaches AP 42.2, AP50 61.9, AP75 43.9, AP_S 11.0, AP_M 60.5, and AP_L 73.5; with ResNet-101 it reaches AP 42.6, AP50 64.1, AP75 45.8, AP_S 11.2, AP_M 61.2, and AP_L 75.3.
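
As a worked check of the reported gains, the per-metric differences can be computed directly from the table. The sketch below assumes that the "Ours" row listed directly under CondInst (ResNet-50) is the CondInst model trained with the proposed framework; the variable names are illustrative.

```python
METRICS = ["AP", "AP50", "AP75", "AP_S", "AP_M", "AP_L"]

# Rows copied from the table above (ResNet-50 backbone).
condinst_r50 = [35.5, 55.8, 37.7, 16.8, 39.2, 50.6]
ours_r50 = [38.6, 61.1, 41.2, 19.7, 41.1, 54.7]

# Per-metric gain, rounded to one decimal to match the table's precision.
gains = {
    metric: round(ours - base, 1)
    for metric, base, ours in zip(METRICS, condinst_r50, ours_r50)
}

for metric, delta in gains.items():
    print(f"{metric}: {delta:+.1f}")
```

Under that assumption the AP gain is +3.1, inside the +1.6 to +3.2 range quoted for COCO, with the largest improvement on AP50 (+5.3).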

This review was generated by AI and reviewed by a human editor.