QUICK REVIEW

[Paper Review] IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

Zhenchao Jin, Xiaowei Hu | arXiv (Cornell University) | 2023-10-16

Multimodal Machine Learning Applications | Citations: 14

One-Line Summary

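IDRNet introduces an intervention-driven paradigm that uses deletion diagnostics to build semantic-level relations and augment pixel representations. As a lightweight module compatible with existing frameworks, it improves segmentation performance on multiple benchmarks.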

ABSTRACT

Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks, which inspires the development of numerous context modeling paradigms, e.g., multi-scale-driven and similarity-driven context schemes. Despite the impressive results, these existing paradigms often suffer from inadequate or ineffective contextual information aggregation due to reliance on large amounts of predetermined priors. To alleviate the issues, we propose a novel Intervention-Driven Relation Network (IDRNet), which leverages a deletion diagnostics procedure to guide the modeling of contextual relations among different pixels. Specifically, we first group pixel-level representations into semantic-level representations with the guidance of pseudo labels and further improve the distinguishability of the grouped representations with a feature enhancement module. Next, a deletion diagnostics procedure is conducted to model relations of these semantic-level representations via perceiving the network outputs and the extracted relations are utilized to guide the semantic-level representations to interact with each other. Finally, the interacted representations are utilized to augment original pixel-level representations for final predictions. Extensive experiments are conducted to validate the effectiveness of IDRNet quantitatively and qualitatively. Notably, our intervention-driven context scheme brings consistent performance improvements to state-of-the-art segmentation frameworks and achieves competitive results on popular benchmark datasets, including ADE20K, COCO-Stuff, PASCAL-Context, LIP, and Cityscapes. Code is available at https://github.com/SegmentationBLWX/sssegmentation.
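The grouping step described in the abstract can be illustrated with a minimal Python sketch. It assumes that grouping is plain class-wise average pooling of the pixel features sharing a pseudo label; the function name `group_by_pseudo_labels` and the toy data are hypothetical, and the released IDRNet code may project or weight features differently.

```python
from typing import Dict, List

Vector = List[float]


def group_by_pseudo_labels(pixel_feats: List[Vector], pseudo_labels: List[int]) -> Dict[int, Vector]:
    """Hypothetical helper: average the features of all pixels that share a pseudo label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for feat, label in zip(pixel_feats, pseudo_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    # One semantic-level representation per class present in the pseudo labels
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


# Four pixels with 2-D features; the pseudo labels mark two classes (0 and 2).
pixels = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
labels = [0, 0, 2, 2]
semantic = group_by_pseudo_labels(pixels, labels)
print(semantic)  # {0: [2.0, 1.0], 2: [0.0, 5.0]}
```

Each semantic-level representation summarizes one class present in the pseudo labels, so the number of representations depends on the classes in the image rather than on the number of pixels.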

Motivation and Objectives

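Identifies and addresses the limitations of existing context modules for semantic segmentation that rely on predetermined priors.
Proposes an intervention-driven paradigm that models semantic-level relations to guide pixel interactions.
Develops a deletion diagnostics mechanism that updates the semantic relation matrix to improve segmentation.
Demonstrates compatibility and performance gains when integrated with popular segmentation backbones and frameworks.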

Proposed Method

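Groups pixel-level features into semantic-level representations using pseudo labels.
Strengthens the semantic-level features with a discriminative feature enhancement module.
Constructs and updates a semantic-level relation matrix via deletion diagnostics, enabling interaction between classes.
Lets the semantic-level representations interact to produce enhanced features that augment the pixel representations.
Fuses the enhanced features with the original pixel representations and applies self-attention before the final prediction.
Trains with a joint objective that combines cross-entropy losses on the pseudo labels and on the final predictions.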
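The interaction and augmentation steps listed above can be sketched as follows, under stated assumptions: each class representation adds a relation-weighted average of the other class representations (negative relations clipped to zero, each row normalised to sum to one), and each pixel is augmented by concatenating the interacted representation of its pseudo class. The names `interact` and `augment_pixels`, the clipping, and the concatenation are hypothetical choices; the feature enhancement module, self-attention, and loss are not reproduced.

```python
from typing import Dict, List

Vector = List[float]


def interact(reps: Dict[int, Vector], relation: Dict[int, Dict[int, float]]) -> Dict[int, Vector]:
    """Hypothetical: add to each class representation a relation-weighted average of the others.

    Negative relations are clipped to 0 and each row is normalised to sum to 1
    (assumptions made for this sketch)."""
    out: Dict[int, Vector] = {}
    for i, rep in reps.items():
        weights = {j: max(w, 0.0) for j, w in relation.get(i, {}).items() if j in reps and j != i}
        total = sum(weights.values())
        context = [0.0] * len(rep)
        if total > 0:
            for j, w in weights.items():
                context = [c + (w / total) * x for c, x in zip(context, reps[j])]
        out[i] = [r + c for r, c in zip(rep, context)]
    return out


def augment_pixels(pixel_feats: List[Vector], pseudo_labels: List[int],
                   interacted: Dict[int, Vector]) -> List[Vector]:
    """Hypothetical: concatenate each pixel feature with the interacted representation of its pseudo class."""
    return [feat + interacted[label] for feat, label in zip(pixel_feats, pseudo_labels)]


reps = {0: [2.0, 1.0], 2: [0.0, 5.0]}  # semantic-level representations
relation = {0: {2: 1.0}, 2: {0: 0.0}}  # toy relation matrix: row i says how much class i uses class j
interacted = interact(reps, relation)
print(interacted)  # {0: [2.0, 6.0], 2: [0.0, 5.0]}
augmented = augment_pixels([[1.0, 0.0], [0.0, 4.0]], [0, 2], interacted)
print(augmented)  # [[1.0, 0.0, 2.0, 6.0], [0.0, 4.0, 0.0, 5.0]]
```

Concatenation keeps the original pixel feature intact and lets a later layer decide how much class context to use; an additive fusion would be an equally plausible reading of the description.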

Figure 1: Diagram of our intervention-driven relation network. Deletion diagnostics is leveraged to build relations between semantic-level representations. With the built relation matrix and semantic-level representations, pixel representations can be augmented for pixel prediction.
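One way to read the deletion diagnostics in Figure 1 is sketched below: delete one semantic-level representation at a time, rerun the prediction head, and record how much each remaining class's score drops. This assumes the relation is the raw score drop; `toy_predict` and `deletion_relations` are hypothetical names, and `toy_predict` is a stand-in for the network head, not IDRNet's actual output model.

```python
from typing import Callable, Dict, List

Vector = List[float]
Reps = Dict[int, Vector]


def toy_predict(reps: Reps) -> Dict[int, float]:
    """Hypothetical head: class i's score is the dot product of its representation
    with the sum of all representations still present."""
    dim = len(next(iter(reps.values())))
    context = [sum(v[k] for v in reps.values()) for k in range(dim)]
    return {c: sum(a * b for a, b in zip(v, context)) for c, v in reps.items()}


def deletion_relations(reps: Reps, predict: Callable[[Reps], Dict[int, float]]) -> Dict[int, Dict[int, float]]:
    """relation[i][j] = drop in class i's score when representation j is deleted."""
    full = predict(reps)
    relation: Dict[int, Dict[int, float]] = {i: {} for i in reps}
    for j in reps:
        remaining = {c: v for c, v in reps.items() if c != j}  # intervention: delete class j
        if not remaining:
            continue
        partial = predict(remaining)
        for i in remaining:
            relation[i][j] = full[i] - partial[i]
    return relation


reps = {0: [1.0, 0.0], 1: [1.0, 1.0], 2: [0.0, 1.0]}
relation = deletion_relations(reps, toy_predict)
print(relation)  # {0: {1: 1.0, 2: 0.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 0.0, 1: 1.0}}
```

In this toy head, class 0's score depends on class 1 (relation 1.0) but not on class 2 (relation 0.0); the relation comes from observing outputs under deletion rather than from a hand-designed prior.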

Experimental Results

Research Questions

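RQ1: Can deletion diagnostics effectively guide the construction of pixel relations by focusing on semantic-level interactions?
RQ2: Does the intervention-driven context scheme consistently improve segmentation accuracy across diverse datasets and backbones?
RQ3: How does IDRNet perform in accuracy and efficiency when integrated into existing frameworks such as FCN, PSPNet, DeeplabV3, and UPerNet?
RQ4: Is the semantic-level relation approach robust on cross-domain segmentation tasks?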

Key Results

Context Module	Parameters	FLOPs	Time	GPU Memory	ADE20K mIoU (%, train/val)
OCR	15.12M	242.48G	16.58ms	617.24M	42.47
ASPP	--	674.47G	41.98ms	976.06M	43.19
PPM	23.07M	309.45G	21.45ms	960.63M	42.64
UPerNet	34.75M	500.76G	36.51ms	1429.18M	43.02
ANN	22.42M	369.62G	26.58ms	1445.75M	41.75
CCNet	23.92M	397.38G	30.92ms	986.28M	42.48
DNL	24.12M	395.25G	51.38ms	2381.04M	43.50
IDRNet	10.79M	155.89G	20.52ms	365.66M	43.61
PPM+IDRNet	23.65M	349.23G	32.64ms	1034.28M	44.02
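The efficiency claim can be checked directly against the numbers in the table. The sketch below copies the single-module rows, treats ASPP's missing parameter count as unavailable, and uses the PPM+IDRNet row only to compute the gain over PPM alone.

```python
from typing import Dict, Optional, Tuple

# module: (parameters in M or None if not reported, FLOPs in G, ADE20K mIoU in %), copied from the table
TABLE: Dict[str, Tuple[Optional[float], float, float]] = {
    "OCR": (15.12, 242.48, 42.47),
    "ASPP": (None, 674.47, 43.19),
    "PPM": (23.07, 309.45, 42.64),
    "UPerNet": (34.75, 500.76, 43.02),
    "ANN": (22.42, 369.62, 41.75),
    "CCNet": (23.92, 397.38, 42.48),
    "DNL": (24.12, 395.25, 43.50),
    "IDRNet": (10.79, 155.89, 43.61),
}
PPM_PLUS_IDRNET_MIOU = 44.02

with_params = [m for m, row in TABLE.items() if row[0] is not None]
fewest_params = min(with_params, key=lambda m: TABLE[m][0])
fewest_flops = min(TABLE, key=lambda m: TABLE[m][1])
best_miou = max(TABLE, key=lambda m: TABLE[m][2])
ppm_gain = round(PPM_PLUS_IDRNET_MIOU - TABLE["PPM"][2], 2)  # points gained by adding IDRNet to PPM

print(fewest_params, fewest_flops, best_miou, ppm_gain)  # IDRNet IDRNet IDRNet 1.38
```

Adding IDRNet on top of PPM raises ADE20K mIoU by 1.38 points at the cost of 0.58M extra parameters (23.65M vs. 23.07M).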

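IDRNet and its variant IDRNet+ achieve consistent performance gains on popular benchmarks (ADE20K, Cityscapes, COCO-Stuff, LIP, and PASCAL-Context).
On ADE20K, IDRNet paired with baseline backbones yields meaningful mIoU gains over several context schemes: in the ablation, IDRNet alone reaches 43.61% mIoU, and combining it with frameworks such as UPerNet also brings substantial gains.
The proposed method delivers competitive or superior results with a comparatively lightweight context module: among the single modules in the table, IDRNet has the fewest parameters and FLOPs (10.79M parameters, 155.89G FLOPs, 20.52 ms, 365.66M GPU memory) while reaching 43.61% mIoU on ADE20K.
Deletion diagnostics outperform backpropagation (BP)-based updates of the relation matrix M_r: BD/DD-driven updates show improvements, e.g., the DD-driven M_r improves ADE20K mIoU by 3.26 percentage points over the BP-driven one.
Balanced deletion samples rare categories more often, which improves performance on datasets such as ADE20K, PASCAL-Context, and COCO-Stuff.
Cross-domain improvements are also observed: DeeplabV3+IDRNet trained on Cityscapes gains 3.63 and 1.94 mIoU when transferred to Dark Zurich and Nighttime Driving, respectively.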
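Balanced deletion is described only at a high level here, so the following is a sketch under an explicit assumption: the categories to delete are drawn with probability inversely proportional to how often they occur, which makes rare categories more likely to be diagnosed. The function names and the inverse-frequency rule are hypothetical and may differ from the paper's exact scheme.

```python
import random
from typing import Dict, List


def deletion_weights(class_counts: Dict[int, int]) -> Dict[int, float]:
    """Inverse-frequency weights over classes, normalised to sum to 1 (assumed balancing rule)."""
    inverse = {c: 1.0 / n for c, n in class_counts.items() if n > 0}
    total = sum(inverse.values())
    return {c: w / total for c, w in inverse.items()}


def sample_classes_to_delete(class_counts: Dict[int, int], k: int, rng: random.Random) -> List[int]:
    """Draw k class ids to delete, favouring rare classes."""
    weights = deletion_weights(class_counts)
    classes = list(weights)
    return rng.choices(classes, weights=[weights[c] for c in classes], k=k)


counts = {0: 800, 1: 150, 2: 50}  # toy class frequencies: class 2 is rare
weights = deletion_weights(counts)
draws = sample_classes_to_delete(counts, k=1000, rng=random.Random(0))
print({c: round(w, 3) for c, w in weights.items()})  # {0: 0.045, 1: 0.239, 2: 0.716}
```

With these toy counts, frequency-proportional sampling would pick class 0 about 80% of the time, whereas the inverse-frequency weights pick the rare class 2 roughly 72% of the time.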

Figure 2: Illustration of our intervention-driven relation network (IDRNet). We first extract pixel representations $R_{p}$ using a backbone network $\mathcal{F}_{B}$, e.g., ResNet [30] or SwinTransformer [15]. Then, $R_{p}$ is grouped into semantic-level representations $R_{sl}$ based on a c


This review was generated by AI and reviewed by a human editor.