QUICK REVIEW

[Paper Review] Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation

Zhen-Liang Ni, Xinghao Chen | arXiv (Cornell University) | 2024-05-10

Advanced Image and Video Retrieval Techniques | Citations: 6

One-line summary

CGRSeg introduces a Rectangular Self-Calibration Module and a Dynamic Prototype Guided head for efficient, pyramid-context-aware semantic segmentation, reaching 43.6% mIoU on ADE20K at 4.0 GFLOPs.

ABSTRACT

Semantic segmentation is an important task for numerous applications but it is still quite challenging to achieve advanced performance with limited computational costs. In this paper, we present CGRSeg, an efficient yet competitive segmentation framework based on context-guided spatial feature reconstruction. A Rectangular Self-Calibration Module is carefully designed for spatial feature reconstruction and pyramid context extraction. It captures the axial global context in both horizontal and vertical directions to explicitly model rectangular key areas. A shape self-calibration function is designed to make the key areas closer to foreground objects. Besides, a lightweight Dynamic Prototype Guided head is proposed to improve the classification of foreground objects by explicit class embedding. Our CGRSeg is extensively evaluated on ADE20K, COCO-Stuff, and Pascal Context benchmarks, and achieves state-of-the-art semantic performance. Specifically, it achieves $43.6\%$ mIoU on ADE20K with only $4.0$ GFLOPs, which is $0.9\%$ and $2.5\%$ mIoU better than SeaFormer and SegNeXt but with about $38.0\%$ fewer GFLOPs. Code is available at https://github.com/nizhenliang/CGRSeg.

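Motivation and goals

Achieve efficient semantic segmentation under a limited compute budget.
Design modules that improve foreground localization and pyramid context extraction.
Develop lightweight components that improve boundary delineation and class discrimination.
Demonstrate state-of-the-art performance on ADE20K, COCO-Stuff, and Pascal Context while reducing FLOPs.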

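Proposed method

Propose CGRSeg, a framework that combines pyramid context extraction, spatial feature reconstruction, and a lightweight head.
Introduce the Rectangular Self-Calibration Module (RCM), which captures axial global context through horizontal and vertical pooling and performs shape self-calibration with large-kernel strip convolutions.
Apply a shape self-calibration function to align the attention region with the foreground features.
Fuse the attention features with the input features through a fusion path that enhances local detail.
Develop the Dynamic Prototype Guided (DPG) head, which embeds class information and computes dynamic prototypes specific to each image.
Stack RCMs for pyramid feature interaction, and form the pyramid context (P) from downsampled multi-scale features.
Project the decoder features and class embeddings to refine pixel-level representations and improve foreground classification.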
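The RCM steps listed above (axial pooling, shape self-calibration, fusion) can be pictured with a small standard-library sketch. This is an illustration under stated assumptions, not the authors' implementation: it uses a single-channel map, combines the two axial vectors by broadcast addition, replaces the learned large-kernel strip convolutions with fixed box filters, and fuses by element-wise multiplication. The names `axial_pool`, `strip_filter`, and `rcm_sketch` are hypothetical.

```python
import math

def axial_pool(feat):
    """Mean-pool a 2-D map along each axis: one value per row, one per column."""
    h, w = len(feat), len(feat[0])
    row_ctx = [sum(row) / w for row in feat]  # horizontal pooling -> H values
    col_ctx = [sum(feat[i][j] for i in range(h)) / h for j in range(w)]  # vertical pooling -> W values
    return row_ctx, col_ctx

def strip_filter(grid, k, horizontal):
    """Zero-padded 1-D box filter of width k along one axis.
    Assumption: stands in for a learned strip convolution (1 x k or k x 1)."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for d in range(-r, r + 1):
                ii, jj = (i, j + d) if horizontal else (i + d, j)
                if 0 <= ii < h and 0 <= jj < w:
                    total += grid[ii][jj]
            out[i][j] = total / k
    return out

def rcm_sketch(feat, k=3):
    row_ctx, col_ctx = axial_pool(feat)
    # Broadcast-add the axial vectors: large values form a rectangular region.
    rect = [[r + c for c in col_ctx] for r in row_ctx]
    # Shape self-calibration (assumed form): strip filters on both axes, then a sigmoid gate.
    calibrated = strip_filter(strip_filter(rect, k, True), k, False)
    attn = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in calibrated]
    # Fusion (assumed form): reweight the input features with the attention map.
    out = [[f * a for f, a in zip(frow, arow)] for frow, arow in zip(feat, attn)]
    return attn, out

# Toy 5x5 map with a 2x2 "foreground object" at rows 1-2, columns 2-3.
feat = [[0.0] * 5 for _ in range(5)]
for i in (1, 2):
    for j in (2, 3):
        feat[i][j] = 4.0
attn, out = rcm_sketch(feat)
```

In this toy case the attention is highest around the object's rows and columns and falls back to the neutral value 0.5 in the far corner, which is the rectangular key-area behavior the module is described as modeling.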

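Experimental results

Research questions

RQ1: How can foreground-centric context be modeled efficiently in a lightweight segmentation backbone?
RQ2: Can a rectangular, axis-guided attention mechanism capture pyramid context more effectively than conventional attention blocks?
RQ3: Can dynamic class prototypes improve per-pixel discrimination without a large computational overhead?
RQ4: How does combining pyramid context extraction with spatial feature reconstruction affect results on standard benchmarks?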
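To make the idea behind RQ3 concrete, the sketch below shows one generic way to build per-image class prototypes and use them to classify pixels. It is not the paper's DPG head, whose exact layers this review does not describe. The assumptions are that prototypes are probability-weighted means of pixel features (weights from a coarse prediction) and that pixels are scored by dot product. The function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_prototypes(features, coarse_logits):
    """features: N pixels x D dims; coarse_logits: N pixels x C classes.
    Returns C prototypes, each a probability-weighted mean of the pixel
    features of this image (assumed form of an image-specific prototype)."""
    n, d = len(features), len(features[0])
    c = len(coarse_logits[0])
    probs = [softmax(row) for row in coarse_logits]
    protos = []
    for k in range(c):
        weight = sum(probs[i][k] for i in range(n))
        protos.append([
            sum(probs[i][k] * features[i][j] for i in range(n)) / weight
            for j in range(d)
        ])
    return protos

def classify_with_prototypes(features, protos):
    """Assign each pixel to the prototype with the largest dot-product score."""
    labels = []
    for f in features:
        scores = [sum(a * b for a, b in zip(f, p)) for p in protos]
        labels.append(scores.index(max(scores)))
    return labels

# Four pixels with 2-D features and coarse two-class logits (toy values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
coarse_logits = [[2.0, 0.0], [0.1, 0.0], [0.0, 2.0], [0.0, 1.0]]
protos = dynamic_prototypes(features, coarse_logits)
labels = classify_with_prototypes(features, protos)
```

The cost of this step scales with pixels times classes times feature dimensions, which is why a prototype-based refinement can stay light compared with dense attention over all pixel pairs.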

Main results

Method	mIoU (%)	FLOPs (G)	Params (M)	Throughput (img/s)
DeeplabV3+ (ECCV’18)	34.0	69.4	15.4	63.0
Segformer-B0 (NeurIPS’21)	37.4	8.4	3.8	117.1
FeedFormer-B0 (AAAI’23)	39.2	7.8	4.5	110.3
SegNeXt-T (NeurIPS’22)	41.1	6.6	4.3	123.5
Seaformer-L (ICLR’23)	42.7	6.5	14.0	142.3
PEM-STDC1 (CVPR’24)	39.6	16.0	17.0	-
CGRSeg-T (Ours)	43.6	4.0	9.4	138.4
DeeplabV3+ (ECCV’18)	44.1	255.1	62.7	21.6
EncNet (CVPR’18)	44.7	218.8	68.6	23.4
CCNet (ICCV’19)	45.2	278.4	68.9	23.2
Segformer-B1 (NeurIPS’21)	42.2	15.9	13.7	96.0
SegNeXt-S (NeurIPS’22)	44.3	15.9	13.9	91.1
FeedFormer-B1 (AAAI’23)	41.0	10.0	4.6	87.2
PEM-STDC2 (CVPR’24)	45.0	19.3	21.0	-
CGRSeg-B (Ours)	45.5	7.6	18.1	98.4
Segformer-B2 (NeurIPS’21)	46.5	62.4	27.5	70.4
SegNeXt-B (NeurIPS’22)	47.7	74.0	63.0	-
FeedFormer-B2 (AAAI’23)	48.0	42.7	29.1	56.9
LRFormer-T (arXiv’23)	46.7	17.0	13.0	-
CGRSeg-L (Ours)	48.3	14.9	35.7	73.0
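For readers unfamiliar with the mIoU column, the metric is the per-class intersection-over-union averaged over classes. The following is a minimal sketch on flat label lists. Two simplifications are assumed here: classes absent from both prediction and ground truth are skipped, and the score is computed on a single sample, whereas benchmark tooling typically accumulates intersections and unions over the whole dataset. `mean_iou` is a hypothetical helper name.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Six pixels, three classes: per-class IoU is 1/3, 2/3, and 1/2.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # 0.5
```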

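CGRSeg reaches 43.6% mIoU on ADE20K at 4.0 GFLOPs with its smallest model, CGRSeg-T.
On ADE20K, CGRSeg-T outperforms SeaFormer-L and SegNeXt-T by 0.9% and 2.5% mIoU, respectively, while using about 38% fewer GFLOPs (4.0 versus 6.5 and 6.6).
CGRSeg-B and CGRSeg-L reach 45.5% and 48.3% mIoU, respectively, at 7.6 and 14.9 GFLOPs, which is competitive with comparable models.
On COCO-Stuff, CGRSeg-T reaches 42.2% mIoU at 4.0 GFLOPs and CGRSeg-L reaches 46.0% mIoU at 14.9 GFLOPs.
On Pascal Context, CGRSeg-T reaches 54.1% mIoU (4.0 GFLOPs) and CGRSeg-L reaches 58.5% mIoU (14.9 GFLOPs).
The ablation study shows additive gains from the RCM and the DPG head: the baseline scores 40.86% mIoU, and adding RCM for pyramid context extraction (PCE), RCM for spatial feature reconstruction (SFR), and the DPG head raises it to 43.60% mIoU.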
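The headline comparisons can be re-derived from the numbers quoted in this review. The snippet below is only an arithmetic check on values copied from the ADE20K table and the ablation summary. The dictionary layout and the `compare` helper are illustrative, not part of the paper's code.

```python
# Values copied from the ADE20K comparison table in this review.
models = {
    "SegNeXt-T": {"miou": 41.1, "gflops": 6.6},
    "SeaFormer-L": {"miou": 42.7, "gflops": 6.5},
    "CGRSeg-T": {"miou": 43.6, "gflops": 4.0},
}

def compare(ours, other):
    """Return (mIoU gain in points, percent of GFLOPs saved), rounded to 1 decimal."""
    gain = round(ours["miou"] - other["miou"], 1)
    flops_saved = round(100 * (1 - ours["gflops"] / other["gflops"]), 1)
    return gain, flops_saved

vs_seaformer = compare(models["CGRSeg-T"], models["SeaFormer-L"])  # (0.9, 38.5)
vs_segnext = compare(models["CGRSeg-T"], models["SegNeXt-T"])      # (2.5, 39.4)

# Ablation: total gain from adding RCM (PCE), RCM (SFR), and the DPG head.
ablation_gain = round(43.60 - 40.86, 2)  # 2.74
```

Both savings land near the "about 38%" figure quoted in the abstract, and the full ablation gain is 2.74 mIoU points over the baseline.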


This review was generated by AI and checked by a human editor.