QUICK REVIEW

[Paper Review] Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

Qibin Hou, Li Zhang | arXiv (Cornell University) | 2020-03-30

Advanced Neural Network Applications | References: 65 | Citations: 43

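One-Line Summary

The paper introduces strip pooling (1xN or Nx1 kernels) and two modules, the Strip Pooling Module and the Mixed Pooling Module, to capture long-range and diverse contextual information, achieving state-of-the-art results on ADE20K, Cityscapes, and Pascal Context.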

ABSTRACT

Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies, 2) presenting a novel building block with diverse spatial pooling as a core, and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as an efficient plug-and-play module in existing scene parsing networks. Extensive experiments on popular benchmarks (e.g., ADE20K and Cityscapes) demonstrate that our simple approach establishes new state-of-the-art results. Code is made available at https://github.com/Andrew-Qibin/SPNet.

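Motivation and Objectives

Motivate better long-range context modeling for pixel-wise scene parsing that goes beyond square pooling windows.
Propose strip pooling, which captures long-range dependencies with a long, narrow kernel.
Design lightweight modules (SPM and MPM) that plug into a backbone to improve segmentation.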

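Proposed Method

Define strip pooling as averaging over each row or each column (a 1xW or Hx1 window), which produces long-range, band-shaped context.
Develop the Strip Pooling Module (SPM), which contains horizontal and vertical strip pooling paths followed by 1D convolutions and fuses the features through a sigmoid-guided scaling operation.
Introduce the Mixed Pooling Module (MPM), which combines a short-range pyramid pooling path with a long-range strip pooling path inside a residual bottleneck structure.
Build SPNet by integrating SPMs into the backbone and stacking MPMs on top of the ResNet backbone to refine features for segmentation.
Provide a lightweight, plug-and-play design that can be added to existing scene parsing networks.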
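The strip pooling and gating steps listed above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it handles a single channel given as a list of rows, and it deliberately omits the learned 1D convolutions and any other learned layers, so only the pooling, expansion, sigmoid, and rescaling remain. The function names `strip_pool`, `sigmoid`, and `spm_gate` are hypothetical.

```python
import math


def strip_pool(x):
    """Average a 2D feature map along each row and each column.

    x is a list of H rows, each with W values (one channel).
    Returns (row_means, col_means) with H and W entries respectively.
    """
    h, w = len(x), len(x[0])
    # A 1xW window averages every row, giving one value per row.
    row_means = [sum(row) / w for row in x]
    # An Hx1 window averages every column, giving one value per column.
    col_means = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spm_gate(x):
    """Simplified SPM-style gate (assumption: no learned convolutions).

    Each position (i, j) is rescaled by the sigmoid of the sum of its
    row strip and column strip, so it sees context from its whole row
    and whole column.
    """
    row_means, col_means = strip_pool(x)
    h, w = len(x), len(x[0])
    return [
        [x[i][j] * sigmoid(row_means[i] + col_means[j]) for j in range(w)]
        for i in range(h)
    ]


feature_map = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
rows, cols = strip_pool(feature_map)
gated = spm_gate(feature_map)
```

For the 2x3 example, `rows` holds one average per row and `cols` one per column; every gated value is the input scaled by a weight in (0, 1) that depends on both strips crossing that position.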

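Experimental Results

Research Questions

RQ1: What advantage does strip pooling offer over conventional square pooling in capturing long-range contextual dependencies?
RQ2: Can the lightweight SPM and MPM blocks improve accuracy on standard benchmarks without significantly increasing parameter overhead?
RQ3: How does combining short-range and long-range pooling strategies affect segmentation performance?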

Key Results

Model	Backbone	mIoU (%)	Pixel Acc. (%)
Base FCN	ResNet-50	37.63	77.60
Base FCN + 2 MPM (SRD + LRD)	ResNet-50	41.92	80.03
Base FCN + 2 MPM + SPM	ResNet-50	44.03	80.65
SPNet (Ours)	ResNet-50	45.03	81.32

SPNet with 2 MPMs and SPM achieves 44.03% mIoU on ADE20K with ResNet-50 backbone (pixel acc. 80.65%).
With ResNet-101, SPNet achieves 45.60% mIoU and 82.09% pixel accuracy on ADE20K (single-model test).
On Cityscapes test set, SPNet with ResNet-101 reaches 82.0% mIoU, outperforming several prior methods.
Ablation shows that combining both SRD (short-range dependency) and LRD (long-range dependency) in MPM yields better mIoU than either alone, and that SPM provides substantial gains when placed strategically in the backbone.
Strip pooling outperforms global average pooling in the SPNet setup, with 44.03% mIoU vs. 41.34% when replacing strip pooling with GAP on ADE20K.
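For readers unfamiliar with the two metrics reported above, the following sketch shows how mIoU and pixel accuracy are commonly computed from a confusion matrix. It uses the standard textbook definitions on flat label lists; the official benchmark evaluation scripts may differ in details such as ignored labels, and the helper names here are hypothetical.

```python
def confusion_matrix(pred, gt, num_classes):
    """cm[g][p] counts pixels with ground-truth class g predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN).

    Classes that appear in neither the prediction nor the ground truth
    are skipped (an assumption of this sketch).
    """
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
gt_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred_labels, gt_labels, num_classes=3)
acc = pixel_accuracy(cm)
miou = mean_iou(cm)
```

Because mIoU averages per-class scores, it penalizes errors on rare classes more than pixel accuracy does, which is one reason the two columns in the table move by different amounts.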


This review was generated by AI and reviewed by a human editor.