QUICK REVIEW

[Paper Review] Guided Upsampling Network for Real-Time Semantic Segmentation

Davide Mazzini | arXiv (Cornell University) | 2018-07-19

Advanced Neural Network Applications | References: 19 | Citations: 41

One-line summary

Introduces a Guided Upsampling Module (GUM) into a multiresolution encoder-decoder for real-time semantic segmentation, reaching 70.4% mIoU at 33.3 FPS on Cityscapes without post-processing.

ABSTRACT

Semantic segmentation architectures are mainly built upon an encoder-decoder structure. These models perform subsequent downsampling operations in the encoder. Since operations on high-resolution activation maps are computationally expensive, usually the decoder produces output segmentation maps by upsampling with parameters-free operators like bilinear or nearest-neighbor. We propose a Neural Network named Guided Upsampling Network which consists of a multiresolution architecture that jointly exploits high-resolution and large context information. Then we introduce a new module named Guided Upsampling Module (GUM) that enriches upsampling operators by introducing a learnable transformation for semantic maps. It can be plugged into any existing encoder-decoder architecture with little modifications and low additional computation cost. We show with quantitative and qualitative experiments how our network benefits from the use of GUM module. A comprehensive set of experiments on the publicly available Cityscapes dataset demonstrates that Guided Upsampling Network can efficiently process high-resolution images in real-time while attaining state-of-the-art performance.

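Research motivation and goals

Addresses the trade-off between inference speed and accuracy in semantic segmentation of urban street scenes.
Proposes a lightweight, real-time-friendly decoder with an improved upsampling operator.
Exploits high-resolution detail and large-context information through a multiresolution architecture.
Introduces a learnable Guided Upsampling Module that controls upsampling on a per-pixel basis.
Demonstrates real-time performance with competitive accuracy on Cityscapes.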

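Proposed method

Develops a multiresolution encoder with low-resolution and medium-resolution branches to capture context and fine detail.
Introduces the Guided Upsampling Module (GUM), which uses a Guidance Offset Table to control where each output pixel samples from during upsampling.
Designs a Guidance Module, with variants (large-rf, high-res, fusion), that predicts the offsets applied to the upsampling grid.
Trains with SGD with momentum, a fixed learning schedule, and a batch size of 8 to stabilize batch-normalization (BN) statistics.
Explores data augmentation (random scaling, color and lighting distortions) to improve generalization without affecting inference speed.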
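
The idea behind guided upsampling can be illustrated with a small NumPy sketch: each output pixel samples the low-resolution map at its regular grid position plus a per-pixel offset. This is only an illustration under stated assumptions, not the author's implementation. The function name `guided_upsample_bilinear`, the half-pixel coordinate convention, the border clamping, and the choice to express offsets in low-resolution pixel units are all assumptions made here; in the paper the offsets come from a learned Guidance Module, whereas below they are set by hand.

```python
import numpy as np

def guided_upsample_bilinear(low_res, offsets, factor):
    """Bilinear upsampling whose sampling grid is shifted per output pixel.

    low_res: (H, W) array, e.g. one channel of a low-resolution score map.
    offsets: (H*factor, W*factor, 2) array of (dx, dy) shifts, in low-res
             pixel units (an assumption of this sketch).
    factor:  integer upsampling factor.
    """
    h, w = low_res.shape
    out_h, out_w = h * factor, w * factor
    ys, xs = np.meshgrid(np.arange(out_h), np.arange(out_w), indexing="ij")

    # Regular grid: centre of each output pixel mapped into low-res coordinates,
    # then shifted by the guidance offsets.
    src_x = (xs + 0.5) / factor - 0.5 + offsets[..., 0]
    src_y = (ys + 0.5) / factor - 0.5 + offsets[..., 1]

    # Clamp to the valid range so border pixels replicate the edge values.
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)

    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0
    wy = src_y - y0

    # Standard bilinear interpolation between the four neighbours.
    top = low_res[y0, x0] * (1 - wx) + low_res[y0, x1] * wx
    bottom = low_res[y1, x0] * (1 - wx) + low_res[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

low = np.array([[0.0, 1.0], [2.0, 3.0]])
zero_offsets = np.zeros((4, 4, 2))

# With all-zero offsets the operator reduces to plain bilinear upsampling.
plain = guided_upsample_bilinear(low, zero_offsets, factor=2)

# Hand-set offset: push the top-left output pixel one low-res column to the right.
shifted_offsets = zero_offsets.copy()
shifted_offsets[0, 0] = (1.25, 0.0)
guided = guided_upsample_bilinear(low, shifted_offsets, factor=2)
```

In this toy case only the top-left output pixel changes, which mirrors the intent of steering individual pixels near object boundaries toward the correct side of the edge.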

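Experimental results

Research questions

RQ1: Can a learnable upsampling transformation improve boundary accuracy without sacrificing real-time speed?
RQ2: Does a multiresolution encoder with the Guided Upsampling Module outperform standard bilinear upsampling on Cityscapes?
RQ3: Which Guidance Module design best balances accuracy and throughput?
RQ4: How does data augmentation affect real-time semantic segmentation performance?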

Key results

Name	Subsampling factor	mIoU (%)	FPS
SegNet	4	57.0	26.4
ENet	2	58.3	121.5
SQ	no	59.8	26.4
CRF-RNN	2	62.5	2.2
DeepLab	2	63.1	0.4
FCN-8S	no	65.3	4.9
Adelaide	no	66.4	0.05
Dilation10	no	67.1	0.4
ICNet	no	69.5	47.9
ERFNet	2	69.7	52.6
GUN (ours)	2	70.4	33.3
DeepLabv3+	no	81.2	n/a
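
To read the FPS column as a per-frame time budget, throughput can be converted to latency. The sketch below does that for a few rows of the table; it assumes single-image, non-pipelined inference so that latency is simply the reciprocal of throughput, and the helper name `frame_latency_ms` is hypothetical.

```python
def frame_latency_ms(fps):
    """Convert throughput in frames per second to per-frame latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

# FPS values copied from the table above.
table_fps = {"ENet": 121.5, "ERFNet": 52.6, "ICNet": 47.9, "GUN": 33.3}

latencies = {name: frame_latency_ms(fps) for name, fps in table_fps.items()}
for name, ms in latencies.items():
    print(f"{name}: {ms:.1f} ms per frame")
```

At 33.3 FPS, GUN spends roughly 30 ms on each frame.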

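GUN reaches 70.4% mIoU at 33.3 FPS on the Cityscapes test set, measured on a Titan Xp.
The Guided Upsampling Module improves object boundaries compared with bilinear upsampling.
Among the GUM variants, the fusion Guidance Module offers the best balance of mIoU and FPS (69.64% mIoU, 33.3 FPS).
A multiresolution encoder with weight sharing between branches outperforms the variant without sharing.
Data augmentation that includes random scaling yields a measurable mIoU gain, suggesting a beneficial regularization effect.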
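
The mIoU values quoted above follow the usual mean intersection-over-union definition. A minimal sketch of that metric is shown here, assuming integer label maps and an ignore label of 255 (a common convention for Cityscapes training labels, not something this review states). The helper `mean_iou` is hypothetical, and it scores a single prediction, whereas benchmark figures are normally accumulated over the whole dataset before averaging across classes.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in either prediction or ground truth."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Drop pixels whose ground truth carries the ignore label.
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    if not valid.any():
        return float("nan")
    return float((tp[valid] / union[valid]).mean())

# Toy example: class 0 has IoU 1/2 and class 1 has IoU 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```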


This review was generated by AI and reviewed by a human editor.