QUICK REVIEW

[Paper Review] Real-Time Semantic Segmentation via Multiply Spatial Fusion Network

Haiyang Si, Zhiqiang Zhang | arXiv (Cornell University) | 2019-11-17

Advanced Neural Network Applications | References: 19 | Citations: 45

One-Line Summary

MSFNet introduces a Multi-features Fusion Module built on Spatial Aware Pooling, together with Class Boundary Supervision, to enable fast and accurate real-time semantic segmentation of high-resolution images.

ABSTRACT

Real-time semantic segmentation plays a significant role in industry applications, such as autonomous driving, robotics and so on. It is a challenging task as both efficiency and performance need to be considered simultaneously. To address such a complex task, this paper proposes an efficient CNN called Multiply Spatial Fusion Network (MSFNet) to achieve fast and accurate perception. The proposed MSFNet uses Class Boundary Supervision to process the relevant boundary information based on our proposed Multi-features Fusion Module which can obtain spatial information and enlarge receptive field. Therefore, the final upsampling of the feature maps of 1/8 original image size can achieve impressive results while maintaining a high speed. Experiments on Cityscapes and Camvid datasets show an obvious advantage of the proposed approach compared with the existing approaches. Specifically, it achieves 77.1% Mean IOU on the Cityscapes test dataset with the speed of 41 FPS for a 1024*2048 input, and 75.4% Mean IOU with the speed of 91 FPS on the Camvid test dataset.

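Motivation and Objectives

Motivated by the need for real-time semantic segmentation of high-resolution scenes, where both speed and accuracy matter.
Develop an efficient, backbone-friendly architecture that enlarges the receptive field without heavy computation.
Fuse multi-scale features to preserve spatial information while maintaining real-time inference.
Mitigate the loss of edge information with a Class Boundary Supervision mechanism that recovers boundary detail.
Demonstrate state-of-the-art real-time performance on the Cityscapes and CamVid benchmarks.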

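Proposed Method

Introduces Spatial Aware Pooling (SAP) after the backbone blocks to extract multi-scale features.
Builds a Multi-features Fusion Module (MFM) that fuses outputs of the same resolution and enlarges the receptive field at low cost.
Proposes Class Boundary Supervision (CBS), with two independent upsampling branches, to recover boundary information.
Upsamples the final feature map from 1/8 of the input size, preserving details while maintaining speed.
Uses a light, simple encoder-decoder pipeline with depthwise separable convolutions to reduce computation, on a lightweight ResNet-18 backbone.
Trains with a dual-loss objective that combines a semantic segmentation loss with a boundary-focused loss.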
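
The dual-loss objective can be illustrated with a minimal, pure-Python sketch. It assumes, for illustration only, that boundary targets are derived from the segmentation labels by marking any pixel with a 4-connected neighbor of a different class, and that the two losses are combined as a weighted sum. The function names and the `boundary_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def boundary_mask(labels):
    """Return 1 where a pixel has a 4-connected neighbor of another class, else 0.

    Assumption: this is one simple way to derive boundary targets from a label map.
    """
    h, w = len(labels), len(labels[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = 1
                    break
    return mask


def cross_entropy(seg_probs, labels, eps=1e-12):
    """Mean per-pixel cross-entropy; seg_probs[y][x] is a list of class probabilities."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(seg_probs, labels):
        for probs, cls in zip(prob_row, label_row):
            total -= math.log(max(probs[cls], eps))
            count += 1
    return total / count


def binary_cross_entropy(edge_probs, mask, eps=1e-12):
    """Mean per-pixel binary cross-entropy between edge probabilities and a 0/1 mask."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(edge_probs, mask):
        for p, m in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
            count += 1
    return total / count


def total_loss(seg_probs, labels, edge_probs, boundary_weight=1.0):
    """Segmentation loss plus a weighted boundary loss (hypothetical weighting)."""
    mask = boundary_mask(labels)
    seg = cross_entropy(seg_probs, labels)
    edge = binary_cross_entropy(edge_probs, mask)
    return seg + boundary_weight * edge
```

In this sketch a perfect prediction drives both terms to approximately zero, and raising `boundary_weight` shifts the emphasis toward edge pixels.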

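Experimental Results

Research Questions

RQ1: How can multi-scale feature fusion be designed to enlarge the receptive field while preserving spatial information in a real-time setting?
RQ2: Can boundary-based supervision improve edge preservation and overall segmentation accuracy without sacrificing speed?
RQ3: How do different SAP configurations and CBS designs affect mIoU and FPS on standard datasets?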
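
For readers unfamiliar with the accuracy metric in these questions, mean Intersection over Union (mIoU) averages, over classes, the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels for that class. The sketch below computes it for a single pair of label maps. The function name is illustrative, and benchmark evaluations typically accumulate the counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are 2D lists of integer class ids with the same shape.
    """
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if p == cls and t == cls:
                    intersection += 1
                if p == cls or t == cls:
                    union += 1
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
example = mean_iou([[0, 0], [1, 1]], [[0, 1], [1, 1]], num_classes=2)
```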

Key Results

On Cityscapes with a 1024x2048 input, MSFNet with CBS achieves 77.1% mIoU at 41 FPS.
Without CBS, the Cityscapes result is 75.4% mIoU at 47 FPS, so CBS adds 1.7 points of mIoU at a cost of 6 FPS.
On Cityscapes with a 512x1024 input, it achieves 71.3% mIoU at 117 FPS.
On CamVid, it reports 75.4% mIoU at 91 FPS for a 512x1024 input and 72.7% mIoU at 160 FPS for a 1024x2048 input.
In some variant experiments, MFM and CBS yield substantial gains over the baseline encoder-decoder while staying within the real-time regime.
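
To put the Cityscapes frame rates in perspective, the following sketch converts FPS into per-frame latency and pixel throughput. It uses only the Cityscapes figures reported above; the helper names are illustrative.

```python
def frame_latency_ms(fps):
    """Average time budget per frame, in milliseconds."""
    return 1000.0 / fps


def megapixels_per_second(width, height, fps):
    """Pixels processed per second, in millions."""
    return width * height * fps / 1e6


# (description, width, height, FPS) taken from the Cityscapes results above
cityscapes_runs = [
    ("1024x2048 input, 77.1% mIoU", 2048, 1024, 41),
    ("512x1024 input, 71.3% mIoU", 1024, 512, 117),
]

for name, width, height, fps in cityscapes_runs:
    latency = frame_latency_ms(fps)
    throughput = megapixels_per_second(width, height, fps)
    print(f"{name}: {latency:.1f} ms/frame, {throughput:.2f} MP/s")
```

The full-resolution setting takes roughly 24.4 ms per frame against roughly 8.5 ms at half resolution, yet it processes more pixels per second (about 86 MP/s versus about 61 MP/s).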

This review was generated by AI and reviewed by a human editor.