QUICK REVIEW

[Paper Review] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen, Yukun Zhu | arXiv (Cornell University) | 2018. 02. 07.

Advanced Neural Network Applications · References 65 · Citations 4,542

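One-line summary

This paper extends DeepLabv3 into an encoder-decoder architecture (DeepLabv3+), adding a lightweight decoder and using atrous separable convolutions to achieve state-of-the-art semantic segmentation performance on PASCAL VOC 2012 and Cityscapes without post-processing.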

ABSTRACT

Spatial pyramid pooling module or encoder-decoder structure are used in deep neural networks for semantic segmentation task. The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages from both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further explore the Xception model and apply the depthwise separable convolution to both Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on PASCAL VOC 2012 and Cityscapes datasets, achieving the test set performance of 89.0% and 82.1% without any post-processing. Our paper is accompanied with a publicly available reference implementation of the proposed models in Tensorflow at https://github.com/tensorflow/models/tree/master/research/deeplab.

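Motivation and objectives

Combine the strengths of spatial pyramid pooling and encoder-decoder structures for semantic segmentation.
Make the encoder feature resolution controllable through atrous convolution, so that accuracy can be traded off against speed.
Introduce a decoder that refines object boundaries while reusing encoder features.
Adopt depthwise separable convolutions (Xception-based) to improve both speed and accuracy.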
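
As a rough illustration of how atrous convolution makes feature resolution controllable, the sketch below computes the encoder feature-map size for a given output stride and the spatial extent covered by a kernel at a given atrous rate. The function names, the 512-pixel input size, and the rate of 12 are assumptions chosen for the example, not values taken from this review.

```python
def feature_map_size(input_size: int, output_stride: int) -> int:
    """Spatial size of the encoder output, assuming 'same'-style padding (ceil division)."""
    return -(-input_size // output_stride)


def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Extent covered by a kernel whose taps are spaced `rate` pixels apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# A smaller output stride keeps denser features (larger maps) at higher cost.
for output_stride in (32, 16, 8):
    print(output_stride, feature_map_size(512, output_stride))

# A 3x3 kernel covers a wider field of view as the atrous rate grows,
# without adding any weights.
print(effective_kernel_size(3, 1), effective_kernel_size(3, 2), effective_kernel_size(3, 12))
```

Halving the output stride from 16 to 8 quadruples the number of feature positions, which is where the accuracy versus speed trade-off comes from.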

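Proposed method

Extends DeepLabv3 with a simple yet effective decoder that refines object boundaries.
Applies atrous (dilated) convolution in the encoder to control feature density and receptive field.
Uses depthwise separable convolutions (atrous separable convolution) in both the ASPP and decoder modules.
Adapts the Aligned Xception backbone, built on depthwise separable convolutions, to reduce computation.
Trains end-to-end on VOC 2012 with the output stride set to 16 or 8 to balance accuracy and speed.
A public TensorFlow implementation is available in the DeepLab repository.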
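
The atrous separable convolution named above can be sketched in plain Python as a depthwise convolution with spaced taps followed by a 1x1 pointwise convolution. This is a minimal illustration under stated assumptions, not the reference TensorFlow implementation: the function names, the nested-list tensor layout `[channel][row][column]`, and the zero "same" padding are choices made for the example, and normalization and activation layers are omitted.

```python
def atrous_depthwise_conv2d(x, kernels, rate):
    """Per-channel convolution. x: [C][H][W], kernels: [C][k][k], zero 'same' padding."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    half = k // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        # Taps are spaced `rate` pixels apart (the "holes").
                        ii = i + (u - half) * rate
                        jj = j + (v - half) * rate
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernels[c][u][v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_conv2d(x, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w_o[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
         for i in range(height)]
        for w_o in weights
    ]


def atrous_separable_conv2d(x, depthwise_kernels, pointwise_weights, rate):
    """Depthwise atrous convolution followed by a pointwise channel mix."""
    return pointwise_conv2d(atrous_depthwise_conv2d(x, depthwise_kernels, rate), pointwise_weights)


# Toy input: two 5x5 channels with a single impulse at the center of channel 0.
x = [[[0.0] * 5 for _ in range(5)] for _ in range(2)]
x[0][2][2] = 1.0
ones3 = [[1.0] * 3 for _ in range(3)]
y = atrous_separable_conv2d(x, [ones3, ones3], [[1.0, 1.0]], rate=2)
for row in y[0]:
    print(row)
```

With `rate=2` the impulse response lands on every other pixel, so the same nine weights span a 5x5 window. With `rate=1` the response would fill the 3x3 block around the center.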

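Experimental results

Research questions

RQ1: Can an encoder-decoder structure that uses ASPP and a simple decoder improve boundary sharpness without post-processing?
RQ2: How does using atrous separable convolution with an Xception-based backbone affect accuracy and speed in semantic segmentation?
RQ3: How does the proposed decoder design affect boundary precision and overall mIoU on standard benchmarks?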
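
These questions are all scored in mean intersection-over-union (mIoU). As a rough illustration of the metric, and not the benchmarks' evaluation code, the sketch below averages per-class IoU over flat lists of class labels. The helper name `mean_iou` and the toy labels are assumptions; benchmark evaluations accumulate the counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU. Classes absent from both pred and target are skipped."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either pred or target")
    return sum(ious) / len(ious)


# Toy example with two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(mean_iou(pred, target, num_classes=2))
```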

Key findings

DeepLabv3+ with the proposed decoder achieves 89.0% mIoU on the PASCAL VOC 2012 test set (the VOC 2012 test result that includes JFT pretraining).
On Cityscapes, DeepLabv3+ reaches 82.1% mIoU on the test set without post-processing; validation mIoU varies with the backbone and settings, with 79.55% for the best reported configuration.
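Using Xception as the backbone together with atrous separable convolution reduces Multiply-Adds by 33–41% while giving similar mIoU.
The decoder design choices improve over naive bilinear upsampling, especially near object boundaries (the trimap analysis shows substantial gains).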
With COCO/JFT pretraining followed by fine-tuning, the model attains 89.0% on the VOC 2012 test set; the Cityscapes test result is 82.1%.
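
To give a feel for why separable convolutions lower the Multiply-Adds count, the sketch below compares the cost of one standard convolution layer with its depthwise separable counterpart. The function names and the layer shape (32x32 feature map, 256 channels, 3x3 kernel) are assumptions chosen for illustration. This per-layer ratio is not a reproduction of the network-level 33–41% figure, which depends on the full architecture.

```python
def standard_conv_macs(height, width, c_in, c_out, k):
    """Multiply-adds of a standard k x k convolution with 'same' padding."""
    return height * width * c_in * c_out * k * k


def separable_conv_macs(height, width, c_in, c_out, k):
    """Multiply-adds of a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = height * width * c_in * k * k
    pointwise = height * width * c_in * c_out
    return depthwise + pointwise


standard = standard_conv_macs(32, 32, 256, 256, 3)
separable = separable_conv_macs(32, 32, 256, 256, 3)
# The ratio simplifies to 1/c_out + 1/k^2, independent of the spatial size.
print(standard, separable, separable / standard)
```

The atrous rate does not appear in either count: spacing the taps apart widens the field of view without changing the number of multiply-adds.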

This review was generated by AI and checked by a human editor.