QUICK REVIEW

[Paper Review] Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation

Yunchao Wei, Huaxin Xiao | arXiv (Cornell University) | 2018-05-11

Advanced Neural Network Applications | References: 40 | Citations: 45

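One-Line Summary

This paper repurposes convolutional blocks with multiple dilation rates to generate dense object localization maps from image-level labels, achieving state-of-the-art weakly and semi-supervised semantic segmentation on PASCAL VOC 2012.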

ABSTRACT

Despite the remarkable progress, weakly supervised segmentation approaches are still inferior to their fully supervised counterparts. We observe the performance gap mainly comes from their limitation on learning to produce high-quality dense object localization maps from image-level supervision. To mitigate such a gap, we revisit the dilated convolution [1] and reveal how it can be utilized in a novel way to effectively overcome this critical limitation of weakly supervised segmentation approaches. Specifically, we find that varying dilation rates can effectively enlarge the receptive fields of convolutional kernels and, more importantly, transfer the surrounding discriminative information to non-discriminative object regions, promoting the emergence of these regions in the object localization maps. Then, we design a generic classification network equipped with convolutional blocks of different dilation rates. It can produce dense and reliable object localization maps and effectively benefit both weakly- and semi-supervised semantic segmentation. Despite the apparent simplicity, our proposed approach obtains superior performance over state-of-the-arts. In particular, it achieves 60.8% and 67.6% mIoU scores on Pascal VOC 2012 test set in weakly- (only image-level labels are available) and semi- (1,464 segmentation masks are available) supervised settings, which are the new state-of-the-arts.
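To make the receptive-field claim concrete: a k x k kernel with dilation rate d spans k + (k - 1)(d - 1) positions along each axis. The short Python sketch below evaluates this for a 3 x 3 kernel, which is an assumption made for illustration (the abstract does not state the kernel size), at the dilation rates 1, 3, 6, and 9 that this review lists for the proposed blocks. The helper name `effective_kernel_size` is hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Positions spanned along one axis by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed 3 x 3 kernel; dilation rates taken from the review text.
for d in (1, 3, 6, 9):
    print(f"d={d}: spans {effective_kernel_size(3, d)} pixels per axis")
```

The number of weights stays at k x k regardless of d; only the spacing between sampled positions grows, which is how a dilated block can see discriminative regions farther from the pixel being classified.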

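Motivation and Objectives

Identify and close the gap in dense object localization under image-level supervision, which limits weakly supervised segmentation.
Propose a simple, generic way to transfer discriminative knowledge to non-discriminative object regions using convolutional blocks with multiple dilation rates.
Use the resulting dense localization maps to improve segmentation training in both the weakly and semi-supervised settings.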

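Proposed Method

Augment a standard classification network with several convolutional blocks of different dilation rates to enlarge the receptive field at multiple scales.
Generate an object localization map for each block using class activation maps (CAM).
Propose an anti-noise fusion strategy: average the localization maps from the dilated blocks (d = 3, 6, 9) and add the result to the d = 1 map.
Train the segmentation model using the dense localization maps as pseudo masks, with saliency as the background cue.
Present learning objectives for the weakly supervised (image-level labels) and semi-supervised (a mix of strong and weak annotations) settings.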
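The fusion and pseudo-mask steps above can be sketched in a few lines of Python. Writing H_0 for the d = 1 map and H_3, H_6, H_9 for the dilated-block maps, the fused map is H_0 plus the average of the other three. This is a minimal sketch, not the authors' implementation: the function names, the max-normalization, the thresholds `fg_thresh` and `bg_thresh`, and the ignore label 255 are all assumptions made for illustration, and maps are plain nested lists so the example needs no dependencies.

```python
def fuse_localization_maps(map_d1, dilated_maps):
    """Anti-noise fusion: add the per-pixel average of the dilated maps to the d=1 map."""
    n = len(dilated_maps)
    return [
        [value + sum(m[r][c] for m in dilated_maps) / n for c, value in enumerate(row)]
        for r, row in enumerate(map_d1)
    ]


def normalize(fused):
    """Scale a map to [0, 1] by its maximum (assumed step, not stated in the review)."""
    peak = max(max(row) for row in fused)
    if peak <= 0:
        return [[0.0 for _ in row] for row in fused]
    return [[max(v, 0.0) / peak for v in row] for row in fused]


def pseudo_mask(class_maps, saliency, fg_thresh=0.2, bg_thresh=0.06, ignore=255):
    """Build a pseudo segmentation mask from per-class maps and a saliency map.

    class_maps maps a class id to its normalized localization map.
    Thresholds are hypothetical defaults chosen for this example.
    """
    height, width = len(saliency), len(saliency[0])
    mask = [[ignore] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Pick the class with the strongest localization response at this pixel.
            label, score = max(
                ((cls, m[r][c]) for cls, m in class_maps.items()),
                key=lambda pair: pair[1],
            )
            if score > fg_thresh:
                mask[r][c] = label          # confident object pixel
            elif saliency[r][c] < bg_thresh:
                mask[r][c] = 0              # low saliency: treat as background
            # otherwise the pixel keeps the ignore label
    return mask


# Toy 2 x 2 example: the d=1 map fires only on the top-left pixel,
# while the dilated maps also respond on the top-right pixel.
map_d1 = [[1.0, 0.0], [0.0, 0.0]]
dilated = [
    [[1.0, 0.75], [0.0, 0.0]],  # d=3
    [[1.0, 0.5], [0.0, 0.0]],   # d=6
    [[1.0, 0.25], [0.0, 0.0]],  # d=9
]
saliency = [[0.9, 0.8], [0.5, 0.0]]

fused = normalize(fuse_localization_maps(map_d1, dilated))
print(pseudo_mask({15: fused}, saliency))  # [[15, 15], [255, 0]]
```

In the toy run, the top-right pixel is invisible to the d = 1 map but is labeled as object after fusion, while averaging the dilated maps keeps any single noisy high-dilation response from dominating. The bottom-left pixel is salient but has no localization response, so it is left as ignore instead of being forced into a class.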

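Experimental Results

Research Questions

RQ1: Can dilated convolutional blocks with multiple dilation rates produce dense and reliable object localization from image-level supervision?
RQ2: Does anti-noise fusion of the multi-dilation localization maps improve segmentation performance in the weakly and semi-supervised settings?
RQ3: How does the proposed localization method affect state-of-the-art results on VOC 2012 in the weakly and semi-supervised regimes?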

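Key Results

Weakly supervised setting: a new state-of-the-art mIoU of 60.8% on the Pascal VOC 2012 test set (image-level labels only).
Semi-supervised setting: a new state-of-the-art mIoU of 67.6% on the Pascal VOC 2012 test set (1,464 segmentation masks available).
Dense localization maps produced by the multi-dilated blocks, when fused with the anti-noise strategy, improve segmentation training markedly compared with a single dilation rate or naive averaging.
The corresponding validation mIoU is 60.4% in the weakly supervised setting and 65.7% in the semi-supervised setting.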
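For readers unfamiliar with the metric behind these numbers, mIoU is the per-class intersection over union, averaged over classes, with pixel counts accumulated over the whole evaluation set. The sketch below is an illustrative pure-Python version, not the official benchmark code: the function name `mean_iou`, the ignore label 255, and the choice to skip classes that never appear are assumptions.

```python
def mean_iou(pairs, num_classes, ignore=255):
    """Mean IoU over (prediction, ground truth) label-map pairs.

    Counts are accumulated across all pairs before dividing, so large and
    small images contribute by pixel count. Predictions are assumed to lie
    in range(num_classes); ground-truth pixels equal to `ignore` are skipped.
    """
    intersections = [0] * num_classes
    unions = [0] * num_classes
    for pred, target in pairs:
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if t == ignore:
                    continue
                if p == t:
                    intersections[t] += 1   # true positive for class t
                    unions[t] += 1
                else:
                    unions[p] += 1          # false positive for class p
                    unions[t] += 1          # false negative for class t
    # Skip classes absent from both prediction and ground truth (assumed choice).
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 255]]
print(mean_iou([(pred, target)], num_classes=2))  # 0.5
```

In the toy example each class has one correctly labeled pixel and a union of two pixels, so both IoUs are 0.5; the bottom-right pixel is excluded because its ground truth is the ignore label.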


This review was generated by AI and reviewed by a human editor.