QUICK REVIEW

[Paper Review] RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation

Jindong Jiang, Lunan Zheng | arXiv (Cornell University) | 2018-06-04

Remote Sensing and LiDAR Applications | References: 37 | Citations: 181

One-line summary

RedNet introduces a residual encoder-decoder with RGB-D fusion and pyramid supervision for indoor semantic segmentation, reaching 47.8% mIoU on SUN RGB-D with a ResNet-50 backbone.

ABSTRACT

Indoor semantic segmentation has always been a difficult task in computer vision. In this paper, we propose an RGB-D residual encoder-decoder architecture, named RedNet, for indoor RGB-D semantic segmentation. In RedNet, the residual module is applied to both the encoder and decoder as the basic building block, and the skip-connection is used to bypass the spatial feature between the encoder and decoder. In order to incorporate the depth information of the scene, a fusion structure is constructed, which makes inference on RGB image and depth image separately, and fuses their features over several layers. In order to efficiently optimize the network's parameters, we propose a 'pyramid supervision' training scheme, which applies supervised learning over different layers in the decoder, to cope with the problem of gradients vanishing. Experiment results show that the proposed RedNet(ResNet-50) achieves a state-of-the-art mIoU accuracy of 47.8% on the SUN RGB-D benchmark dataset.

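Motivation and goals

Improve indoor RGB-D semantic segmentation with a deep encoder-decoder architecture.
Incorporate depth information through a two-branch RGB-D fusion strategy.
Mitigate vanishing gradients by applying pyramid supervision across the decoder layers.
Use residual blocks in both the encoder and the decoder to enable deeper networks.
Benchmark RedNet on SUN RGB-D.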

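Proposed method

Use two encoder branches, one for RGB and one for depth, built from residual blocks (ResNet-34 or ResNet-50).
Fuse depth features into the RGB branch by element-wise summation at multiple layers.
Implement upsampling residual modules in the decoder to recover full resolution.
Apply pyramid supervision by adding side outputs at several decoder layers, each with its own loss.
Train with cross-entropy weighted by median frequency balancing, starting from ImageNet-pretrained encoders.
Optionally use agent layers to reduce memory when the encoder is ResNet-50.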
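
The fusion, class weighting, and pyramid supervision steps listed above can be sketched in NumPy. This is a minimal illustration and not the authors' implementation. All function names are hypothetical, and several details are assumptions: that the ground truth is downsampled by strided nearest-neighbour sampling, that the per-scale losses are simply summed, and that each weighted loss is normalized by the sum of the pixel weights.

```python
import numpy as np


def fuse_by_sum(rgb_feat, depth_feat):
    """Fuse depth features into the RGB branch by element-wise summation."""
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError("RGB and depth feature maps must have the same shape")
    return rgb_feat + depth_feat


def median_frequency_weights(label_maps, num_classes):
    """Class weights w_c = median(freq) / freq_c (median frequency balancing).

    freq_c = pixels of class c / total pixels of the images that contain c.
    Assumes every label is an integer in [0, num_classes).
    """
    pixel_count = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)
    for labels in label_maps:
        counts = np.bincount(labels.ravel(), minlength=num_classes)
        pixel_count += counts
        image_pixels += (counts > 0) * labels.size
    present = pixel_count > 0
    freq = np.zeros(num_classes)
    freq[present] = pixel_count[present] / image_pixels[present]
    weights = np.zeros(num_classes)  # classes never seen keep weight 0
    weights[present] = np.median(freq[present]) / freq[present]
    return weights


def weighted_cross_entropy(logits, labels, class_weights):
    """logits: (C, H, W), labels: (H, W) ints. Returns the weighted mean loss."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[None], axis=0)[0]
    w = class_weights[labels]
    return float(-(w * picked).sum() / w.sum())


def pyramid_supervision_loss(outputs, labels, class_weights):
    """Sum the loss of the final output and of each lower-resolution side output.

    outputs: list of (C, h, w) logit maps whose height divides the label height.
    """
    total = 0.0
    for logits in outputs:
        factor = labels.shape[0] // logits.shape[1]
        small_labels = labels[::factor, ::factor]  # assumed nearest-neighbour resize
        total += weighted_cross_entropy(logits, small_labels, class_weights)
    return total
```

Because every side output contributes its own loss term, the lower-resolution decoder layers receive a gradient signal directly instead of only through the final output, which is the intuition behind using pyramid supervision against vanishing gradients.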

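Experimental results

Research questions

RQ1: Can a residual encoder-decoder with RGB-D fusion outperform existing indoor RGB-D segmentation models?
RQ2: Does fusing depth at multiple encoder layers improve segmentation accuracy?
RQ3: Does pyramid supervision improve optimization and final performance?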

Key results

Model	Pixel acc. (%)	Mean acc. (%)	mIoU (%)
RedNet(ResNet-34) without pyramid	80.3	55.5	45.0
RedNet(ResNet-34)	80.8	58.3	46.8
RedNet(ResNet-50) without pyramid	80.5	57.4	46.0
RedNet(ResNet-50)	81.3	60.3	47.8
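
For readers unfamiliar with the three columns, the sketch below computes pixel accuracy, mean (per-class) accuracy, and mIoU from a confusion matrix using their standard definitions. The function name is hypothetical, and skipping classes that never occur in the ground truth is an assumption. The review does not state how the paper's evaluation handles that case.

```python
import numpy as np


def segmentation_metrics(confusion):
    """confusion[i, j] = number of pixels of true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)              # correctly classified pixels per class
    gt_total = confusion.sum(axis=1)     # ground-truth pixels per class
    pred_total = confusion.sum(axis=0)   # predicted pixels per class

    pixel_acc = tp.sum() / confusion.sum()

    present = gt_total > 0               # assumption: ignore absent classes
    mean_acc = np.mean(tp[present] / gt_total[present])

    union = gt_total + pred_total - tp
    miou = np.mean(tp[present] / union[present])
    return pixel_acc, mean_acc, miou


# Toy two-class example, unrelated to the SUN RGB-D numbers above.
pixel_acc, mean_acc, miou = segmentation_metrics([[3, 1], [1, 5]])
```

Pixel accuracy is dominated by frequent classes, while mean accuracy and mIoU average over classes, which is why the three columns can move by different amounts between rows.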

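RedNet(ResNet-34) with pyramid supervision achieves 46.8 mIoU, 80.8 pixel accuracy, and 58.3 mean accuracy on SUN RGB-D.
RedNet(ResNet-50) with pyramid supervision achieves 47.8 mIoU, 81.3 pixel accuracy, and 60.3 mean accuracy on SUN RGB-D.
Without pyramid supervision, RedNet(ResNet-34) reaches 45.0 mIoU and RedNet(ResNet-50) reaches 46.0 mIoU.
RedNet(ResNet-50) with pyramid supervision scores 1.8 mIoU higher than its non-pyramid version (47.8 vs. 46.0).
Overall, the RedNet variants outperform several existing RGB-D semantic segmentation methods on SUN RGB-D.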
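
The effect of pyramid supervision can be read off the results table with a little arithmetic. The snippet below is only a convenience sketch: the numbers are copied from the table, while the dictionary layout and the function name are illustrative choices.

```python
# (pixel acc., mean acc., mIoU) in percent, copied from the results table.
RESULTS = {
    ("ResNet-34", False): (80.3, 55.5, 45.0),
    ("ResNet-34", True): (80.8, 58.3, 46.8),
    ("ResNet-50", False): (80.5, 57.4, 46.0),
    ("ResNet-50", True): (81.3, 60.3, 47.8),
}


def pyramid_gain(backbone):
    """Gain in percentage points from adding pyramid supervision."""
    with_pyramid = RESULTS[(backbone, True)]
    without_pyramid = RESULTS[(backbone, False)]
    return tuple(round(a - b, 1) for a, b in zip(with_pyramid, without_pyramid))


for backbone in ("ResNet-34", "ResNet-50"):
    pixel, mean, miou = pyramid_gain(backbone)
    print(f"{backbone}: pixel +{pixel}, mean +{mean}, mIoU +{miou}")
```

On these numbers the mIoU gain is 1.8 points for both backbones, and the mean-accuracy gain (2.8 and 2.9 points) is larger than the pixel-accuracy gain (0.5 and 0.8 points).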


This review was generated by AI and reviewed by a human editor.