QUICK REVIEW

[Paper Review] Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Xiaokang Chen, Kwan-Yee Lin | arXiv (Cornell University) | 2020. 07. 17.

Advanced Neural Network Applications | References: 55 | Citations: 51

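One-Line Summary

This paper introduces a bi-directional cross-modal encoder, built from a Separation-and-Aggregation Gate (SA-Gate) and Bi-direction Multi-step Propagation (BMP), that robustly fuses RGB and noisy depth (HHA) signals for RGB-D semantic segmentation. Plugged into existing backbones, it achieves state-of-the-art performance on NYU Depth V2 and CityScapes.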

ABSTRACT

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and models the problem as a cross-modal feature fusion to obtain better feature representations to achieve more accurate segmentation. This, however, may not lead to satisfactory results as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively. The key of the proposed architecture is a novel Separation-and-Aggregation Gating operation that jointly filters and recalibrates both representations before cross-modality aggregation. Meanwhile, a Bi-direction Multi-step Propagation strategy is introduced, on the one hand, to help to propagate and fuse information between the two modalities, and on the other hand, to preserve their specificity along the long-term propagation process. Besides, our proposed encoder can be easily injected into the previous encoder-decoder structures to boost their performance on RGB-D semantic segmentation. Our model outperforms state-of-the-arts consistently on both in-door and out-door challenging datasets. Code of this work is available at https://charlescxk.github.io/

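Research Motivation and Goals

Establish the need for RGB-D fusion that stays robust to noisy and misaligned depth data in in-the-wild settings.
Develop a Cross-modality Guided Encoder that recalibrates each modality before fusion.
Introduce the Separation-and-Aggregation Gate (SA-Gate), which filters depth noise and fuses the modalities adaptively.
Introduce Bi-direction Multi-step Propagation (BMP) to preserve modality specificity throughout encoding.
Demonstrate plug-and-play compatibility with existing RGB segmentation decoders to improve their performance.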

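Proposed Method

SA-Gate consists of Feature Separation (FS), which filters noisy depth features using cross-modal attention, and Feature Aggregation (FA), which fuses RGB and depth with spatial gates.
FS applies global pooling to the concatenated RGB and depth features to produce a cross-modal attention vector, filters the depth features by channel-wise scaling, and recalibrates the RGB input as RGB_rec = HHA_filtered + RGB_in.
FA generates spatial gates from the recalibrated RGB and HHA features. The softmax-normalized spatial weights A_rgb and A_hha then define the fused feature M as a weighted combination of RGB_in and HHA_in.
A final residual-like fusion produces RGB_out and HHA_out, which are fed forward through the encoder (bi-directional propagation).
BMP propagates the fused features across multiple layers, refining representations throughout the encoder while preserving modality specificity.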
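
The FS, FA, and BMP steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that are not stated in this review: a single linear layer plus sigmoid stands in for the attention network, the HHA stream is recalibrated symmetrically to the RGB stream, the "residual-like" fusion is modelled as averaging M with each input, and the backbone blocks between stages are omitted. The names `sa_gate`, `init_params`, `encode`, and the parameter keys are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sa_gate(rgb_in, hha_in, p):
    """One SA-Gate step on feature maps of shape (C, H, W)."""
    # Feature Separation: global average pooling over the concatenated streams.
    pooled = np.concatenate([rgb_in, hha_in], axis=0).mean(axis=(1, 2))  # (2C,)
    # Assumption: one linear layer + sigmoid replaces the attention network.
    w_hha = sigmoid(p["fs_hha"] @ pooled)  # (C,) channel weights for depth
    w_rgb = sigmoid(p["fs_rgb"] @ pooled)  # (C,) channel weights for RGB
    hha_filtered = w_hha[:, None, None] * hha_in  # channel-wise scaling
    rgb_filtered = w_rgb[:, None, None] * rgb_in
    rgb_rec = hha_filtered + rgb_in  # recalibration as stated in the review
    hha_rec = rgb_filtered + hha_in  # assumed symmetric counterpart

    # Feature Aggregation: one gate logit per modality and per pixel.
    stacked = np.concatenate([rgb_rec, hha_rec], axis=0)   # (2C, H, W)
    logits = np.einsum("gc,chw->ghw", p["fa"], stacked)    # (2, H, W)
    logits = logits - logits.max(axis=0, keepdims=True)    # numerical stability
    gates = np.exp(logits)
    gates = gates / gates.sum(axis=0, keepdims=True)       # softmax over modalities
    a_rgb, a_hha = gates                                   # each (H, W)
    merged = a_rgb * rgb_in + a_hha * hha_in               # fused feature M

    # Assumption: the residual-like fusion averages M with each input.
    rgb_out = 0.5 * (rgb_in + merged)
    hha_out = 0.5 * (hha_in + merged)
    return rgb_out, hha_out, merged, (a_rgb, a_hha)

def init_params(channels, rng):
    """Random weights for one hypothetical SA-Gate stage."""
    return {
        "fs_hha": rng.normal(scale=0.1, size=(channels, 2 * channels)),
        "fs_rgb": rng.normal(scale=0.1, size=(channels, 2 * channels)),
        "fa": rng.normal(scale=0.1, size=(2, 2 * channels)),
    }

def encode(rgb, hha, stage_params):
    """BMP-style loop: both refined streams feed the next stage."""
    merged_per_stage = []
    for p in stage_params:
        # A real encoder would run one backbone block per stream here.
        rgb, hha, merged, _ = sa_gate(rgb, hha, p)
        merged_per_stage.append(merged)
    return rgb, hha, merged_per_stage

rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 8, 8))
hha = rng.normal(size=(4, 8, 8))
stages = [init_params(4, rng) for _ in range(3)]
rgb_out, hha_out, merged = encode(rgb, hha, stages)
print(rgb_out.shape, hha_out.shape, len(merged))
```

Because A_rgb and A_hha sum to one at every pixel, M is a per-pixel convex combination of the two inputs, so a noisy depth value can be down-weighted locally without discarding the whole depth channel.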

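Experimental Results

Research Questions

RQ1: Can a cross-modal gate that explicitly separates features and then re-aggregates them improve RGB-D semantic segmentation under depth noise?
RQ2: Can bi-directional feature propagation enable effective cross-modal fusion while preserving modality-specific information?
RQ3: How well does the proposed encoder plug into existing RGB-based segmentation backbones to improve performance on indoor and outdoor datasets?
RQ4: How do SA-Gate and BMP affect accuracy and efficiency relative to the RGB-D baseline and prior RGB-D methods?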

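Key Results

On NYU Depth V2, the proposed method achieves 52.4% mIoU and 77.9% pixel accuracy, outperforming the RGB-D baseline (46.7% mIoU).
The approach yields substantial improvements across decoders, demonstrating its plug-and-play capability.
On CityScapes, the method yields strong gains even though the depth is noisy: it achieves state-of-the-art performance on the validation set, remains competitive on the test set, and improves substantially over the RGB-based baseline.
SA-Gate and BMP together provide a larger benefit than either component alone, indicating that they play complementary roles in cross-modal feature propagation.
The model uses less memory and computation than the RGB-D baseline while achieving higher accuracy (e.g., Table 1 reports fewer FLOPs and better mIoU than the RGB-D baseline).
Qualitative visualizations show that SA-Gate learns modality-specific focus (RGB attends to fine details, while HHA attends to regions that are stable under illumination changes), which contributes to better handling of boundaries and textures.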
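
For readers less familiar with the two metrics reported above, the following sketch shows how mIoU and pixel accuracy are commonly derived from a confusion matrix. It is a generic illustration rather than the paper's evaluation code: the function names are hypothetical, the toy labels are made up, and details such as ignore labels are left out.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    idx = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()

# Toy 2x2 label maps with two classes (illustrative values only).
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

In the toy example, three of four pixels are correct (pixel accuracy 0.75), while the per-class IoUs are 1/2 and 2/3, giving an mIoU of about 0.583. The gap illustrates why mIoU is the stricter of the two metrics: it penalizes errors per class instead of per pixel.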

This review was generated by AI and reviewed by a human editor.