QUICK REVIEW

[Paper Review] RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range Image Representation

Zhidong Liang, Ming Zhang | arXiv (Cornell University) | 2020-09-01

Robotics and Sensor-Based Localization | References: 38 | Citations: 57

One-Line Summary

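RangeRCNN introduces a range-image-based 2D CNN backbone for 3D object detection, combined with RV-PV-BEV feature transfer and a two-stage RCNN, achieving state-of-the-art performance on KITTI and Waymo while enabling real-time inference.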

ABSTRACT

We present RangeRCNN, a novel and effective 3D object detection framework based on the range image representation. Most existing methods are voxel-based or point-based. Though several optimizations have been introduced to ease the sparsity issue and speed up the running time, the two representations are still computationally inefficient. Compared to them, the range image representation is dense and compact, so it can exploit powerful 2D convolution. Even so, the range image is not preferred in 3D object detection due to scale variation and occlusion. In this paper, we utilize the dilated residual block (DRB) to better adapt to different object scales and obtain a more flexible receptive field. Considering scale variation and occlusion, we propose the RV-PV-BEV (range view-point view-bird's eye view) module to transfer features from RV to BEV. The anchor is defined in BEV which avoids scale variation and occlusion. Neither RV nor BEV can provide enough information for height estimation; therefore, we propose a two-stage RCNN for better 3D detection performance. The aforementioned point view not only serves as a bridge from RV to BEV but also provides pointwise features for RCNN. Experiments show that RangeRCNN achieves state-of-the-art performance on the KITTI dataset and the Waymo Open dataset, and provides more possibilities for real-time 3D object detection. We further introduce and discuss the data augmentation strategy for the range image based method, which will be very valuable for future research on range image.
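To make the range image representation concrete, here is a minimal sketch of the usual spherical projection: each LiDAR point's azimuth selects a column and its elevation selects a row, and each pixel keeps the range of its nearest point. The image size and vertical field of view (`height`, `width`, `fov_up_deg`, `fov_down_deg`) are hypothetical values chosen for illustration, not the paper's settings.

```python
import math

def project_to_range_image(points, height=4, width=8, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project (x, y, z) LiDAR points into a height x width range image.

    Hypothetical settings: height, width and the vertical field of view are
    illustrative only. Each pixel keeps the range of its nearest point; empty
    pixels hold 0.0. Returns (image, pixel_of_point).
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    pixel_of_point = []  # (row, col) for each point, or None if it falls outside the FOV
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            pixel_of_point.append(None)
            continue
        azimuth = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)  # vertical angle
        v = 1.0 - (elevation - fov_down) / fov  # 0 at the top of the FOV, 1 at the bottom
        if v < 0.0 or v > 1.0:
            pixel_of_point.append(None)
            continue
        u = 0.5 * (1.0 - azimuth / math.pi)     # 0..1 across the full 360 degrees
        row = min(height - 1, int(math.floor(v * height)))
        col = min(width - 1, int(math.floor(u * width)))
        pixel_of_point.append((row, col))
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image, pixel_of_point
```

Keeping `pixel_of_point` alongside the image is what later lets features computed on the range image be looked up again for each point.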

Research Motivation and Goals

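Present the range image representation as a dense, lossless alternative to voxel- and point-based methods for 3D detection.
Develop a range image backbone with a flexible receptive field to cope with scale variation.
Bridge range view features to BEV for anchor generation.
Refine 3D bounding boxes with a two-stage RCNN to improve height estimation and 3D localization.
Demonstrate state-of-the-art performance and real-time potential on the KITTI and Waymo datasets.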

Proposed Method

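A 2D encoder-decoder backbone operates on the range image and uses dilated residual blocks (DRBs) to capture multi-scale features.
Each DRB applies three dilated 3×3 convolutions (rates 1, 2, and 3), concatenates their outputs, and fuses them with a 1×1 convolution, giving a flexible receptive field.
The RV-PV-BEV module transfers features from the range view to BEV, enabling BEV-based anchor generation while preserving high-level range features.
A region proposal network (RPN) on the BEV map generates 3D proposals; 3D RoI pooling then turns each proposal into a 3D feature grid, which is vectorized and refined by fully connected layers.
The two-stage network is trained end-to-end with L_total = L_rpn + L_rcnn, which combines a focal classification loss, a smooth-L1 regression loss, a direction classification loss, and score, refinement, and corner losses.
Training on KITTI and Waymo uses data augmentation (flipping, scaling, rotation, and, for Waymo, ground-truth pasting) and a cosine-decay learning rate schedule.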
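The DRB described above can be illustrated with a minimal single-channel sketch. It assumes zero ("same") padding, one output channel per branch, and hand-picked kernels and fusion weights (`kernels`, `w1x1`) that are purely hypothetical; the residual add is inferred from the block's name, and normalization and activations are omitted. A 3×3 kernel with dilation d spans 2d + 1 pixels, so rates 1, 2, and 3 cover 3×3, 5×5, and 7×7 neighborhoods.

```python
def effective_kernel_size(k, rate):
    """Span of a k x k kernel with the given dilation rate: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)

def dilated_conv3x3(img, kernel, rate):
    """3x3 cross-correlation with dilation `rate` and zero padding; output keeps H x W."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    r = i + (ki - 1) * rate  # taps are `rate` pixels apart
                    c = j + (kj - 1) * rate
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the image
                        acc += kernel[ki][kj] * img[r][c]
            out[i][j] = acc
    return out

def dilated_residual_block(img, kernels, w1x1, rates=(1, 2, 3)):
    """Hypothetical single-channel DRB: three dilated branches, 1x1 fusion, residual add."""
    branches = [dilated_conv3x3(img, k, r) for k, r in zip(kernels, rates)]
    h, w = len(img), len(img[0])
    # Concatenating branches along channels and applying a 1x1 conv with one output
    # channel (no bias) is a per-pixel weighted sum of the branch outputs.
    fused = [[sum(wt * b[i][j] for wt, b in zip(w1x1, branches)) for j in range(w)]
             for i in range(h)]
    return [[img[i][j] + fused[i][j] for j in range(w)] for i in range(h)]
```

With an impulse input, a rate-3 branch responds only at offsets of 0 or ±3 pixels, which is how dilation enlarges the receptive field without adding weights.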
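The RV-PV-BEV transfer described above can be sketched in two steps: gather each point's feature from the range-image pixel it was projected to (RV to PV), then scatter the point features into a BEV grid, pooling points that share a cell (PV to BEV). This sketch assumes scalar features, max-pooling per BEV cell, and a hypothetical grid extent and cell size (`x_range`, `y_range`, `cell`); the paper's exact pooling operator and grid settings are not stated in this review.

```python
import math

def rv_to_pv(range_features, pixel_of_point):
    """Gather one feature per point from the range-view feature map (None if unprojected)."""
    return [None if px is None else range_features[px[0]][px[1]] for px in pixel_of_point]

def pv_to_bev(points, point_features, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0):
    """Scatter point features into a BEV grid indexed [ix][iy], max-pooling per cell.

    Hypothetical grid settings; empty cells are filled with 0.0.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    bev = [[None] * ny for _ in range(nx)]
    for (x, y, _z), f in zip(points, point_features):
        if f is None:
            continue
        ix = math.floor((x - x_range[0]) / cell)
        iy = math.floor((y - y_range[0]) / cell)
        if 0 <= ix < nx and 0 <= iy < ny:
            bev[ix][iy] = f if bev[ix][iy] is None else max(bev[ix][iy], f)
    return [[0.0 if v is None else v for v in row] for row in bev]
```

Because the point view keeps every point's own feature, the same per-point features can also be reused later by the RCNN stage.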
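For the 3D RoI pooling step, one common approach, and the one assumed here, is to place a regular G×G×G grid of points inside each rotated proposal box and pool nearby point features at each grid point before flattening for the fully connected layers. The sketch below generates only the grid point coordinates; the box convention (`cx, cy, cz` at the box center, yaw about the vertical axis) and `grid_size` are assumptions, and the feature pooling itself is omitted.

```python
import math

def roi_grid_points(box, grid_size=3):
    """Return grid_size**3 cell-center points inside a yaw-rotated 3D box.

    box = (cx, cy, cz, length, width, height, yaw), with (cx, cy, cz) the box
    center and yaw a rotation about the vertical z axis (assumed convention).
    Points are ordered by (i, j, k) with k varying fastest.
    """
    cx, cy, cz, l, w, h, yaw = box
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    points = []
    for i in range(grid_size):
        for j in range(grid_size):
            for k in range(grid_size):
                # Cell-center offsets in the box's local frame, within [-dim/2, dim/2].
                lx = ((i + 0.5) / grid_size - 0.5) * l
                ly = ((j + 0.5) / grid_size - 0.5) * w
                lz = ((k + 0.5) / grid_size - 0.5) * h
                # Rotate about z by yaw, then translate to the box center.
                x = cx + lx * cos_y - ly * sin_y
                y = cy + lx * sin_y + ly * cos_y
                points.append((x, y, cz + lz))
    return points
```

Since the grid is defined in each box's local frame, the pooled feature vector has the same layout for every proposal regardless of its size or orientation.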
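The focal and smooth-L1 terms listed above are standard losses. The sketch uses the binary focal loss FL(p_t) = -α_t (1 - p_t)^γ log(p_t) with α = 0.25 and γ = 2, the defaults proposed with focal loss (Lin et al.), and smooth-L1 with transition point `beta` = 1.0; these values are assumptions, not settings confirmed for RangeRCNN. `total_loss` is a hypothetical unit-weighted illustration of L_total = L_rpn + L_rcnn, and the grouping of terms into the two stages is also assumed.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one predicted probability p and a label in {0, 1}."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy, confidently correct examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss: quadratic for |error| < beta, linear above."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def total_loss(rpn_terms, rcnn_terms):
    """Hypothetical unit-weighted L_total = L_rpn + L_rcnn (each a sum of named terms)."""
    return sum(rpn_terms.values()) + sum(rcnn_terms.values())
```

For example, `total_loss({"cls": 1.0, "reg": 0.5, "dir": 0.25}, {"score": 0.25, "refine": 0.5, "corner": 0.5})` sums all six terms; a real implementation would typically weight them.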
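The cosine-decay learning rate mentioned above can be sketched as follows. The base rate, minimum rate, and step count are hypothetical; this review does not state the paper's schedule parameters, and any warmup phase is omitted.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.01, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (clamped outside)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The rate falls slowly at first, fastest at the midpoint, and flattens out near the end of training.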

Experimental Results

Research Questions

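RQ1: Can the range image serve as a lossless, dense feature source for fast 3D object detection with 2D CNNs?
RQ2: Can range view features be transferred effectively to BEV for stable anchor generation?
RQ3: Does a two-stage RCNN with 3D RoI pooling improve height estimation and 3D localization over a one-stage range image detector?
RQ4: What performance and efficiency trade-offs does RangeRCNN offer on KITTI and Waymo compared with other range-image-based methods?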

Key Findings

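RangeRCNN achieves state-of-the-art performance on the KITTI and Waymo benchmarks, outperforming many existing methods.
RangeRCNN runs at 22 FPS, making it capable of real-time operation.
On KITTI, RangeRCNN outperforms most methods in BEV and comes close to the top in 3D, with RCNN refinement yielding notable 3D gains.
On Waymo (Level 1), RangeRCNN outperforms previous methods, most notably at medium to long range (30 to 75 m).
Ablation studies show that 3D RoI pooling in the RCNN stage is valuable for 3D detection and that performance is robust to the pooling grid size.
With range-image-based features, RangeRCNN shows a strong performance advantage as objects become sparser or more distant.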

This review was generated by AI and reviewed by a human editor.