QUICK REVIEW

[Paper Review] RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range Image Representation

Zhidong Liang, Ming Zhang|arXiv (Cornell University)|Sep 1, 2020

Robotics and Sensor-Based Localization | References: 38 | Citations: 57

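One-sentence summary

RangeRCNN introduces a range-image-based 2D CNN backbone, an RV-PV-BEV feature transfer module, and a two-stage RCNN for 3D object detection, achieving state-of-the-art results on KITTI and Waymo while running in real time.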

ABSTRACT

We present RangeRCNN, a novel and effective 3D object detection framework based on the range image representation. Most existing methods are voxel-based or point-based. Though several optimizations have been introduced to ease the sparsity issue and speed up the running time, the two representations are still computationally inefficient. Compared to them, the range image representation is dense and compact which can exploit powerful 2D convolution. Even so, the range image is not preferred in 3D object detection due to scale variation and occlusion. In this paper, we utilize the dilated residual block (DRB) to better adapt different object scales and obtain a more flexible receptive field. Considering scale variation and occlusion, we propose the RV-PV-BEV (range view-point view-bird's eye view) module to transfer features from RV to BEV. The anchor is defined in BEV which avoids scale variation and occlusion. Neither RV nor BEV can provide enough information for height estimation; therefore, we propose a two-stage RCNN for better 3D detection performance. The aforementioned point view not only serves as a bridge from RV to BEV but also provides pointwise features for RCNN. Experiments show that RangeRCNN achieves state-of-the-art performance on the KITTI dataset and the Waymo Open dataset, and provides more possibilities for real-time 3D object detection. We further introduce and discuss the data augmentation strategy for the range image based method, which will be very valuable for future research on range image.
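As a rough illustration of why the range image is dense and compact, the sketch below maps LiDAR points to range-image pixels with a spherical projection. It is a minimal sketch, not the authors' code: the function name `project_to_range_image`, the image size, and the vertical field of view are all assumed values chosen for illustration.

```python
import math


def project_to_range_image(points, height=64, width=2048,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map LiDAR points (x, y, z) to range-image pixels.

    Returns a height x width grid whose cells hold the range in metres,
    or None where no point landed. All default parameters are assumed
    values for illustration, not settings taken from the paper.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        inclination = math.asin(z / r)    # elevation angle
        if not (fov_down <= inclination <= fov_up):
            continue                      # outside the vertical field of view
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = min(height - 1, int((fov_up - inclination) / fov * height))
        # Keep the closest return when several points hit one pixel.
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image
```

Every pixel of the resulting grid corresponds to one laser direction, so an ordinary 2D convolution can run over it without the empty cells that dominate a voxel grid.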

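Motivation and objectives

Motivate the range image representation as a dense, lossless alternative to voxel-based and point-based methods for 3D detection.
Develop a range image backbone with a flexible receptive field that copes with scale variation.
Bridge range view features to the bird's eye view to enable anchor generation and efficient processing.
Refine 3D bounding boxes with a two-stage RCNN to improve height estimation and 3D localization.
Demonstrate state-of-the-art performance and real-time capability on the KITTI and Waymo datasets.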

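Proposed method

Use a 2D encoder-decoder backbone built from dilated residual blocks on the range image to capture multi-scale features.
Introduce the DRB, which runs three dilated 3×3 convolutions (rates 1, 2, 3), concatenates their outputs, and fuses them with a 1×1 convolution to obtain a flexible receptive field.
Implement the RV-PV-BEV module, which transfers features from the range view to BEV, enabling BEV-based anchor generation while retaining high-level range features.
Generate 3D proposals from BEV with an RPN, then refine them by vectorizing the 3D grid produced by 3D RoI pooling and passing it through fully connected layers.
Adopt an end-to-end two-stage RCNN loss (L_total = L_rpn + L_rcnn) comprising focal classification, smooth-L1 regression, direction classification, score, refinement, and corner losses.
Train on KITTI and Waymo with data augmentation (flip, scale, rotation, and ground-truth pasting on Waymo) and a cosine-annealing learning rate.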
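The RV-PV-BEV transfer listed above can be sketched in a few lines: each point gathers the feature of its range-image pixel (RV to PV), and the point features are then pooled into the BEV cell the point falls in (PV to BEV). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rv_to_bev`, the grid extents, the cell size, and the use of mean pooling for points sharing a cell are all illustrative choices.

```python
def rv_to_bev(rv_features, points, pixel_index,
              x_range=(0.0, 80.0), y_range=(-40.0, 40.0), cell=0.5):
    """Transfer range-view features to a BEV grid through the points.

    rv_features: H x W grid of feature vectors (lists of floats).
    points:      list of (x, y, z) LiDAR points.
    pixel_index: list of (row, col), the range-image pixel of each point.
    Returns a dict mapping a BEV cell (ix, iy) to its mean feature vector.
    Grid extents, cell size, and mean pooling are assumptions.
    """
    sums, counts = {}, {}
    for (x, y, _z), (row, col) in zip(points, pixel_index):
        in_x = x_range[0] <= x < x_range[1]
        in_y = y_range[0] <= y < y_range[1]
        if not (in_x and in_y):
            continue                                # outside the BEV grid
        feat = rv_features[row][col]                # RV -> PV: gather
        key = (int((x - x_range[0]) / cell),        # PV -> BEV: scatter
               int((y - y_range[0]) / cell))
        if key not in sums:
            sums[key] = [0.0] * len(feat)
            counts[key] = 0
        sums[key] = [s + f for s, f in zip(sums[key], feat)]
        counts[key] += 1
    # Average the features of all points that share a BEV cell.
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}
```

The sketch returns a sparse dictionary for readability; a real detector would presumably write the pooled features into a dense BEV tensor before the RPN. The per-point features gathered in the middle step are the same pointwise features the abstract says are reused by the RCNN stage.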

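Experimental results

Research questions

RQ1: Can the range image serve as a lossless, dense feature source for fast 3D object detection with a 2D CNN?
RQ2: How can features be transferred effectively from the range view to BEV for reliable anchor generation?
RQ3: Does a two-stage RCNN with 3D RoI pooling improve height estimation and 3D localization over a one-stage range-image detector?
RQ4: How does RangeRCNN's trade-off between performance and efficiency on KITTI and Waymo compare with voxel-based and point-based methods?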

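Key findings

RangeRCNN achieves state-of-the-art performance on the KITTI and Waymo benchmarks, outperforming many prior methods.
RangeRCNN runs at 22 FPS, providing real-time performance.
On KITTI, it outperforms most methods in BEV, approaches the top in 3D, and shows a notable 3D gain from the RCNN refinement.
On Waymo Level 1, RangeRCNN outperforms prior methods, particularly at medium to long range (30–75 m).
Ablation studies show that the 3D pooling in the RCNN stage benefits 3D detection and is robust to the pooling grid size.
With its range-image-driven features, RangeRCNN shows a larger performance advantage as objects become sparser or more distant.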

This review was generated by AI and checked by a human editor.