QUICK REVIEW

[Paper Review] Complex-YOLO: Real-time 3D Object Detection on Point Clouds

Martín Simón, Stefan Milz | arXiv (Cornell University) | Mar 16, 2018

Advanced Neural Network Applications | References: 24 | Cited by: 78

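One-line summary

Complex-YOLO introduces a Euler-Region-Proposal Network (E-RPN) and estimates oriented 3D boxes directly from LiDAR point clouds in real time. It achieves high efficiency and multi-class detection without camera input.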

ABSTRACT

Lidar based 3D object detection is inevitable for autonomous driving, because it directly links to environmental understanding and therefore builds the base for prediction and motion planning. The capacity of inferencing highly sparse 3D data in real-time is an ill-posed problem for lots of other application areas besides automated vehicles, e.g. augmented reality, personal robotics or industrial automation. We introduce Complex-YOLO, a state of the art real-time 3D object detection network on point clouds only. In this work, we describe a network that expands YOLOv2, a fast 2D standard object detector for RGB images, by a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. Thus, we propose a specific Euler-Region-Proposal Network (E-RPN) to estimate the pose of the object by adding an imaginary and a real fraction to the regression network. This ends up in a closed complex space and avoids singularities, which occur by single angle estimations. The E-RPN supports to generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection specifically in terms of efficiency. We achieve state of the art results for cars, pedestrians and cyclists by being more than five times faster than the fastest competitor. Further, our model is capable of estimating all eight KITTI-classes, including Vans, Trucks or sitting pedestrians simultaneously with high accuracy.

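Motivation and objectives

Enable real-time 3D object detection for autonomous driving using LiDAR data alone.
Develop a fast end-to-end network that builds 3D bounding boxes in Cartesian space from a bird's-eye-view LiDAR map.
Introduce a Euler regression approach (E-RPN) that avoids angular singularities and estimates object orientation robustly.
Achieve state-of-the-art efficiency on KITTI while maintaining competitive accuracy across multiple classes.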

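Proposed method

Pre-process the LiDAR point cloud into a single bird's-eye-view (BEV) RGB-map (height, intensity, density) covering an 80 m x 40 m area.
Adapt a YOLOv2-style CNN architecture for single-pass prediction on the BEV map.
Introduce the Euler-Region-Proposal Network (E-RPN), which regresses the 3D box parameters (x, y, w, l) and the orientation using complex-number-based angle regression (b_phi = arctan2(t_im, t_re)).
Use three anchor sizes and two heading directions to cover KITTI object shapes, predicting five boxes and their associated scores per grid cell.
Combine the YOLO-style loss with a new Euler regression loss so that the angle prediction is optimized in a singularity-free complex space.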
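
The pre-processing step at the top of this list can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `point_cloud_to_bev`, the coordinate ranges, the 512 x 1024 grid, and the density normalization are all assumptions chosen to match the 80 m x 40 m coverage mentioned above.

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-40.0, 40.0),
                       z_range=(-2.0, 1.25), shape=(512, 1024)):
    """Rasterize an (N, 4) array of x, y, z, intensity into a (3, H, W) map.

    Channels: 0 = max height, 1 = max intensity, 2 = normalized point density.
    All ranges and the grid shape are illustrative assumptions.
    """
    points = np.asarray(points, dtype=np.float32)
    h, w = shape
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # Keep only points inside the assumed region in front of the sensor.
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

    # Map metric coordinates to integer grid cells.
    rows = ((x - x_range[0]) / (x_range[1] - x_range[0]) * h).astype(np.int64)
    cols = ((y - y_range[0]) / (y_range[1] - y_range[0]) * w).astype(np.int64)
    rows = np.clip(rows, 0, h - 1)
    cols = np.clip(cols, 0, w - 1)

    bev = np.zeros((3, h, w), dtype=np.float32)
    z_norm = (z - z_range[0]) / (z_range[1] - z_range[0])
    np.maximum.at(bev[0], (rows, cols), z_norm)      # highest point per cell
    np.maximum.at(bev[1], (rows, cols), intensity)   # strongest return per cell

    counts = np.zeros((h, w), dtype=np.float32)
    np.add.at(counts, (rows, cols), 1.0)             # points per cell
    # Assumed normalization: log-scaled count, saturating at 1.
    bev[2] = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    return bev
```

The resulting three-channel array has the same layout as an RGB image, which is what lets a 2D detector such as a YOLOv2-style CNN consume it in a single pass.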

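Experimental results

Research questions

RQ1: Can a real-time, LiDAR-only model produce accurate oriented 3D boxes for multiple KITTI classes?
RQ2: Does embedding angle regression in a complex space (Euler regression) improve orientation robustness and generalization?
RQ3: What is the trade-off between detection speed and accuracy when using a single BEV map and one forward pass?
RQ4: Can a single network predict multiple classes simultaneously without camera input while maintaining real-time performance?
RQ5: How does the proposed method perform on the KITTI benchmark for both BEV and 3D detection tasks?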

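Key findings

Achieves real-time performance (>50 fps) on a Titan X while maintaining competitive accuracy on KITTI BEV detection.
Outperforms prior LiDAR-based methods in efficiency on BEV detection by at least 5x, and by more than 10x in some comparisons.
Encodes orientation with complex angle regression (Euler regression), avoiding angular singularities and improving generalization.
Predicts the eight KITTI classes (including vans, trucks, and sitting pedestrians) from LiDAR input alone, with no camera data.
Provides a single end-to-end network that processes all bounding boxes in one forward pass, enabling deployment on embedded platforms (e.g., TX2).
Shows strong BEV and 3D detection performance in the Car, Pedestrian, and Cyclist categories, with competitive AP values.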
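
To make the singularity point in the findings concrete, the sketch below compares a plain squared error on a single angle with a squared error in the complex plane. It is a simplified per-box illustration under stated assumptions: the helper names are hypothetical, and the weighting and the sums over grid cells and anchors that a full training loss would need are omitted.

```python
import math

def encode_angle(phi):
    """Represent a yaw angle as a point (t_re, t_im) on the unit circle."""
    return math.cos(phi), math.sin(phi)

def decode_angle(t_re, t_im):
    """Recover the angle as arctan2(t_im, t_re); the magnitude does not matter."""
    return math.atan2(t_im, t_re)

def scalar_angle_error(phi_pred, phi_true):
    """Squared error on the raw angle; it jumps at the +pi / -pi wrap-around."""
    return (phi_pred - phi_true) ** 2

def complex_angle_error(t_re, t_im, phi_true):
    """Squared distance in the complex plane between prediction and target."""
    g_re, g_im = encode_angle(phi_true)
    return (t_re - g_re) ** 2 + (t_im - g_im) ** 2

# Two headings that are physically 0.02 rad apart, on either side of the wrap.
phi_true = math.pi - 0.01
phi_pred = -math.pi + 0.01

wrap_error = scalar_angle_error(phi_pred, phi_true)              # large
smooth_error = complex_angle_error(*encode_angle(phi_pred), phi_true)  # small
print(wrap_error, smooth_error)
```

The scalar error treats the two nearly identical headings as almost a full turn apart, while the complex-plane error stays proportional to the actual angular gap, which is the behavior the findings attribute to Euler regression.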


This review was generated by AI and checked by a human editor.