QUICK REVIEW

[Paper Review] Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection

Nils Gählert, Nicolas Jourdan|TUbilio (Technical University of Darmstadt)|Jun 14, 2020

Advanced Neural Network Applications | References: 16 | Citations: 33

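One-line summary

Cityscapes 3D extends Cityscapes with stereo-based 3D vehicle bounding boxes that cover all nine degrees of freedom, a monocular 3D detection benchmark, and distance-aware metrics for RGB-only 3D detection.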

ABSTRACT

Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection solely based on monocular RGB images gained popularity. In order to facilitate this task as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground truth annotations of vehicles are usually obtained using lidar point clouds, which often induces errors due to imperfect calibration or synchronization between both sensors. To this end, we propose Cityscapes 3D, extending the original Cityscapes dataset with 3D bounding box annotations for all types of vehicles. In contrast to existing datasets, our 3D annotations were labeled using stereo RGB images only and capture all nine degrees of freedom. This leads to a pixel-accurate reprojection in the RGB image and a higher range of annotations compared to lidar-based approaches. In order to ease multitask learning, we provide a pairing of 2D instance segments with 3D bounding boxes. In addition, we complement the Cityscapes benchmark suite with 3D vehicle detection based on the new annotations as well as metrics presented in this work. Dataset and benchmark are available online.

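Motivation and objectives

Extend Cityscapes with high-quality 3D vehicle annotations for monocular RGB-based detection.
Provide the full 3D orientation of each vehicle (yaw, pitch, roll) as part of a nine-degree-of-freedom description.
Pair 2D instance masks with 3D boxes to support multitask learning.
Introduce a monocular 3D detection benchmark with distance-aware evaluation metrics.
Stay consistent with the existing Cityscapes tasks so that results remain easy to compare.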

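Proposed method

Annotate 3D bounding boxes for all vehicle types using stereo RGB images only.
Use the stereo point cloud and size prototypes to stabilize the initial 3D box labeling and reduce depth and size ambiguity.
Provide the full 3D orientation (yaw, pitch, roll) of each vehicle in the context of the RGB image.
Pair each 3D box with its corresponding 2D instance mask and with metadata (occlusion, truncation, size prototype).
Adopt an evaluation protocol aligned with Cityscapes, combining 2D IoU-based matching with new depth-dependent metrics.
Provide a benchmark suite that reports the mean detection score (mDS) across eight vehicle classes.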
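As an illustration of what a nine-degree-of-freedom box encodes (3 for the center, 3 for the size, 3 for the orientation), the sketch below turns such a box into its eight corner points. The review does not state the axis or rotation conventions, so the code assumes a frame with x forward, y left, z up and a Z-Y-X order (yaw, then pitch, then roll). The function names `rotation_matrix` and `box_corners` are hypothetical and are not taken from the Cityscapes tooling.

```python
import math
from itertools import product


def rotation_matrix(yaw, pitch, roll):
    """Assumed convention: R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def box_corners(center, size, yaw=0.0, pitch=0.0, roll=0.0):
    """Return the 8 corners of a 9-DoF box as (x, y, z) tuples.

    center: (x, y, z) of the box center in meters.
    size:   (length, width, height) in meters, along the box's own x, y, z axes.
    """
    length, width, height = size
    rot = rotation_matrix(yaw, pitch, roll)
    corners = []
    # Enumerate the 8 sign combinations of the half-extents.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        local = (sx * length, sy * width, sz * height)
        # Rotate the local offset, then translate by the box center.
        corners.append(tuple(
            center[i] + sum(rot[i][j] * local[j] for j in range(3))
            for i in range(3)
        ))
    return corners
```

With all three angles at zero the box is axis-aligned. A box that stores only yaw, as is common in lidar-labeled datasets, corresponds to leaving `pitch` and `roll` at their defaults.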

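Experimental results

Research questions

RQ1: Can monocular RGB-based methods reliably detect 9-DoF 3D vehicle bounding boxes when stereo-derived annotations serve as ground truth?
RQ2: How does distance from the ego vehicle affect the accuracy of 3D position, orientation, and size in monocular 3D detection?
RQ3: Does pairing 2D instance segments with 3D boxes improve multitask learning for monocular 3D perception?
RQ4: Compared with LiDAR-based annotations, how do stereo-derived annotations affect the consistency between the image-space projection and the 3D ground truth?
RQ5: Do the new depth-aware metrics give a better assessment of monocular 3D detection performance across the full distance range?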

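Key findings

Cityscapes 3D provides 3D vehicle annotations, labeled from stereo images, for eight vehicle-related semantic classes, which makes a monocular 3D benchmark possible.
Measured against Synscapes ground truth, the annotations reach a yaw error below 2.1 degrees and a center position error below 1 meter on the tested images.
The dataset has a higher object density per image than many baseline datasets, which highlights difficult scenes for monocular 3D detection.
Distance-dependent evaluation, using the proposed metrics together with distance binning, reveals how performance varies with distance.
The benchmark combines standard 2D AP with depth-dependent true-positive metrics into a Detection Score that emphasizes accurate 3D localization and orientation.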
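The distance binning and the combined score described above can be sketched as follows. The review does not give the formulas, so this is an assumption-laden illustration rather than the benchmark's definition. It assumes each matched detection carries a per-match similarity in [0, 1], that similarities are averaged within fixed-width distance bins and then across non-empty bins, and that the Detection Score is the 2D AP multiplied by the mean of the depth-aware scores. The function names, the 5 m bin width, and the 100 m range are hypothetical.

```python
import math


def mean_score_per_bin(matches, bin_width=5.0, max_distance=100.0):
    """Average similarity per distance bin.

    matches: iterable of (distance_m, similarity) pairs for true positives.
    Returns one entry per bin; empty bins are reported as None.
    """
    n_bins = int(math.ceil(max_distance / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance, score in matches:
        if 0.0 <= distance < max_distance:
            idx = min(int(distance // bin_width), n_bins - 1)
            sums[idx] += score
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def depth_aware_score(matches, bin_width=5.0, max_distance=100.0):
    """Mean over non-empty bins, so every distance range weighs the same."""
    per_bin = [m for m in mean_score_per_bin(matches, bin_width, max_distance)
               if m is not None]
    return sum(per_bin) / len(per_bin) if per_bin else 0.0


def detection_score(ap_2d, tp_scores):
    """Assumed combination: 2D AP scaled by the mean depth-aware TP score."""
    if not tp_scores:
        return 0.0
    return ap_2d * sum(tp_scores) / len(tp_scores)
```

Averaging per bin before averaging across bins keeps the many nearby vehicles from dominating the few distant ones, which is one plausible reading of "distance-aware". Under the assumed product form, a detector with strong 2D AP but poor 3D localization still receives a low Detection Score.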

This review was generated by AI and checked by a human editor.