QUICK REVIEW

[Paper Review] Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection

Nils Gählert, Nicolas Jourdan|TUbilio (Technical University of Darmstadt)|Jun 14, 2020

Advanced Neural Network Applications | References: 16 | Cited by: 33

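ONE-SENTENCE SUMMARY

Cityscapes 3D extends Cityscapes with stereo-derived 3D vehicle bounding boxes that cover all nine degrees of freedom, and adds a monocular 3D detection benchmark with distance-aware metrics for RGB-only 3D detection.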

ABSTRACT

Detecting vehicles and representing their position and orientation in the three dimensional space is a key technology for autonomous driving. Recently, methods for 3D vehicle detection solely based on monocular RGB images gained popularity. In order to facilitate this task as well as to compare and drive state-of-the-art methods, several new datasets and benchmarks have been published. Ground truth annotations of vehicles are usually obtained using lidar point clouds, which often induces errors due to imperfect calibration or synchronization between both sensors. To this end, we propose Cityscapes 3D, extending the original Cityscapes dataset with 3D bounding box annotations for all types of vehicles. In contrast to existing datasets, our 3D annotations were labeled using stereo RGB images only and capture all nine degrees of freedom. This leads to a pixel-accurate reprojection in the RGB image and a higher range of annotations compared to lidar-based approaches. In order to ease multitask learning, we provide a pairing of 2D instance segments with 3D bounding boxes. In addition, we complement the Cityscapes benchmark suite with 3D vehicle detection based on the new annotations as well as metrics presented in this work. Dataset and benchmark are available online.

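MOTIVATION AND OBJECTIVES

Extend Cityscapes with high-quality 3D vehicle annotations to support detection from monocular RGB images.
Provide the full 3D pose of each vehicle (yaw, pitch, roll), for nine degrees of freedom in total.
Pair 2D instance masks with 3D boxes to facilitate multitask learning.
Introduce a monocular 3D detection benchmark with distance-aware evaluation metrics.
Keep the annotations consistent and comparable with existing Cityscapes tasks.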

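PROPOSED METHOD

Annotate 3D bounding boxes for all vehicle types using stereo RGB imagery only.
Use the stereo point cloud and size prototypes to stabilize the initial 3D box annotation and reduce depth-size ambiguity.
Provide the full 3D orientation (yaw, pitch, roll) of each vehicle in the context of the RGB image.
Pair each 3D box with its corresponding 2D instance mask and metadata (occlusion, truncation, size prototype).
Adopt an evaluation protocol aligned with Cityscapes: match detections by 2D IoU and introduce new depth-dependent metrics.
Provide a benchmark suite that reports the mean detection score (mDS) over 8 vehicle classes.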
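
To make the nine degrees of freedom concrete, here is a minimal Python sketch of a box described by a 3D center, a 3D extent, and three rotation angles, together with a helper that computes its eight corners. The class name `Box9DoF`, the axis convention (x forward, y left, z up), and the rotation order (yaw about z, then pitch about y, then roll about x) are assumptions made for illustration; they are not taken from the dataset's annotation format.

```python
import math
from dataclasses import dataclass


@dataclass
class Box9DoF:
    """Hypothetical 9-DoF box: 3 for position, 3 for size, 3 for rotation."""
    # Center position in meters.
    x: float
    y: float
    z: float
    # Extent in meters along the box's own axes.
    length: float
    width: float
    height: float
    # Orientation in radians.
    yaw: float
    pitch: float
    roll: float


def rotation_matrix(yaw, pitch, roll):
    """Return R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists.

    The ZYX ordering is an assumed convention, not the dataset's definition.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def box_corners(box):
    """Return the eight corners of the box as (x, y, z) tuples."""
    rot = rotation_matrix(box.yaw, box.pitch, box.roll)
    half = (box.length / 2, box.width / 2, box.height / 2)
    center = (box.x, box.y, box.z)
    corners = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                # Corner in the box's local frame, before rotation.
                local = (sx * half[0], sy * half[1], sz * half[2])
                # Rotate into the reference frame, then shift to the center.
                corners.append(tuple(
                    center[i] + sum(rot[i][j] * local[j] for j in range(3))
                    for i in range(3)
                ))
    return corners
```

With pitch and roll fixed at zero, this reduces to the yaw-only boxes common in lidar-based datasets; the two extra angles matter on slopes and uneven roads.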

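EXPERIMENTAL RESULTS

Research Questions

RQ1: Can monocular RGB methods reliably detect nine-degree-of-freedom 3D vehicle bounding boxes when stereo-derived annotations serve as ground truth?
RQ2: In monocular 3D detection, how does distance from the ego vehicle affect the accuracy of 3D localization, orientation, and size estimates?
RQ3: Does pairing 2D instance segments with 3D boxes improve multitask learning for monocular 3D perception?
RQ4: Compared with lidar-based annotations, how do stereo-derived annotations affect the alignment between the image-space projection and the 3D ground truth?
RQ5: How do the new depth-aware metrics better assess monocular 3D detection performance across distance ranges?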

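Key Findings

Cityscapes 3D provides stereo-derived 3D annotations for eight vehicle-related semantic classes and supports a monocular 3D detection benchmark.
Annotation quality was validated against Synscapes ground truth: on the test images, yaw error is below 2.1 degrees and center position error is below 1 meter.
The dataset has a higher object density per image than most baseline datasets, which makes its scenes challenging for monocular 3D detection.
Depth-dependent evaluation, enabled by the proposed metrics and distance binning, reveals how performance varies with distance.
The benchmark combines standard 2D AP with depth-dependent true-positive measures to produce a Detection Score that rewards accurate 3D localization and orientation.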
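
The idea of distance binning and of combining 2D AP with true-positive quality can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's official implementation: the function names, the 10 m bin width, the 100 m maximum distance, and the choice to multiply AP by the bin-averaged quality are all hypothetical, and each true positive is assumed to already carry a quality value in [0, 1].

```python
import math


def mean_quality_per_bin(true_positives, bin_width=10.0, max_distance=100.0):
    """Average true-positive quality in each distance bin.

    true_positives: iterable of (distance_m, quality) pairs, quality in [0, 1].
    Returns one entry per bin; bins without samples are reported as None.
    Bin width and maximum distance are assumed values for illustration.
    """
    n_bins = int(math.ceil(max_distance / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance, quality in true_positives:
        if not 0.0 <= distance < max_distance:
            continue  # ignore objects outside the evaluated range
        idx = int(distance // bin_width)
        sums[idx] += quality
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def detection_score(ap_2d, true_positives, bin_width=10.0, max_distance=100.0):
    """Scale 2D AP by the quality averaged over non-empty distance bins.

    Averaging per bin first keeps crowded near-range bins from dominating
    sparse far-range bins. The multiplicative combination is an assumption.
    """
    per_bin = [
        q for q in mean_quality_per_bin(true_positives, bin_width, max_distance)
        if q is not None
    ]
    if not per_bin:
        return 0.0
    return ap_2d * sum(per_bin) / len(per_bin)
```

Because every non-empty bin contributes equally, a detector that is accurate only at close range is penalized even when most of its true positives are nearby.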

This review was generated by AI and checked by a human editor.