QUICK REVIEW

[Paper Digest] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Yan Wang, Wei‐Lun Chao|arXiv (Cornell University)|Dec 18, 2018

Topic: Advanced Neural Network Applications | References: 36 | Cited by: 68

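One-Sentence Summary

The paper shows that converting image-based depth into pseudo-LiDAR 3D point clouds, and then applying LiDAR-based detectors to them, substantially improves stereo and monocular 3D object detection and narrows the gap to real LiDAR. It argues that the representation of the data, rather than depth accuracy, is the main bottleneck.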

ABSTRACT

3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies --- a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that it is not the quality of the data but its representation that accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations --- essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance --- raising the detection accuracy of objects within the 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches. Our code is publicly available at https://github.com/mileyan/pseudo_lidar.

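Motivation and Objectives

Motivate replacing image depth maps with a LiDAR-like 3D point representation (pseudo-LiDAR) for 3D object detection.
Investigate whether the pseudo-LiDAR representation improves the accuracy of stereo and monocular 3D detection on KITTI.
Demonstrate that pseudo-LiDAR is compatible with existing LiDAR-based detectors across different architectures.
Quantify how much the data representation contributes to the performance gap between stereo and LiDAR.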

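Proposed Method

Back-project dense depth maps from stereo or monocular input into 3D points, forming a pseudo-LiDAR point cloud.
Apply existing LiDAR-based 3D detectors (e.g., Frustum PointNet, AVOD) to the pseudo-LiDAR data.
Compare representation strategies within the same detection pipeline by contrasting pseudo-LiDAR with the frontal-view depth representation.
Evaluate 3D and bird's-eye-view (BEV) AP on the KITTI dataset at IoU = 0.5 and 0.7 for the car, pedestrian, and cyclist categories.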
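
The back-projection step above can be sketched with the standard pinhole camera model. This is a minimal illustration, not the authors' released code: the function names, the `max_depth` cutoff, and the toy intrinsics are assumptions made for the example. The resulting points are in the camera frame (x right, y down, z forward); moving them into the frame a particular LiDAR detector expects would need calibration matrices that are not shown here.

```python
import numpy as np


def disparity_to_depth(disparity, focal_u, baseline):
    """Stereo disparity (pixels) to depth (metres): depth = focal_u * baseline / disparity."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0  # non-positive disparity carries no usable depth
    depth[valid] = focal_u * baseline / disparity[valid]
    return depth


def depth_to_pseudo_lidar(depth, focal_u, focal_v, center_u, center_v, max_depth=80.0):
    """Back-project an (H, W) depth map into an (N, 3) array of camera-frame points.

    max_depth is a hypothetical cutoff chosen for this sketch.
    """
    depth = np.asarray(depth, dtype=np.float64)
    height, width = depth.shape
    # u indexes image columns, v indexes image rows.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # Drop pixels with missing, infinite, or very distant depth before projecting.
    valid = np.isfinite(depth) & (depth > 0) & (depth < max_depth)
    z = depth[valid]
    x = (u[valid] - center_u) * z / focal_u
    y = (v[valid] - center_v) * z / focal_v
    return np.stack([x, y, z], axis=1)


# Toy 2x2 depth map with made-up intrinsics; 0.0 marks a pixel with no depth.
depth_map = np.array([[10.0, 10.0],
                      [20.0, 0.0]])
points = depth_to_pseudo_lidar(depth_map, focal_u=2.0, focal_v=2.0,
                               center_u=0.5, center_v=0.5)
print(points.shape)
```

Each valid pixel becomes one 3D point, so the output can be passed to a point-cloud detector in place of a real LiDAR scan once it is in the expected coordinate frame.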

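Experimental Results

Research Questions

RQ1: Does the pseudo-LiDAR representation improve 3D object detection accuracy from stereo or monocular depth estimation on KITTI?
RQ2: How does pseudo-LiDAR compare with the frontal-view depth representation when both are used with LiDAR-based detectors?
RQ3: How does the depth estimation method (stereo vs. monocular) affect pseudo-LiDAR detection performance?
RQ4: How closely can image-based detection approach LiDAR-based 3D detection performance, and what gaps remain?
RQ5: Are the improvements consistent across object categories (car, pedestrian, cyclist) and difficulty levels?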

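Key Findings

Pseudo-LiDAR substantially improves stereo-based 3D detection on KITTI over prior image-based methods; the abstract reports detection accuracy for objects within the 30 m range rising from 22% to 74%.
At IoU 0.7 (moderate difficulty), stereo with pseudo-LiDAR reaches 45.3% AP_BEV/3D, well above the previous image-based state of the art.
Both LiDAR-based detectors (Frustum PointNet and AVOD) benefit from pseudo-LiDAR, indicating broad compatibility with existing 3D detection architectures.
The gains are attributed largely to the data representation rather than to depth estimation quality: within the same detection pipeline, the frontal-view depth representation performs worse than pseudo-LiDAR.
Stereo-based pseudo-LiDAR narrows the gap to LiDAR, showing competitive performance and pointing toward more cost-effective perception for autonomous driving.
A gap remains for pedestrians and cyclists, but the results provide a starting point for image-based 3D detection in these categories.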
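
One possible intuition for the representation finding is that a convolution applied directly to a depth map averages neighbouring pixels even when they belong to surfaces far apart in 3D. The toy example below illustrates that effect only; it is not the paper's experiment, and the depth values and the 3-tap filter are made up for the sketch.

```python
import numpy as np

# Made-up depths (metres) along one image row: a near object next to a far background.
depth_row = np.array([10.0, 10.0, 10.0, 40.0, 40.0, 40.0])

# Uniform 3-tap filter, a stand-in for one 2D convolution applied to a depth map.
kernel = np.ones(3) / 3.0
blurred = np.convolve(depth_row, kernel, mode="valid")

# Depths in the filtered row that were not present anywhere in the input row.
phantom = [float(d) for d in blurred if not np.isclose(depth_row, d).any()]
print(blurred, phantom)
```

Back-projected, the intermediate depths at the object boundary would fall in empty space between the object and the background, whereas a point cloud keeps the two surfaces as separate clusters.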

This digest was generated by AI and reviewed by human editors.