QUICK REVIEW

[Paper Review] Fully Sparse 3D Object Detection

Lue Fan, Feng Wang|arXiv (Cornell University)|Jul 20, 2022

Advanced Neural Network Applications | Cited by 41

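One-Sentence Summary

This paper proposes the Fully Sparse Detector (FSD), which uses Sparse Instance Recognition (SIR) to perform long-range LiDAR 3D object detection efficiently, achieving state-of-the-art results on the Waymo Open Dataset and strong long-range performance on Argoverse 2 at a cost roughly linear in the number of points.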

ABSTRACT

As the perception range of LiDAR increases, LiDAR-based 3D object detection becomes a dominant task in the long-range perception task of autonomous driving. The mainstream 3D object detectors usually build dense feature maps in the network backbone and prediction head. However, the computational and spatial costs on the dense feature map are quadratic to the perception range, which makes them hardly scale up to the long-range setting. To enable efficient long-range LiDAR-based object detection, we build a fully sparse 3D object detector (FSD). The computational and spatial cost of FSD is roughly linear to the number of points and independent of the perception range. FSD is built upon the general sparse voxel encoder and a novel sparse instance recognition (SIR) module. SIR first groups the points into instances and then applies instance-wise feature extraction and prediction. In this way, SIR resolves the issue of center feature missing, which hinders the design of the fully sparse architecture for all center-based or anchor-based detectors. Moreover, SIR avoids the time-consuming neighbor queries in previous point-based methods by grouping points into instances. We conduct extensive experiments on the large-scale Waymo Open Dataset to reveal the working mechanism of FSD, and state-of-the-art performance is reported. To demonstrate the superiority of FSD in long-range detection, we also conduct experiments on Argoverse 2 Dataset, which has a much larger perception range ($200m$) than Waymo Open Dataset ($75m$). On such a large perception range, FSD achieves state-of-the-art performance and is 2.4$\times$ faster than the dense counterpart. Codes will be released at https://github.com/TuSimple/SST.

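Research Motivation and Goals

Advance efficient long-range LiDAR-based 3D object detection by removing dense feature maps, and resolve the Center Feature Missing (CFM) issue that hinders fully sparse designs.
Develop a fully sparse detector that processes only non-empty voxels and instance groups, so that cost is roughly linear in the number of points and independent of the perception range.
Propose Sparse Instance Recognition (SIR), which extracts instance features from grouped points and predicts bounding boxes.
Show that the method matches or surpasses dense detectors on Waymo and performs well in the 200 m range setting of Argoverse 2.
Show that SIR enables efficient, accurate long-range detection without heavy downsampling or neighbor queries.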
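
The quadratic scaling of dense feature maps can be made concrete with a little arithmetic. The sketch below counts the cells of a square bird's-eye-view grid at the two perception ranges quoted in the abstract (75 m and 200 m). The cell size `CELL_M` and the helper `dense_bev_cells` are hypothetical, chosen only for illustration, and the grid is assumed to span the range in both directions on each axis.

```python
import math


def dense_bev_cells(range_m: float, cell_m: float) -> int:
    """Cells in a square BEV grid covering [-range_m, range_m] on both axes."""
    side = math.ceil(2 * range_m / cell_m)
    return side * side


CELL_M = 0.5  # hypothetical BEV cell size in metres, not taken from the paper

waymo_cells = dense_bev_cells(75.0, CELL_M)   # range quoted for Waymo Open Dataset
av2_cells = dense_bev_cells(200.0, CELL_M)    # range quoted for Argoverse 2
ratio = av2_cells / waymo_cells

print(waymo_cells, av2_cells, round(ratio, 2))  # 90000 640000 7.11
```

Under these assumptions the dense grid grows by about 7.1x when the range grows by about 2.7x, regardless of the cell size chosen. A fully sparse design avoids this because its cost follows the number of points rather than the area covered.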

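Proposed Method

Extract voxel features with a sparse voxel encoder and perform center voting, similar to VoteNet.
Group the voted centers into instances with Connected Components Labeling (CCL), forming disjoint instance groups.
Apply Sparse Instance Recognition (SIR), which extracts instance features through dynamic broadcast/pooling and produces one bounding box prediction per group.
Optionally refine the proposals with a second SIR (SIR2), which regresses box residuals and classifies with IoU-based soft labels.
Train with a combined loss covering semantic classification, voting, 3D box regression, and IoU-based supervision.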
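
The grouping step can be illustrated with a minimal sketch of connected components labeling over voted centers. This is not the authors' implementation: the function name `group_by_connected_components`, the distance threshold `radius`, and the brute-force pair scan are assumptions made for readability, and a practical version would likely use a spatial index or a GPU kernel.

```python
import math


def group_by_connected_components(centers, radius):
    """Label each voted center with the id of its connected component.

    Two centers are linked when their Euclidean distance is below `radius`
    (a hypothetical threshold). Uses a brute-force O(n^2) pair scan with
    union-find, which is fine for a demo but not for full point clouds.
    """
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < radius:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as consecutive instance ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(centers))]


# Toy voted centers (x, y): three near the origin, two near (10, 10).
voted_centers = [(0.0, 0.0), (0.3, 0.1), (0.5, 0.4), (10.0, 10.0), (10.2, 9.9)]
labels = group_by_connected_components(voted_centers, radius=0.6)
print(labels)  # [0, 0, 0, 1, 1]
```

The first and third centers are farther apart than `radius`, yet they share a label because both link to the second one. Connectivity is transitive, which is why every point receives exactly one label and the resulting groups are disjoint.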

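Experimental Results

Research Questions

RQ1: Can a fully sparse 3D detector, with no dense BEV feature map, outperform dense detectors on long-range LiDAR data?
RQ2: Can SIR effectively mitigate Center Feature Missing and produce accurate instance-level predictions from sparse groups?
RQ3: On a long-range benchmark such as Argoverse 2, how does FSD compare with state-of-the-art methods in accuracy and speed?
RQ4: How do grouping quality and the SIR design affect detection performance on large versus small objects?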

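Key Findings

FSD achieves state-of-the-art performance among mainstream detectors on the Waymo Open Dataset without test-time augmentation.
On Argoverse 2, FSD delivers strong long-range detection (up to 200 m) and is 2.4× faster than its dense counterpart.
Replacing diffusion-based center features with SIR substantially improves recall on large objects, addressing Center Feature Missing.
Dynamic broadcast/pooling enables efficient instance-level feature extraction without point sampling or padding, preserving fidelity even when the number of input points is large.
Combining grouping with SIR yields a clear gain over using either grouping or SIR alone, underscoring the importance of end-to-end instance-level processing.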
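
The idea behind dynamic broadcast/pooling can be sketched with NumPy scatter and gather operations. This is an illustrative approximation, not the paper's code: the function names `dynamic_pool` and `dynamic_broadcast`, the choice of max pooling, and the toy features are all assumptions.

```python
import numpy as np


def dynamic_pool(point_feats, instance_ids, num_instances):
    """Max-pool per-point features into one vector per instance.

    Works on variable-sized groups directly, so no sampling or padding
    is needed. Instances with no points keep -inf in this sketch.
    """
    pooled = np.full((num_instances, point_feats.shape[1]), -np.inf)
    np.maximum.at(pooled, instance_ids, point_feats)  # unbuffered scatter-max
    return pooled


def dynamic_broadcast(instance_feats, instance_ids):
    """Copy each instance vector back to every point that belongs to it."""
    return instance_feats[instance_ids]


# Five points with 2-D features; two instances of sizes 2 and 3.
point_feats = np.array([[1.0, 5.0], [3.0, 2.0], [0.0, 7.0], [4.0, 4.0], [6.0, 1.0]])
instance_ids = np.array([0, 0, 1, 1, 1])

pooled = dynamic_pool(point_feats, instance_ids, num_instances=2)
fused = np.concatenate([point_feats, dynamic_broadcast(pooled, instance_ids)], axis=1)

print(pooled.tolist())  # [[3.0, 5.0], [6.0, 7.0]]
print(fused.shape)      # (5, 4)
```

Each point ends up carrying both its own feature and its instance's pooled feature. The work done is proportional to the number of points, not to a fixed group size, which matches the claim that no point is discarded or padded.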


This review was generated by AI and reviewed by a human editor.