QUICK REVIEW

[Paper Review] Object DGCNN: 3D Object Detection using Dynamic Graphs

Yue Wang, Justin Solomon|arXiv (Cornell University)|Oct 13, 2021

Advanced Neural Network Applications|References: 66|Cited by: 50

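One-Sentence Summary

This paper proposes Object DGCNN, an NMS-free 3D object detector that models objects as a set via a dynamic graph and, using a set-to-set loss and distillation, achieves state-of-the-art results on autonomous driving benchmarks.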

ABSTRACT

3D object detection often involves complicated training and testing pipelines, which require substantial domain knowledge about individual datasets. Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds. Our method models 3D object detection as message passing on a dynamic graph, generalizing the DGCNN framework to predict a set of objects. In our construction, we remove the necessity of post-processing via object confidence aggregation or non-maximum suppression. To facilitate object detection from sparse point clouds, we also propose a set-to-set distillation approach customized to 3D detection. This approach aligns the outputs of the teacher model and the student model in a permutation-invariant fashion, significantly simplifying knowledge distillation for the 3D detection task. Our method achieves state-of-the-art performance on autonomous driving benchmarks. We also provide abundant analysis of the detection model and distillation framework.

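Motivation and Objectives

Remove hand-designed post-processing (NMS) from 3D object detection to improve efficiency.
Develop a set prediction framework that outputs a fixed-size set of object queries.
Use dynamic graph reasoning to model relations among objects in a 3D scene.
Enable knowledge distillation during training through a set-to-set objective and privileged information.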

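Proposed Method

Use a grid-based feature extractor (PointPillars or SparseConv) to produce dense bird's-eye-view (BEV) features.
Introduce Object DGCNN, which propagates through L layers; each layer predicts a set of object queries and aggregates BEV features through learned sampling and bilinear interpolation.
Model object-to-object interactions with a DGCNN-style sparse graph over the object queries.
Apply a one-to-one set-to-set loss with Hungarian matching to align the predictions with the ground-truth set.
Implement set-to-set distillation, in which the teacher guides the student through permutation-invariant output alignment, making it possible to transfer privileged information.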
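The one-to-one matching behind the set-to-set loss can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `match_sets`, the use of center distance as the only matching cost, and the equal set sizes are all assumptions made for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm (which finds the same optimum in polynomial time). Handling of unmatched queries is omitted.

```python
from itertools import permutations
from math import dist


def match_sets(preds, targets):
    """Return (assignment, total_cost) for the cheapest one-to-one matching.

    preds and targets are equal-length lists of box centers (x, y, z).
    assignment[i] is the index of the target matched to prediction i.
    Brute force over permutations: only viable for tiny sets (assumption:
    a real system would use a polynomial-time Hungarian solver instead).
    """
    n = len(preds)
    assert n == len(targets), "this sketch assumes equally sized sets"
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        # Matching cost here is plain center distance (hypothetical choice).
        cost = sum(dist(preds[i], targets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Toy data: three predicted centers and three target centers.
preds = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0), (10.0, 0.0, 0.0)]
targets = [(10.0, 1.0, 0.0), (0.0, 1.0, 0.0), (5.0, 4.0, 0.0)]

assignment, cost = match_sets(preds, targets)

# Reordering the target set changes the indices but not the matched cost,
# which is the permutation invariance the loss relies on.
_, cost_shuffled = match_sets(preds, targets[::-1])
print(assignment, cost, cost_shuffled)
```

Because the loss is computed only after matching, the order in which either set is listed does not matter. The same alignment step could plausibly be reused for distillation by passing teacher outputs in place of ground-truth boxes, which is how this sketch reads the permutation-invariant alignment described above.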

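Experimental Results

Research Questions

RQ1: Can 3D object detection be treated as set prediction, removing NMS post-processing without sacrificing accuracy?
RQ2: Does integrating DGCNN-style object relations on top of BEV features yield better detection than dense self-attention?
RQ3: Can set-to-set distillation exploit privileged information (e.g., dense point clouds) to improve performance?
RQ4: How do the backbone (PointPillars vs. SparseConv) and the number of DGCNN layers and neighbors affect detection performance?
RQ5: On autonomous driving benchmarks, is an NMS-free detector competitive with state-of-the-art NMS-based 3D detectors?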

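Key Findings

The proposed method achieves state-of-the-art results on an autonomous driving benchmark (nuScenes) without NMS.
With either PointPillars or SparseConv as the backbone, Object DGCNN outperforms the CenterPoint variants; the voxel-based setting reaches high NDS and mAP.
DGCNN-based object relation modeling outperforms multi-head self-attention, with a neighbor count of 16 being the key setting for performance.
Increasing the number of DGCNN layers improves performance, confirming the benefit of deeper dynamic graph reasoning.
Set-to-set distillation, including with privileged information, brings consistent gains over the baseline and other distillation strategies.
The model can be trained end to end on top of a pretrained backbone and yields usable boxes at inference without post-processing.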
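The neighbor count and layer depth in the findings refer to a k-nearest-neighbor graph over object queries. The sketch below shows one DGCNN-style (EdgeConv) update under loud simplifications: `knn_graph` and `edge_conv` are hypothetical names, and two scalar weights stand in for the learned network that a real layer would apply to each edge. The graph is rebuilt from the current features on every call, which is what "dynamic" refers to in DGCNN.

```python
from math import dist


def knn_graph(features, k):
    """For each query, list the indices of its k nearest other queries,
    measured in feature space."""
    neighbors = []
    for i, f_i in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: dist(f_i, features[j]))
        neighbors.append(others[:k])
    return neighbors


def edge_conv(features, k, weight_self, weight_diff):
    """One EdgeConv-style update:
    h_i' = max over neighbors j of ReLU(weight_self * h_i + weight_diff * (h_j - h_i)).

    The two scalar weights replace a learned MLP (hypothetical simplification).
    """
    neighbors = knn_graph(features, k)
    updated = []
    for i, h_i in enumerate(features):
        messages = [
            [
                max(0.0, weight_self * own + weight_diff * (other - own))
                for own, other in zip(h_i, features[j])
            ]
            for j in neighbors[i]
        ]
        # Channel-wise max over the messages from all neighbors.
        updated.append([max(channel) for channel in zip(*messages)])
    return neighbors, updated


# Four toy query features; the last one is far from the rest.
queries = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
neighbors, updated = edge_conv(queries, k=2, weight_self=1.0, weight_diff=0.5)
print(neighbors)
print(updated)
```

Stacking several such updates, each with its own graph, corresponds to the deeper dynamic graph reasoning mentioned in the findings. Each query exchanges messages with only k neighbors rather than with every other query, which is the sparsity that distinguishes this from dense self-attention.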

This review was generated by AI and reviewed by human editors.