QUICK REVIEW

[Paper Review] Towards Efficient 3D Object Detection with Knowledge Distillation

Jihan Yang, Shaoshuai Shi|arXiv (Cornell University)|May 30, 2022

Advanced Neural Network Applications | Cited by 28

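One-Sentence Summary

This paper studies knowledge distillation (KD) for building efficient 3D LiDAR detectors: it benchmarks KD methods from the 2D domain on six teacher-student pairs of pillar- and voxel-based detectors, and proposes an improved KD pipeline that combines pivotal-position logit KD with teacher-guided initialization to reach good accuracy at substantially lower FLOPs.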

ABSTRACT

Despite substantial progress in 3D object detection, advanced 3D detectors often suffer from heavy computation overheads. To this end, we explore the potential of knowledge distillation (KD) for developing efficient 3D object detectors, focusing on popular pillar- and voxel-based detectors. In the absence of well-developed teacher-student pairs, we first study how to obtain student models with good trade-offs between accuracy and efficiency from the perspectives of model compression and input resolution reduction. Then, we build a benchmark to assess existing KD methods developed in the 2D domain for 3D object detection upon six well-constructed teacher-student pairs. Further, we propose an improved KD pipeline incorporating an enhanced logit KD method that performs KD on only a few pivotal positions determined by teacher classification response, and a teacher-guided student model initialization to facilitate transferring teacher model's feature extraction ability to students through weight inheritance. Finally, we conduct extensive experiments on the Waymo dataset. Our best performing model achieves $65.75\%$ LEVEL 2 mAPH, surpassing its teacher model and requiring only $44\%$ of teacher FLOPs. Our most efficient model runs 51 FPS on an NVIDIA A100, which is $2.2\times$ faster than PointPillar with even higher accuracy. Code is available at https://github.com/CVMI-Lab/SparseKD.

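Motivation and Objectives

Identify how to obtain efficient yet accurate 3D detectors through model compression and input resolution reduction.
Benchmark existing 2D KD methods on six teacher-student pairs of pillar- and voxel-based 3D detectors.
Propose improved KD strategies that make distillation more effective for 3D object detection.
Show that distilled lightweight detectors can match or surpass their teachers on the Waymo and KITTI datasets at substantially lower computational cost.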

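Proposed Method

Study model compression (width, depth) and input resolution reduction to build efficient student detectors from a fixed teacher.
Evaluate seven 2D KD methods (logit KD, feature KD, label KD, and their variants) on six teacher-student pairs of pillar- and voxel-based detectors.
Propose pivotal-position logit KD, which restricts distillation to teacher positions that are highly confident or top-ranked.
Introduce teacher-guided initialization (TGI), which transfers the teacher's feature extraction ability through weight remapping and parameter projection.
Develop an improved KD pipeline that combines pivotal-position logit KD, label KD, and TGI, and evaluate it on Waymo and KITTI.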
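
The pivotal-position idea in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single-class classification map flattened into a list, selects the `top_k` positions with the highest teacher score, and applies a mean squared error between sigmoid scores at those positions only. The function name `pivotal_logit_kd`, the top-k selection rule, and the squared-error loss are all assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pivotal_logit_kd(teacher_logits, student_logits, top_k):
    """Distill only at the top_k positions ranked by teacher score.

    Hypothetical sketch: the squared-error loss and the top-k rule are
    assumptions, not the loss used in the paper.
    Returns (loss, pivotal_indices).
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student maps must have the same size")

    teacher_scores = [sigmoid(v) for v in teacher_logits]
    student_scores = [sigmoid(v) for v in student_logits]

    k = min(top_k, len(teacher_scores))
    if k <= 0:
        return 0.0, []

    # Rank positions by teacher classification response, highest first.
    ranked = sorted(
        range(len(teacher_scores)),
        key=lambda i: teacher_scores[i],
        reverse=True,
    )
    pivotal = ranked[:k]

    # Positions outside the pivotal set contribute nothing to the loss.
    loss = sum((student_scores[i] - teacher_scores[i]) ** 2 for i in pivotal) / k
    return loss, pivotal


if __name__ == "__main__":
    teacher = [4.0, -3.0, 2.0, -5.0]
    student = [4.0, 0.0, 2.0, 3.0]
    print(pivotal_logit_kd(teacher, student, top_k=2))
```

In this toy input the student disagrees with the teacher only at low-confidence positions, so the loss over the two pivotal positions is zero; a dense logit loss would instead be dominated by those background positions.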

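Experimental Results

Research Questions

RQ1: In 3D LiDAR detection, how can an efficient student detector that retains high accuracy be built from a given strong teacher?
RQ2: For pillar- and voxel-based 3D detectors, which knowledge distillation strategies transfer best from teacher to student?
RQ3: Do targeted (pivotal-position) logit matching and teacher-guided initialization improve KD transfer in 3D detection?
RQ4: How well do the compression and KD methods generalize across datasets (Waymo, KITTI) and detector types (pillar, voxel)?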

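Key Findings

Width-level compression generally yields a better CPR (accuracy-efficiency trade-off) than depth compression for 3D detectors.
Pillar-based detectors benefit from input resolution reduction, whereas voxel-based detectors benefit from width-based compression, owing to differences in BEV feature redundancy.
Feature KD methods generally give the strongest standalone gains, but they can interfere with other KD streams in 3D detection.
Pivotal-position logit KD improves distillation by focusing imitation on high-importance regions, namely those near instances or prone to errors.
Teacher-guided initialization (TGI) helps transfer the teacher's feature extraction ability to the student and shows strong synergy with the KD losses.
The improved KD pipeline delivers substantial efficiency gains while keeping accuracy competitive: CP-Voxel-S runs about 2.4x faster than its teacher with comparable mAPH, and CP-Pillar-v0.64 uses only about 25% of its teacher's FLOPs on Waymo with an mAPH drop of about 3.3%.
Distilled detectors can match or surpass their teachers while saving substantial computation (Waymo and KITTI experiments).
Cross-stage distillation shows that transferring hints from the heavier PV-RCNN++ detector gives the lightweight CP-Voxel a modest improvement at no extra inference cost.
The methods generalize to other detectors and tasks, including 3D semantic segmentation, which indicates broad applicability.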
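
To make the teacher-guided initialization finding above more concrete, here is a hedged sketch of weight inheritance for a width-compressed layer. It assumes the student layer is a narrower copy of the teacher layer and simply copies the leading output and input channels of the teacher weight matrix. The helper name `inherit_weights` and the "keep the first channels" rule are hypothetical; the paper's actual remapping and projection scheme may differ.

```python
def inherit_weights(teacher_w, out_ch, in_ch):
    """Initialise a narrower layer from a teacher weight matrix.

    teacher_w is a list of rows (one row per output channel).
    Assumption for illustration: the student keeps the first out_ch
    output channels and the first in_ch input channels of the teacher.
    """
    if not teacher_w or out_ch > len(teacher_w) or in_ch > len(teacher_w[0]):
        raise ValueError("student layer must not be wider than the teacher")

    # Slicing copies the values, so later student updates leave the teacher intact.
    return [row[:in_ch] for row in teacher_w[:out_ch]]


if __name__ == "__main__":
    # Toy 4x4 teacher weights; entry value encodes (row, column).
    teacher = [[r * 10 + c for c in range(4)] for r in range(4)]
    student = inherit_weights(teacher, out_ch=2, in_ch=3)
    print(student)
```

Starting the student from inherited weights rather than random values is what lets it begin training with part of the teacher's feature extractor already in place; the KD losses then refine it.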


This review was generated by AI and checked by human editors.