QUICK REVIEW

[Paper Review] TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

Zhe Liu, Xin Zhao|arXiv (Cornell University)|Dec 11, 2019

Advanced Neural Network Applications | References: 31 | Citations: 29

One-Sentence Summary

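TANet proposes a novel framework for 3D object detection from point clouds that improves robustness in noisy and challenging scenes through a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. The TA module jointly models channel-wise, point-wise, and voxel-wise attention to suppress noise and highlight discriminative features, while the CFR module refines bounding boxes using fused cross-layer features. On the KITTI benchmark, TANet achieves state-of-the-art performance, ranks first on the Pedestrian class, substantially outperforms prior methods under noisy conditions, and runs at about 29 FPS.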

ABSTRACT

In this paper, we focus on exploring the robustness of the 3D object detection in point clouds, which has been rarely discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy of the hard objects, e.g., Pedestrians, is unsatisfactory, 2) when adding additional noise points, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module, and a Coarse-to-Fine Regression (CFR) module. By considering the channel-wise, point-wise and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppresses the unstable cloud points. Besides, the novel stacked TA further exploits the multi-level feature attention. In addition, the CFR module boosts the accuracy of localization without excessive computation cost. Experimental results on the validation set of KITTI dataset demonstrate that, in the challenging noisy cases, i.e., adding additional random noisy points around each object, the presented approach goes far beyond state-of-the-art approaches. Furthermore, for the 3D object detection task of the KITTI benchmark, our approach ranks the first place on Pedestrian class, by using the point clouds as the only input. The running speed is around 29 frames per second.

Research Motivation and Objectives

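Improve the robustness of 3D object detection in point clouds under noisy and challenging conditions, especially for hard-to-detect objects such as pedestrians.
Address the poor detection accuracy on small, cluttered objects (such as pedestrians) caused by point cloud sparsity and background interference.
Reduce the performance degradation caused by additional random noise points in real-world LiDAR data.
Develop a lightweight, efficient detection framework that maintains high accuracy without excessive computational cost.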

Proposed Method

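The Triple Attention (TA) module jointly models channel-wise, point-wise, and voxel-wise attention to enhance discriminative features and suppress unstable or noisy points.
The TA module fuses spatial (point-wise) attention and channel-wise attention by element-wise multiplication, then applies voxel-wise attention to capture global context.
A stacked TA scheme extracts multi-level feature representations at different receptive fields.
The Coarse-to-Fine Regression (CFR) module first produces coarse bounding box predictions, then refines them using Pyramid Sampling Aggregation (PSA), which fuses cross-layer feature maps.
The PSA module improves localization accuracy by aggregating multi-layer features, making full use of hierarchical context.
The whole network is end-to-end trainable and runs at about 29 FPS on the KITTI dataset.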
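
The attention fusion described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `init_params` and `triple_attention`, the two-layer ReLU MLPs, the hidden size, the use of voxel centers in the voxel-wise gate, and the random (untrained) weights are all assumptions made for illustration. It only shows the data flow: point-wise and channel-wise scores are combined by element-wise multiplication, and a per-voxel gate is then applied.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(num_points, num_channels, hidden, rng):
    """Random, untrained weights (hypothetical layer sizes, for illustration only)."""
    def w(rows, cols):
        return 0.1 * rng.normal(size=(rows, cols))
    return {
        "wp1": w(num_points, hidden), "wp2": w(hidden, num_points),      # point-wise MLP
        "wc1": w(num_channels, hidden), "wc2": w(hidden, num_channels),  # channel-wise MLP
        "wv1": w(num_channels + 3, hidden), "wv2": w(hidden, 1),         # voxel-wise MLP
    }


def triple_attention(voxels, centers, params):
    """voxels: (K, T, C) features of T points in each of K voxels; centers: (K, 3)."""
    # Point-wise attention: max-pool over channels, then score each of the T points.
    point_desc = voxels.max(axis=2)                                    # (K, T)
    s = np.maximum(point_desc @ params["wp1"], 0.0) @ params["wp2"]    # (K, T)

    # Channel-wise attention: max-pool over points, then score each of the C channels.
    chan_desc = voxels.max(axis=1)                                     # (K, C)
    t = np.maximum(chan_desc @ params["wc1"], 0.0) @ params["wc2"]     # (K, C)

    # Fuse the two by element-wise multiplication (broadcast to (K, T, C)),
    # squash to (0, 1), and re-weight the input features.
    m = sigmoid(s[:, :, None] * t[:, None, :])                         # (K, T, C)
    f1 = m * voxels

    # Voxel-wise attention: one scalar gate per voxel, computed here from the
    # mean re-weighted feature and the voxel center (an assumption of this sketch).
    voxel_desc = np.concatenate([f1.mean(axis=1), centers], axis=1)    # (K, C + 3)
    q = sigmoid(np.maximum(voxel_desc @ params["wv1"], 0.0) @ params["wv2"])  # (K, 1)
    return q[:, :, None] * f1                                          # (K, T, C)
```

Because every gate lies in (0, 1), the output has the same shape as the input and each feature can only be attenuated, never amplified, which matches the intuition of suppressing unstable points. A stacked variant would feed the output of one such block into another operating on higher-dimensional features.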

Experimental Results

Research Questions

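RQ1: How can attention mechanisms be jointly designed to improve feature representation in noisy point clouds?
RQ2: Can a coarse-to-fine regression strategy improve localization accuracy without excessive added computational cost?
RQ3: How does introducing voxel-wise attention improve model robustness in the presence of random noise points?
RQ4: How much do the proposed attention and regression architectures improve performance over existing state-of-the-art methods in challenging noisy detection scenarios?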

Key Findings

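On the KITTI validation set with 100 added noise points, TANet reaches 67.79% 3D mAP, clearly outperforming the baseline (65.59%) and other attention combinations.
The proposed fusion of point-wise and channel-wise attention (PACA) achieves 67.38% mAP, better than concatenation and sequential fusion.
Adding voxel-wise attention through the TA module raises mAP to 67.79%, demonstrating the effectiveness of multi-level attention fusion.
Combining the PSA module with the TA module improves mAP by 2.1%, indicating that the two are strongly complementary.
With both the TA and PSA modules, mAP reaches 69.35%, clearly outperforming RefineDet and the baseline.
On the KITTI benchmark, TANet ranks first on the Pedestrian class with 58.43% mAP, highlighting its robustness on hard objects.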
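
The noisy setting behind these numbers (extra random points added around each object) can be sketched as follows. This summary does not give the exact sampling procedure, so everything here is an assumption for illustration: the hypothetical helper `add_noise_points`, uniform sampling, the `margin` parameter, and the use of axis-aligned boxes that ignore object heading.

```python
import random


def add_noise_points(points, boxes, n_noise=100, margin=1.0, seed=None):
    """Return a copy of `points` with `n_noise` random points added around each box.

    points: list of (x, y, z) tuples.
    boxes:  list of (cx, cy, cz, length, width, height) tuples, axis-aligned
            (heading is ignored in this sketch).
    margin: hypothetical padding, in the same units as the boxes, by which each
            box is enlarged before sampling.
    """
    rng = random.Random(seed)
    noisy = list(points)
    for cx, cy, cz, length, width, height in boxes:
        for _ in range(n_noise):
            # Sample uniformly inside the enlarged axis-aligned box.
            noisy.append((
                rng.uniform(cx - length / 2 - margin, cx + length / 2 + margin),
                rng.uniform(cy - width / 2 - margin, cy + width / 2 + margin),
                rng.uniform(cz - height / 2 - margin, cz + height / 2 + margin),
            ))
    return noisy
```

A robustness check in this style would run the same detector on the clean and the perturbed point cloud and compare the resulting mAP; fixing `seed` keeps the perturbation reproducible across methods.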


This review was generated by AI and reviewed by human editors.