QUICK REVIEW

[Paper Review] Exploring Data Augmentation for Multi-Modality 3D Object Detection

Wenwei Zhang, Zhe Wang | arXiv (Cornell University) | Dec 23, 2020

Advanced Neural Network Applications | References: 53 | Cited by: 26

One-Sentence Summary

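This paper proposes a multi-modality data augmentation pipeline, Transformation Flow, and a novel augmentation method, MoCa, to address the underperformance of multi-modality 3D object detectors that arises because transformations are hard to keep consistent between point clouds and images. By making augmentation operations reversible and replayable and by making cut-and-paste occlusion-aware, the method achieves state-of-the-art performance on nuScenes and competitive results on KITTI without model ensembles, and it won the best PKL award in the 3rd nuScenes detection challenge.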

ABSTRACT

It is counter-intuitive that multi-modality methods based on point clouds and images perform only marginally better than, or sometimes worse than, approaches that use point clouds alone. This paper investigates the reason behind this phenomenon. Because multi-modality data augmentation must maintain consistency between point clouds and images, recent methods in this field typically use relatively insufficient data augmentation, and this shortage keeps their performance below expectations. Therefore, we contribute a pipeline, named transformation flow, to bridge the gap between single- and multi-modality data augmentation with transformation reversing and replaying. In addition, because of occlusion, a point in different modalities may be occupied by different objects, making augmentations such as cut and paste non-trivial for multi-modality detection. We further present Multi-mOdality Cut and pAste (MoCa), which simultaneously considers occlusion and physical plausibility to maintain multi-modality consistency. Without using an ensemble of detectors, our multi-modality detector achieves new state-of-the-art performance on the nuScenes dataset and competitive performance on the KITTI 3D benchmark. Our method also wins the best PKL award in the 3rd nuScenes detection challenge. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
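
The reversing-and-replaying idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation or the mmdetection3d API: PointTransformFlow, rot_z, project, and replay_image_ops are hypothetical names; only four global point-cloud transforms (y-flip, z-axis rotation, uniform scaling, translation) are modelled; the calibration is a generic 3x4 lidar-to-image matrix; and image augmentation is limited to a resize and a horizontal flip.

```python
import numpy as np


def rot_z(angle):
    """3x3 rotation matrix about the z (up) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


class PointTransformFlow:
    """Hypothetical recorder of global point-cloud augmentations (illustrative only)."""

    def __init__(self):
        self.ops = []  # (name, param) pairs in the order they were applied

    def apply(self, pts, name, param=None):
        """Apply one augmentation to an Nx3 array and record it."""
        if name == "flip_y":  # mirror across the x-z plane
            out = pts * np.array([1.0, -1.0, 1.0])
        elif name == "rotate_z":
            out = pts @ rot_z(param).T
        elif name == "scale":
            out = pts * param
        elif name == "translate":
            out = pts + np.asarray(param, dtype=float)
        else:
            raise ValueError(f"unsupported op: {name}")
        self.ops.append((name, param))
        return out

    def reverse(self, pts):
        """Undo the recorded ops in reverse order, recovering original LiDAR coordinates."""
        for name, param in reversed(self.ops):
            if name == "flip_y":  # the flip is its own inverse
                pts = pts * np.array([1.0, -1.0, 1.0])
            elif name == "rotate_z":
                pts = pts @ rot_z(-param).T
            elif name == "scale":
                pts = pts / param
            elif name == "translate":
                pts = pts - np.asarray(param, dtype=float)
        return pts


def project(pts, lidar2img):
    """Project Nx3 LiDAR points with a 3x4 lidar-to-image matrix; returns Nx2 pixel coords."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    cam = homo @ lidar2img.T
    return cam[:, :2] / cam[:, 2:3]


def replay_image_ops(uv, img_ops):
    """Replay recorded image augmentations on pixel coordinates, in order."""
    uv = np.array(uv, dtype=float)
    for name, param in img_ops:
        if name == "resize":  # param: scale factor
            uv = uv * param
        elif name == "hflip":  # param: image width at this step
            uv[:, 0] = param - uv[:, 0]
    return uv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.uniform(1.0, 10.0, size=(4, 3))
    flow = PointTransformFlow()
    augmented = original
    for name, p in [("flip_y", None), ("rotate_z", 0.3), ("scale", 1.05), ("translate", [0.2, -0.1, 0.0])]:
        augmented = flow.apply(augmented, name, p)
    lidar2img = np.hstack([np.eye(3), np.zeros((3, 1))])  # toy calibration
    pixels = replay_image_ops(project(flow.reverse(augmented), lidar2img),
                              [("resize", 2.0), ("hflip", 100.0)])
    print(pixels)
```

Because every operation is stored with its parameters, the reverse pass simply walks the recorded list backwards and inverts each step, so the original calibration matrix stays valid no matter which point-cloud augmentations were applied.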

Research Motivation and Objectives

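Investigate why multi-modality 3D detectors based on LiDAR and RGB images still underperform despite their richer input data.
Address the shortage of effective data augmentation in multi-modality learning caused by cross-modal consistency constraints.
Develop a framework that enables diverse, reversible augmentations on point clouds and images while preserving their spatial correspondence.
Design a physically plausible cut-and-paste augmentation that respects occlusion relationships in both the bird's-eye-view (BEV) and 2D image domains.
Achieve state-of-the-art performance on nuScenes and competitive results on KITTI without using detector ensembles.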
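
The physical-plausibility requirement on the BEV side can be illustrated with a simple collision check: a pasted object is kept only if its BEV footprint overlaps neither an existing object nor a previously pasted one. This Python sketch is a simplification under stated assumptions, not the paper's procedure: boxes are axis-aligned (yaw is ignored, whereas real detectors deal with rotated boxes), candidates are accepted greedily in the given order, and BEVBox, overlaps, and filter_bev_collisions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BEVBox:
    """Axis-aligned BEV footprint in metres; yaw is ignored in this sketch."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlaps(a, b):
    """True if the two footprints share a region of positive area."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def filter_bev_collisions(existing, candidates):
    """Greedily keep candidates that collide with no existing or already-kept box."""
    kept = []
    for cand in candidates:
        if not any(overlaps(cand, other) for other in existing + kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    scene = [BEVBox(0, 0, 2, 2)]
    samples = [BEVBox(1, 1, 3, 3), BEVBox(5, 5, 7, 7), BEVBox(6, 6, 8, 8), BEVBox(2, 0, 4, 2)]
    print(filter_bev_collisions(scene, samples))
```

In the example, touching edges do not count as a collision: the last sample shares an edge with the existing box and is kept, while the third is rejected because it overlaps the second, already-kept sample.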

Proposed Method

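Transformation Flow records the sequence and parameters of the reversible transformations applied to point clouds and images, ensuring consistent augmentation across modalities.
The pipeline guarantees that any point in the augmented LiDAR space can be mapped to its corresponding image pixel by reversing the point-cloud transformations and replaying the image transformations.
MoCa introduces a multi-modality cut-and-paste augmentation that keeps occlusion consistent in both BEV and 2D image space.
MoCa uses a random intersection-over-foreground (IoF) threshold to simulate realistic occlusion patterns during pasting.
With this framework, standard single-modality augmentations such as random flipping, scaling, rotation, and translation can be applied just as effectively in the multi-modality setting.
The framework is compatible with existing detectors and integrates seamlessly into training pipelines, including pre-training and joint-training strategies.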
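
The image-side occlusion check can be sketched in Python as follows. This is an illustration, not the authors' code: IoF is taken here to be the intersection area divided by the area of the box being covered; a pasted 2D box is rejected if it covers, or is covered by, any existing or already-pasted box by more than a per-scene random threshold; and the threshold range [0.0, 0.7] and the names area, iof, sample_iof_threshold, and filter_image_occlusion are hypothetical choices.

```python
import random


def area(box):
    """Area of an (x1, y1, x2, y2) pixel box; degenerate boxes have zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iof(fg, other):
    """Intersection over foreground: the fraction of box `fg` covered by `other`."""
    inter = (max(fg[0], other[0]), max(fg[1], other[1]),
             min(fg[2], other[2]), min(fg[3], other[3]))
    fg_area = area(fg)
    return area(inter) / fg_area if fg_area > 0 else 0.0


def sample_iof_threshold(rng, low=0.0, high=0.7):
    """Draw a per-scene IoF threshold uniformly from [low, high] (assumed range)."""
    return rng.uniform(low, high)


def filter_image_occlusion(existing, candidates, thr):
    """Keep a pasted box only if no overlap with kept/existing boxes exceeds `thr` IoF."""
    kept = []
    for cand in candidates:
        # Check both directions: how much the candidate covers others and vice versa
        worst = max((max(iof(cand, o), iof(o, cand)) for o in existing + kept), default=0.0)
        if worst <= thr:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    thr = sample_iof_threshold(random.Random(0))
    image_boxes = [(0, 0, 10, 10)]
    pasted = [(5, 0, 15, 10), (20, 20, 30, 30), (9, 0, 19, 10)]
    print(thr, filter_image_occlusion(image_boxes, pasted, thr))
```

Sampling the threshold per scene rather than fixing it varies how much overlap is tolerated from one training image to the next, which is one way to obtain the range of occlusion patterns the random IoF threshold is meant to simulate.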

Experimental Results

Research Questions

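RQ1: Why do multi-modality 3D detectors based on LiDAR and RGB images often perform no better than, or even worse than, LiDAR-only single-modality methods?
RQ2: To what extent does insufficient data augmentation limit the performance of multi-modality 3D detectors?
RQ3: How can data augmentation be applied effectively to multi-modality 3D detection while preserving spatial consistency between point clouds and images?
RQ4: What are the key challenges, particularly regarding occlusion and physical plausibility, in applying standard augmentations such as cut-and-paste in a multi-modality setting?
RQ5: Can a unified, reversible transformation pipeline enable richer augmentation for multi-modality 3D detectors without compromising modality alignment?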

Key Findings

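The proposed Transformation Flow guarantees cross-modal consistency when diverse, reversible augmentations (e.g., flipping, rotation, scaling) are applied to point clouds and images.
MoCa improves MVX-Net's mAP on the KITTI 3D benchmark (moderate difficulty) by 11.3% and on the nuScenes dataset by 5.8%, surpassing its single-modality baseline.
With these augmentations, MVX-Net achieves new state-of-the-art performance on nuScenes without using an ensemble of class-specific detectors.
The method achieved the best Planning KL-Divergence (PKL) score in the 3rd nuScenes detection challenge, indicating that its detections are better suited to downstream planning.
Pre-training the image branch with HTC on nuImages improves NDS by 0.7% over pre-training with Faster R-CNN, demonstrating the benefit of domain-specific pre-training.
Ablation studies show that keeping the original optimizer together with the third training strategy (freezing the ResNet-50 backbone) yields the best performance.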
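
NDS, the nuScenes detection score used in the findings above, combines mAP with five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE for translation, scale, orientation, velocity, and attribute errors) as NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The short Python function below computes it; the function name and the example values are illustrative and are not results from the paper.

```python
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap, tp_errors):
    """NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, error))) / 10."""
    missing = set(TP_METRICS) - set(tp_errors)
    if missing:
        raise ValueError(f"missing TP metrics: {sorted(missing)}")
    # Each TP error is clipped at 1, so it contributes a score in [0, 1]
    tp_score = sum(1.0 - min(1.0, tp_errors[k]) for k in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper
    errors = {"mATE": 0.3, "mASE": 0.25, "mAOE": 0.4, "mAVE": 0.3, "mAAE": 0.15}
    print(round(nuscenes_detection_score(0.5, errors), 4))
```

Because mAP carries a weight of 5 out of 10 and the five clipped TP terms carry the other 5, an NDS gain such as the 0.7% reported above can come from better detection accuracy, from smaller localization and attribute errors, or from both.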

This review was generated by AI and reviewed by human editors.