QUICK REVIEW

[Paper Digest] Move to See Better: Towards Self-Supervised Amodal Object Detection.

Zhaoyuan Fang, Ayush Jain|arXiv (Cornell University)|Nov 30, 2020

Advanced Neural Network Applications | References: 41 | Cited by: 6

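One-Sentence Summary

This paper proposes a self-supervised framework for amodal object detection that uses multi-view RGB-D data collected by an agent moving through a 3D environment to improve a 2D object detector in unseen scenarios. Confident 2D detections are unprojected into 3D, segmented without supervision, and re-projected into the other views as pseudo-labels; experiments on indoor and outdoor datasets show that this improves the detector without human annotation, and that a 3D detector trained the same way outperforms a comparable self-supervised method.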

ABSTRACT

Humans learn to better understand the world by moving around their environment to get more informative viewpoints of the scene. Most methods for 2D visual recognition tasks such as object detection and segmentation treat images of the same scene as individual samples and do not exploit object permanence in multiple views. Generalization to novel scenes and views thus requires additional training with lots of human annotations. In this paper, we propose a self-supervised framework to improve an object detector in unseen scenarios by moving an agent around in a 3D environment and aggregating multi-view RGB-D information. We unproject confident 2D object detections from the pre-trained detector and perform unsupervised 3D segmentation on the point cloud. The segmented 3D objects are then re-projected to all other views to obtain pseudo-labels for fine-tuning. Experiments on both indoor and outdoor datasets show that (1) our framework performs high-quality 3D segmentation from raw RGB-D data and a pre-trained 2D detector; (2) fine-tuning with self-supervision improves the 2D detector significantly where an unseen RGB image is given as input at test time; (3) training a 3D detector with self-supervision outperforms a comparable self-supervised method by a large margin.

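Research Motivation and Objectives

Improve how well a 2D object detector generalizes to novel scenes and viewpoints without requiring large amounts of human annotation.
Exploit object permanence across views by treating a scene as a multi-view sequence rather than a set of isolated images.
Develop a self-supervised framework that uses 3D geometry and multi-view consistency to generate high-quality pseudo-labels for fine-tuning the detector.
Show that self-supervised 3D segmentation and pseudo-label generation can significantly improve a 2D object detector in unseen scenarios.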

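Proposed Method

A pre-trained 2D object detector produces confident detections on RGB-D images that a mobile agent collects while moving through a 3D environment.
Depth information is used to unproject the confident 2D detections into 3D space, giving initial 3D object candidates.
Unsupervised 3D segmentation on the point cloud refines and merges the unprojected detections into coherent 3D objects.
The segmented 3D objects are re-projected into all other views to produce consistent pseudo-labels for self-supervised fine-tuning of the 2D detector.
Fine-tuning on these multi-view-consistent pseudo-labels improves the detector's robustness and generalization in unseen scenarios.
The generated pseudo-labels are also used to train a 3D detector, which outperforms a comparable self-supervised method by a large margin.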
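
The geometric core of the steps above (unproject a detection with depth, isolate the object in 3D, re-project it into another view) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics and pose, and the names `Camera`, `box_to_points`, and `pseudo_box` are hypothetical. In particular, the median-depth filter is only a crude stand-in for the paper's unsupervised 3D segmentation, whose details this digest does not give.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max)
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass
class Camera:
    """Hypothetical pinhole camera: intrinsics plus camera-to-world pose."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Sequence[Sequence[float]] = IDENTITY  # camera-to-world rotation
    t: Point3 = (0.0, 0.0, 0.0)              # camera centre in the world frame


def unproject(u: float, v: float, depth: float, cam: Camera) -> Point3:
    """Pixel (u, v) with metric depth -> 3D point in the world frame."""
    pc = ((u - cam.cx) * depth / cam.fx, (v - cam.cy) * depth / cam.fy, depth)
    return tuple(sum(cam.R[i][j] * pc[j] for j in range(3)) + cam.t[i]
                 for i in range(3))


def project(p: Point3, cam: Camera) -> Optional[Tuple[float, float]]:
    """World point -> pixel (u, v), or None if it lies behind the camera."""
    d = [p[i] - cam.t[i] for i in range(3)]
    # Apply the inverse rotation (transpose of R) to get camera coordinates.
    pc = [sum(cam.R[j][i] * d[j] for j in range(3)) for i in range(3)]
    if pc[2] <= 0:
        return None
    return (cam.fx * pc[0] / pc[2] + cam.cx, cam.fy * pc[1] / pc[2] + cam.cy)


def box_to_points(box: Tuple[int, int, int, int], depth_map, cam: Camera,
                  depth_tol: float = 0.5) -> List[Point3]:
    """Unproject the pixels of a 2D box (upper bounds exclusive), keeping only
    those near the median depth. ASSUMPTION: this filter merely stands in for
    a real unsupervised 3D segmentation step."""
    u0, v0, u1, v1 = box
    pixels = [(u, v, depth_map[v][u])
              for v in range(v0, v1) for u in range(u0, u1)
              if depth_map[v][u] > 0]  # skip invalid depth readings
    if not pixels:
        return []
    median_depth = statistics.median(d for _, _, d in pixels)
    return [unproject(u, v, d, cam) for u, v, d in pixels
            if abs(d - median_depth) <= depth_tol]


def pseudo_box(points: List[Point3], cam: Camera) -> Optional[Box]:
    """Re-project 3D object points into another view and take the tight box."""
    pixels = [px for px in (project(p, cam) for p in points) if px is not None]
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


# Toy scene: a 3x3-pixel object at depth 2.0 in front of a wall at depth 5.0.
depth = [[5.0] * 10 for _ in range(10)]
for v in range(4, 7):
    for u in range(4, 7):
        depth[v][u] = 2.0

cam_a = Camera(fx=10.0, fy=10.0, cx=5.0, cy=5.0)                       # detection view
cam_b = Camera(fx=10.0, fy=10.0, cx=5.0, cy=5.0, t=(-0.4, 0.0, 0.0))   # second view
points = box_to_points((3, 3, 7, 7), depth, cam_a)  # slightly loose detection box
label_b = pseudo_box(points, cam_b)                 # pseudo-label for view B
```

Because the box in the second view is computed from the object's 3D points rather than from what is visible there, it covers the object's extent even where that view occludes it, which is the sense in which such labels can be amodal. A practical version would also clip boxes to the image bounds and aggregate points from several views before re-projecting.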

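Experimental Results

Research Questions

RQ1: Can multi-view RGB-D data from a mobile agent improve a 2D object detector's generalization to unseen scenarios without human annotation?
RQ2: How effective is unsupervised 3D segmentation of unprojected 2D detections at producing high-quality pseudo-labels for self-supervised learning?
RQ3: How much does self-supervised fine-tuning with multi-view pseudo-labels improve the 2D object detector on unseen RGB images?
RQ4: How does the method compare with existing self-supervised methods in 3D segmentation quality and detector accuracy?
RQ5: Does the framework generalize across diverse environments, both indoor and outdoor, with only minimal supervision?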

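Key Findings

The framework produces high-quality 3D segmentation from raw RGB-D data and a pre-trained 2D detector, without 3D supervision.
Self-supervised fine-tuning significantly improves the 2D object detector when it is tested on unseen RGB images, indicating better generalization to new viewpoints.
A 3D detector trained with this self-supervision outperforms a comparable self-supervised baseline by a large margin, supporting the use of 3D segmentation for multi-view pseudo-label generation.
The results hold on both indoor and outdoor datasets, suggesting the framework is not tied to a single type of environment.
Exploiting object permanence across views yields consistent pseudo-labels, which improve detector accuracy without human-annotated data.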

This digest was generated by AI and reviewed by a human editor.