QUICK REVIEW

[Paper Review] SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

Jianhua Han, Xiwen Liang | arXiv (Cornell University) | Jun 21, 2021

Advanced Neural Network Applications | References: 65 | Cited by: 33

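One-Sentence Summary

SODA10M is a large-scale 2D autonomous driving dataset with 10M unlabeled images and 20K labeled images, built to benchmark self-supervised and semi-supervised object detection methods and to evaluate pre-trained representations on downstream tasks.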

ABSTRACT

Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale dataset for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data, which is the first and largest dataset to date. Existing autonomous driving systems heavily rely on 'perfect' visual perception models (i.e., detection) trained using extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances of all scenarios and circumstances (i.e., night, extreme weather, cities) when deploying a robust autonomous driving system. Motivated by recent advances of self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by collaboratively exploiting large-scale unlabeled data and few labeled data. Existing datasets either provide only a small amount of data or cover limited domains with full annotation, hindering the exploration of large-scale pre-trained models. Here, we release a Large-Scale 2D Self/semi-supervised Object Detection dataset for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected within 27833 driving hours under different weather conditions, periods and location scenes of 32 different cities. We provide extensive experiments and deep analyses of existing popular self/semi-supervised approaches, and give some interesting findings in autonomous driving scope. Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i.e., detection, semantic/instance segmentation) in autonomous driving domain. More information is available at https://soda-2d.github.io.

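Research Motivation and Objectives

Advance robust autonomous driving perception by exploiting massive unlabeled data together with limited annotations.
Provide a large-scale, diverse benchmark for self-supervised and semi-supervised learning in driving scenes.
Evaluate how pre-training on SODA10M affects downstream detection and segmentation tasks.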

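Proposed Method

Collect 10M unlabeled road images and 20K labeled images across 32 cities, covering diverse weather conditions, times of day, and locations.
Annotate 6 object categories with high-quality 2D bounding boxes in the labeled subset.
Evaluate a range of self-supervised and semi-supervised learning methods on downstream tasks after pre-training on SODA10M.
Compare SODA10M with existing driving datasets in terms of scale, diversity, and generalization.
Analyze domain adaptation and the effect of different pre-training schemes under daytime vs. nighttime conditions.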
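
To give a feel for the scale described above, the following minimal sketch derives a few ratios from the figures quoted in the abstract (10 million unlabeled images, 20K labeled images, 27833 driving hours). The function and key names are illustrative, and because "10M" is a rounded count, the derived values are approximate averages rather than a sampling rate stated by the authors.

```python
# Figures quoted in the abstract; "10M" is a rounded count.
UNLABELED_IMAGES = 10_000_000
LABELED_IMAGES = 20_000
DRIVING_HOURS = 27_833


def scale_stats():
    """Return ratios derived purely by arithmetic on the quoted figures."""
    total_images = UNLABELED_IMAGES + LABELED_IMAGES
    return {
        # Share of all images that carry bounding-box labels.
        "labeled_fraction": LABELED_IMAGES / total_images,
        # Average number of unlabeled images per driving hour.
        "images_per_hour": UNLABELED_IMAGES / DRIVING_HOURS,
        # Implied average gap, in seconds, between unlabeled images.
        "seconds_per_image": DRIVING_HOURS * 3600 / UNLABELED_IMAGES,
    }


if __name__ == "__main__":
    for name, value in scale_stats().items():
        print(f"{name}: {value:.4f}")
```

Under these figures, roughly 0.2% of the images are labeled, and the unlabeled set averages about one image per ten seconds of driving.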

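Experimental Results

Research Questions

RQ1: How does pre-training on a large-scale autonomous driving dataset affect downstream detection and segmentation performance?
RQ2: Compared with ImageNet pre-training, do self-supervised or semi-supervised methods benefit more from the scale and diversity of SODA10M?
RQ3: What domain adaptation gains does SODA10M bring to driving-related tasks across conditions such as day/night, weather, and city?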

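Key Findings

SODA10M contains 10M unlabeled and 20K labeled images, collected over 27,833 driving hours in 32 cities.
As upstream pre-training data, SODA10M outperforms other autonomous driving pre-training datasets on 9 of 10 downstream tasks.
The density of multi-instance, diverse driving scenes affects how well some contrastive learning methods work; a simple global contrastive loss may underperform on autonomous driving data.
Semi-supervised methods (STAC, Unbiased Teacher) outperform bootstrapping from labels alone, with gains of up to 4.9% on some metrics.
On the night domain, semi-supervised methods trained with SODA10M show considerable gains, reflecting the domain adaptation benefit of diverse unlabeled data.
Video-based self-supervised methods, using frames drawn from the unlabeled set, achieve competitive results when paired with suitable augmentation.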
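
The finding about global contrastive losses can be made concrete with a minimal sketch. This review does not state which loss the evaluated methods use, so the code below assumes the generic InfoNCE form common to MoCo- and SimCLR-style methods; the function names and the temperature value of 0.2 are illustrative assumptions. Each image is reduced to one global embedding, and the loss pulls the query toward its positive and away from the negatives.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query, given one positive and a list of negatives.

    All inputs are global (whole-image) embeddings. The temperature is an
    assumed value, not one taken from the paper.
    """
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp over all keys.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability assigned to the positive key (index 0).
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
    mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
    print(f"aligned positive:    {aligned:.4f}")
    print(f"mismatched positive: {mismatched:.4f}")
```

One plausible reading of the finding is that two crops of a dense street scene can contain different objects, so treating their global embeddings as a positive pair gives a noisier training signal than it does on object-centric images.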


This review was generated by AI and checked by human editors.