QUICK REVIEW

[Paper Review] SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

Jianhua Han, Xiwen Liang | arXiv (Cornell University) | June 21, 2021

Advanced Neural Network Applications | References: 65 | Citations: 33

ONE-LINE SUMMARY

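SODA10M is a large-scale 2D autonomous driving dataset containing 10 million unlabeled images and 20,000 labeled images. It is used to benchmark self-supervised and semi-supervised object detection methods and to evaluate pre-trained representations on downstream tasks.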

ABSTRACT

Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale dataset for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data, which is the first and largest dataset to date. Existing autonomous driving systems heavily rely on 'perfect' visual perception models (e.g., detection) trained using extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances of all scenarios and circumstances (e.g., night, extreme weather, cities) when deploying a robust autonomous driving system. Motivated by recent advances in self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by collaboratively exploiting large-scale unlabeled data and few labeled data. Existing datasets either provide only a small amount of data or cover limited domains with full annotation, hindering the exploration of large-scale pre-trained models. Here, we release a Large-Scale 2D Self/semi-supervised Object Detection dataset for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected over 27,833 driving hours under different weather conditions, periods and location scenes of 32 different cities. We provide extensive experiments and deep analyses of existing popular self/semi-supervised approaches, and give some interesting findings in the autonomous driving scope. Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, which gives superior performance when fine-tuning with different downstream tasks (i.e., detection, semantic/instance segmentation) in the autonomous driving domain. More information can be found at https://soda-2d.github.io.
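
As a rough sanity check on the scale figures in the abstract (treating "10 million" and "20K" as exact counts, which is an assumption), the implied average capture interval and the labeled-to-unlabeled ratio work out as follows:

```latex
\frac{27{,}833\ \text{h} \times 3600\ \text{s/h}}{10{,}000{,}000\ \text{images}}
  = \frac{100{,}198{,}800\ \text{s}}{10^{7}\ \text{images}}
  \approx 10.0\ \text{s per image}

\frac{20{,}000\ \text{labeled}}{10{,}000{,}000\ \text{unlabeled}} = 0.002 = 0.2\%
  \quad (\text{about } 1 \text{ labeled image per } 500 \text{ unlabeled})
```

This is only an average. It does not describe how frames were actually sampled, but it shows the unlabeled set corresponds to roughly one image per ten seconds of driving, with labels covering a very small fraction of the data.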

MOTIVATION AND OBJECTIVES

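Promote robust autonomous driving perception by exploiting large amounts of unlabeled data together with limited annotations.
Provide a large-scale, diverse, and comprehensive benchmark for self-supervised and semi-supervised learning in driving scenarios.
Evaluate how pre-training on SODA10M affects downstream detection and segmentation tasks.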

PROPOSED METHOD

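Collect 10M unlabeled road images and 20K labeled images spanning diverse weather conditions, times of day, and locations across 32 cities.
Annotate high-quality 2D bounding boxes for 6 object categories on the labeled portion.
Pre-train on SODA10M, then evaluate a range of self-supervised and semi-supervised learning methods on downstream tasks.
Compare SODA10M with existing driving datasets in terms of scale, diversity, and generalization.
Analyze the domain adaptation effect of different pre-training schemes under daytime versus nighttime conditions.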
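
Semi-supervised detectors such as STAC and Unbiased Teacher, which the review evaluates, both rely on a teacher model whose confident predictions on unlabeled images become pseudo-labels for a student; Unbiased Teacher additionally updates the teacher as an exponential moving average (EMA) of the student. The sketch below shows only those two building blocks under simplifying assumptions: the `Detection` type, the helper names, the 0.7 threshold, and the 0.999 momentum are illustrative choices, and model weights are reduced to a dict of floats instead of real tensors. It is not the code of either method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical detection record: box as (x1, y1, x2, y2), class name, confidence."""
    box: Tuple[float, float, float, float]
    label: str
    score: float


def filter_pseudo_labels(dets: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep only teacher detections confident enough to be used as pseudo-labels.

    The threshold value is an illustrative assumption; each method tunes its own.
    """
    return [d for d in dets if d.score >= threshold]


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               momentum: float = 0.999) -> Dict[str, float]:
    """Return new teacher weights as an exponential moving average of the student.

    Weights are toy scalars keyed by parameter name (a stand-in for tensors).
    """
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


if __name__ == "__main__":
    teacher_out = [
        Detection((0, 0, 10, 10), "Car", 0.95),
        Detection((5, 5, 20, 20), "Pedestrian", 0.40),
    ]
    # Only the confident "Car" box survives as a pseudo-label.
    print(filter_pseudo_labels(teacher_out))
    # The teacher moves 10% of the way toward the student when momentum is 0.9.
    print(ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9))
```

A high threshold trades recall for cleaner pseudo-labels, and a momentum close to 1 makes the teacher change slowly, which is what keeps its pseudo-labels stable while the student trains.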

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

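RQ1: How does pre-training on a large-scale autonomous driving dataset affect downstream detection and segmentation performance?
RQ2: Do the scale and diversity of SODA10M benefit self-supervised or semi-supervised methods more than ImageNet pre-training does?
RQ3: What domain adaptation benefits does SODA10M offer for driving-related tasks under varied conditions such as day/night, weather, and city?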

KEY FINDINGS

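SODA10M contains 10M unlabeled images and 20K labeled images collected over 27,833 driving hours in 32 cities.
When used as upstream pre-training data, SODA10M yields better downstream performance than other autonomous driving pre-training datasets on 9 of 10 tasks.
Driving scenes are dense with many diverse object instances, and this density affects how well certain contrastive methods work; a simple global contrastive loss can underperform on autonomous driving data.
Semi-supervised methods (STAC, Unbiased Teacher) outperform pseudo-labeling alone, with gains of up to 4.9% on some metrics.
Pre-training on SODA10M gives semi-supervised methods a notable advantage in the nighttime domain, showing the domain adaptation benefit of diverse unlabeled data.
Video-based self-supervised methods, trained on frames generated from the unlabeled set with appropriate augmentation, achieve competitive results.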
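
A "simple global contrastive loss", as in the findings above, compares one embedding per whole image. The InfoNCE objective used by MoCo-style methods is a common example, assumed here for illustration. The minimal sketch below uses plain lists as embeddings; the function names and the 0.2 temperature are illustrative assumptions, not values from the review.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(query: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.2) -> float:
    """InfoNCE loss for one global image embedding.

    The positive key (another augmented view of the same image) sits at index 0;
    the loss is the cross-entropy of picking it among positive + negatives.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Query aligned with its positive: low loss.
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    # Query aligned with the negative instead: high loss.
    print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

Because the whole scene is squeezed into a single vector, the many small cars and pedestrians in a typical driving frame have little individual influence on this loss. That is one plausible reading of why instance-agnostic objectives like this may transfer less well to dense driving detection.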


This review was generated by AI and reviewed by a human editor.