QUICK REVIEW

[Paper Review] One Million Scenes for Autonomous Driving: ONCE Dataset

Jiageng Mao, Minzhe Niu | arXiv (Cornell University) | 2021-06-21

Advanced Neural Network Applications | References: 56 | Citations: 133

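One-Line Summary

This paper introduces the ONCE dataset, with 1 million LiDAR scenes and 7 million images, for 3D object detection, and presents a benchmark that evaluates self-supervised, semi-supervised, and unsupervised methods for 3D detection on ONCE. It also analyzes data quality, diversity, and domain adaptation potential in comparison with existing datasets.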

ABSTRACT

Current perception models in autonomous driving have become notorious for greatly relying on a mass of annotated data to cover unseen cases and address the long-tail problem. On the other hand, learning from unlabeled large-scale collected data and incrementally self-training powerful recognition models have received increasing attention and may become the solutions of next-generation industry-level powerful and robust perception models in autonomous driving. However, the research community generally suffered from data inadequacy of those essential real-world scene data, which hampers the future exploration of fully/semi/self-supervised methods for 3D perception. In this paper, we introduce the ONCE (One millioN sCenEs) dataset for 3D object detection in the autonomous driving scenario. The ONCE dataset consists of 1 million LiDAR scenes and 7 million corresponding camera images. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available (e.g. nuScenes and Waymo), and it is collected across a range of different areas, periods and weather conditions. To facilitate future research on exploiting unlabeled data for 3D detection, we additionally provide a benchmark in which we reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset. We conduct extensive analyses on those methods and provide valuable observations on their performance related to the scale of used data. Data, code, and more information are available at https://once-for-auto-driving.github.io/index.html.

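Motivation and Objectives

Address the shortage of data by providing a large-scale, diverse 3D scene dataset for autonomous driving.
Enable the exploration of unlabeled data through a benchmark of self-, semi-, and unsupervised learning for 3D detection.
Promote cross-domain research and analysis on data quality, diversity, and generalization in 3D perception.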

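Proposed Method

Collect and downsample LiDAR and camera data to build 1M 3D scenes and 7M images from 144 hours of driving.
Annotate 16k scenes with 3D boxes for 5 categories and project them into 2D boxes for the images.
Provide weather, time-of-day, and region labels for every scene, and split the data into training/validation/test sets together with a large unlabeled pool.
Benchmark single- and multi-modality 3D detectors on ONCE under a unified setting.
Reproduce and evaluate self-supervised, semi-supervised, and unsupervised domain adaptation methods for 3D detection.
Analyze data quality and diversity through pretraining effects and distribution comparisons.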
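
The step of projecting 3D box annotations into 2D image boxes can be illustrated with a minimal sketch. It is not the authors' code: it assumes the box is already expressed in the camera frame (x right, y down, z forward), uses an ideal pinhole model, and all names (`box_corners_cam`, `project_to_2d_box`, `fx`, `fy`, `u0`, `v0`) are hypothetical. The real pipeline would also need the LiDAR-to-camera calibration, which this review does not describe.

```python
import math
from itertools import product


def box_corners_cam(center, size, yaw=0.0):
    """Return the 8 corners of a 3D box given in the camera frame.

    Assumed convention (not from the paper): x right, y down, z forward;
    size = (width along x, height along y, length along z);
    yaw rotates the box about the vertical (y) axis.
    """
    cx, cy, cz = center
    w, h, l = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        dx, dy, dz = sx * w, sy * h, sz * l  # offset in the box frame
        corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners


def project_to_2d_box(corners, fx, fy, u0, v0):
    """Project corners with a pinhole model and return (u_min, v_min, u_max, v_max).

    Corners behind the camera (z <= 0) are dropped, which is only an
    approximation for boxes that straddle the image plane.
    """
    pixels = [(fx * x / z + u0, fy * y / z + v0) for x, y, z in corners if z > 0]
    if not pixels:
        return None  # the box is entirely behind the camera
    us, vs = zip(*pixels)
    return (min(us), min(vs), max(us), max(vs))


# Usage: a 2 m cube centered 10 m in front of a hypothetical camera.
corners = box_corners_cam(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 2.0))
box_2d = project_to_2d_box(corners, fx=100.0, fy=100.0, u0=0.0, v0=0.0)
print(box_2d)
```

Taking the min and max over the projected corners gives the tightest axis-aligned 2D box around the projected 3D box, which is usually somewhat looser than a hand-drawn 2D label.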

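Experimental Results

Research Questions

RQ1: How does pretraining on ONCE affect downstream 3D detection performance compared with pretraining on nuScenes and Waymo?
RQ2: What effect do self-, semi-, and unsupervised methods that use ONCE's unlabeled data have on 3D object detection?
RQ3: How do different self-supervised and semi-supervised strategies differ in 3D detection performance across data scales?
RQ4: How much does unsupervised domain adaptation contribute to cross-dataset 3D detection involving ONCE?
RQ5: What role does data diversity (weather, time of day, region) play in detection performance on autonomous driving scenes?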

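Key Findings

ONCE offers strong pretraining benefits: models pretrained on ONCE achieve higher 3D mAP when fine-tuned on KITTI than models pretrained on nuScenes or Waymo.
Self-, semi-, and unsupervised methods improve 3D detection performance when using unlabeled ONCE data, and the gains grow as the amount of unlabeled data increases.
Clustering-based self-supervised methods (SwAV, DeepCluster) generally outperform contrastive methods (BYOL, PointContrast) in the large-scale setting.
Semi-supervised methods (Mean Teacher, SESS, 3DIoUMatch) show notable gains, with Mean Teacher reaching up to 59.99% mAP with large-scale unlabeled data.
Unsupervised domain adaptation to ONCE yields meaningful improvements over source-only training, but a gap to Oracle performance remains.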
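
For readers unfamiliar with Mean Teacher, its core idea is that a teacher model's weights track an exponential moving average (EMA) of the student's weights, and the student is penalized when its predictions on unlabeled data disagree with the teacher's. The sketch below shows this on plain Python dictionaries. The names `ema_update` and `consistency_loss` and the `decay` value are illustrative assumptions, not the configuration used in the ONCE benchmark.

```python
def ema_update(teacher, student, decay=0.999):
    """Update teacher weights in place as an EMA of the student weights.

    teacher, student: dicts mapping parameter names to floats
    (a stand-in for real model tensors in this sketch).
    """
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


def consistency_loss(student_preds, teacher_preds):
    """Mean squared difference between student and teacher predictions."""
    if len(student_preds) != len(teacher_preds):
        raise ValueError("prediction lists must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_preds, teacher_preds)]
    return sum(squared) / len(squared)


# Usage: the teacher drifts slowly toward the student after each step.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(2):
    ema_update(teacher, student, decay=0.9)
print(teacher["w"])
print(consistency_loss([0.2, 0.8], [0.0, 1.0]))
```

In practice the consistency term is added to the supervised detection loss on labeled scenes, so the unlabeled pool contributes only through the student-teacher agreement.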


This review was generated by AI and reviewed by a human editor.