QUICK REVIEW

[Paper Review] BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Junjie Huang, Guan Huang|arXiv (Cornell University)|2021-12-22

Advanced Image and Video Retrieval Techniques|Citations: 294

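One-line summary

BEVDet is a modular framework for BEV-based 3D object detection. With BEV-space data augmentation and Scale-NMS, it achieves state-of-the-art accuracy among vision-based methods on nuScenes while keeping inference fast.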

ABSTRACT

Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially develop its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In the experiment, BEVDet offers an excellent trade-off between accuracy and time-efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D, but requires just 11% computational budget of 215.3 GFLOPs and runs 9.2 times faster at 15.6 FPS. Another high-precision version dubbed BEVDet-Base scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet .

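Motivation and goals

Perform 3D object detection for autonomous driving in a unified BEV representation, aligned with BEV semantic segmentation.
Propose BEVDet, a modular framework that reuses existing components while improving performance.
Counter overfitting in BEV training with BEV-space data augmentation and a dedicated data processing strategy.
Introduce Scale-NMS, an NMS variant tailored to BEV space that adapts suppression to the sizes of objects as they appear in BEV.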

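Proposed method

The BEVDet architecture consists of four modules: an image-view encoder, a view transformer, a BEV encoder, and a task-specific head.
The view transformer follows Lift-Splat-Shoot and uses predicted depth to generate BEV features.
A BEV-space data augmentation strategy regularizes BEV training.
Scale-NMS rescales each box by a class-specific factor before classical NMS, so that suppression in BEV space suits the typical size of each category.
Data augmentation, the BEV encoder, and resolution are benchmarked and ablated to optimize the trade-off between accuracy and efficiency.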
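The Lift-Splat-Shoot style view transform mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the BEVDet implementation: the function names and the `cell_index` lookup are hypothetical, and the mapping from (pixel, depth bin) to a BEV cell, which in practice comes from camera intrinsics and extrinsics, is assumed to be precomputed. Each pixel feature is weighted by a softmax over its depth bins ("lift") and summed into the BEV cell that the bin lands in ("splat").

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of depth logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def lift_splat(pixel_feats, depth_logits, cell_index, num_cells):
    """Toy lift-splat view transform.

    pixel_feats[p]  : C-dim image feature of pixel p
    depth_logits[p] : D depth-bin logits predicted for pixel p
    cell_index[p][d]: BEV cell hit by pixel p at depth bin d, or None if the
                      point falls outside the grid (assumed precomputed from
                      camera geometry; hypothetical input for this sketch)
    Returns a list of num_cells C-dim BEV features.
    """
    channels = len(pixel_feats[0])
    bev = [[0.0] * channels for _ in range(num_cells)]
    for feat, logits, cells in zip(pixel_feats, depth_logits, cell_index):
        for prob, cell in zip(softmax(logits), cells):
            if cell is None:
                continue  # frustum point lies outside the BEV grid
            for c in range(channels):
                # lift: depth probability times feature; splat: sum pooling
                bev[cell][c] += prob * feat[c]
    return bev


if __name__ == "__main__":
    # One pixel, two equally likely depth bins landing in cells 0 and 1.
    print(lift_splat([[2.0, 4.0]], [[0.0, 0.0]], [[0, 1]], num_cells=3))
```

In this sketch the feature mass of a pixel is split across BEV cells in proportion to its predicted depth distribution, so an uncertain depth spreads the feature along the camera ray rather than committing to one location.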

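Experimental results

Research questions

RQ1: Can BEV-based 3D object detection outperform image-view-based methods while maintaining detection speed?
RQ2: How do BEV-specific data augmentation and NMS strategies affect BEVDet's accuracy and robustness?
RQ3: How do input resolution, BEV resolution, and network components affect detection performance on nuScenes?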
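To make the BEV-space augmentation in RQ2 concrete, here is a minimal sketch of the idea, assuming the augmentation consists of rotation, scaling, and flipping applied identically to the BEV content and to the 3D box targets so that the two stay spatially consistent. The function name, its parameters, and the use of 2D points as a stand-in for BEV feature locations are illustrative assumptions, not the paper's code.

```python
import math


def bev_augment(points, boxes, angle_deg=0.0, scale=1.0,
                mirror_x=False, mirror_y=False):
    """Apply one rotate/scale/flip to BEV points and box targets alike.

    points: list of (x, y) BEV locations (stand-in for feature positions)
    boxes : list of (cx, cy, width, length, yaw) targets, yaw in radians
    mirror_x negates x, mirror_y negates y (hypothetical flag names).
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)

    def transform(x, y):
        xr = scale * (cos_a * x - sin_a * y)
        yr = scale * (sin_a * x + cos_a * y)
        if mirror_x:
            xr = -xr
        if mirror_y:
            yr = -yr
        return xr, yr

    new_points = [transform(x, y) for x, y in points]
    new_boxes = []
    for cx, cy, width, length, yaw in boxes:
        ncx, ncy = transform(cx, cy)
        nyaw = yaw + a                # rotation shifts the heading
        if mirror_x:
            nyaw = math.pi - nyaw     # heading (cos, sin) -> (-cos, sin)
        if mirror_y:
            nyaw = -nyaw              # heading (cos, sin) -> (cos, -sin)
        new_boxes.append((ncx, ncy, width * scale, length * scale, nyaw))
    return new_points, new_boxes


if __name__ == "__main__":
    pts, bxs = bev_augment([(1.0, 0.0)], [(1.0, 0.0, 2.0, 4.0, 0.0)],
                           angle_deg=90.0, scale=2.0)
    print(pts, bxs)
```

The point of applying the same transform to both inputs is that a feature located at a box center before augmentation is still at that box's center afterwards.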

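Key findings

BEVDet-Tiny reaches 31.2% mAP and 39.2% NDS with a 704×256 input at 215.3 GFLOPs and 15.6 FPS; its accuracy is comparable to FCOS3D at about 11% of the computational budget and 9.2 times the speed.
BEVDet-Base reaches 39.3% mAP and 47.2% NDS with a 1600×640 input at 2,962.6 GFLOPs, surpassing FCOS3D by +9.8% mAP and +10.0% NDS at a comparable inference speed (1.9 FPS).
Scale-NMS helps small objects most, adding +4.8% AP for pedestrians and +7.5% AP for traffic cones, and raises overall mAP from 29.5% to 31.2%.
Combining BEV-space augmentation (BDA) with image-space augmentation (IDA) substantially raises peak performance (up to 31.6% mAP) and makes training more stable.
On the nuScenes test set, BEVDet reaches 42.2% mAP and 48.2% NDS, ranking first among vision-based methods and approaching LiDAR-based performance.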
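The gain that Scale-NMS brings for small objects can be illustrated with a short sketch. It assumes that Scale-NMS enlarges each box by a class-specific factor before ordinary IoU-based NMS, so that duplicate detections of a small object, which may not overlap at all at their true size, overlap after scaling. The scaling factors below are made up for illustration, and boxes are treated as axis-aligned in BEV (yaw is ignored) to keep the IoU computation simple.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float       # BEV center x in meters
    y: float       # BEV center y in meters
    width: float   # extent along x in meters
    length: float  # extent along y in meters
    score: float
    label: str


# Hypothetical per-class scaling factors, for illustration only.
SCALE = {"pedestrian": 4.0, "traffic_cone": 4.0, "car": 1.0}


def iou_bev(a, b, sa=1.0, sb=1.0):
    """Axis-aligned BEV IoU after scaling each box about its center."""
    aw, al = a.width * sa, a.length * sa
    bw, bl = b.width * sb, b.length * sb
    ix = max(0.0, min(a.x + aw / 2, b.x + bw / 2) - max(a.x - aw / 2, b.x - bw / 2))
    iy = max(0.0, min(a.y + al / 2, b.y + bl / 2) - max(a.y - al / 2, b.y - bl / 2))
    inter = ix * iy
    union = aw * al + bw * bl - inter
    return inter / union if union > 0 else 0.0


def scale_nms(boxes, iou_thr=0.5, scale=SCALE):
    """Greedy per-class NMS on scaled boxes; returns the original boxes kept."""
    keep = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        s = scale.get(box.label, 1.0)
        if all(k.label != box.label or iou_bev(box, k, s, s) <= iou_thr
               for k in keep):
            keep.append(box)
    return keep


if __name__ == "__main__":
    dets = [Box(0.0, 0.0, 0.5, 0.5, 0.9, "pedestrian"),
            Box(0.6, 0.0, 0.5, 0.5, 0.6, "pedestrian")]
    print(len(scale_nms(dets)))            # scaled boxes overlap
    print(len(scale_nms(dets, scale={})))  # unscaled boxes do not overlap
```

In the usage example the two pedestrian boxes are 0.5 m wide and 0.6 m apart, so their unscaled IoU is zero and plain NMS keeps both; after scaling by 4 their IoU exceeds the threshold and only the higher-scoring one survives.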


This review was generated by AI and reviewed by a human editor.