QUICK REVIEW

[Paper Review] FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

Tai Wang, Xinge Zhu | arXiv (Cornell University) | April 22, 2021

Advanced Neural Network Applications · 38 references · 27 citations

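One-Line Summary

FCOS3D adapts an anchor-free 2D detector to monocular 3D object detection by projecting 3D targets onto the image plane and using 3D-center-based supervision with multi-level 3D prediction. It achieves the best performance among vision-only methods on the nuScenes camera track (1st place among vision-only entries in the NeurIPS 2020 nuScenes 3D detection challenge).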

ABSTRACT

Monocular 3D object detection is an important task for autonomous driving considering its advantage of low cost. It is much more challenging than conventional 2D cases due to its inherent ill-posed property, which is mainly reflected in the lack of depth information. Recent progress on 2D detection offers opportunities to better solving this problem. However, it is non-trivial to make a general adapted 2D detector work in this 3D task. In this paper, we study this problem with a practice built on a fully convolutional single-stage detector and propose a general framework FCOS3D. Specifically, we first transform the commonly defined 7-DoF 3D targets to the image domain and decouple them as 2D and 3D attributes. Then the objects are distributed to different feature levels with consideration of their 2D scales and assigned only according to the projected 3D-center for the training procedure. Furthermore, the center-ness is redefined with a 2D Gaussian distribution based on the 3D-center to fit the 3D target formulation. All of these make this framework simple yet effective, getting rid of any 2D detection or 2D-3D correspondence priors. Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020. Code and models are released at https://github.com/open-mmlab/mmdetection3d.

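Motivation and Goals

Transform 7-DoF 3D targets into an image-domain representation so that the strengths of 2D detectors can be exploited.
Decouple the 3D attributes for regression into a 2D center offset and 3D size/pose.
Distribute targets across feature pyramid levels, using 2D scale and the projected 3D center as guidance.
Redefine center-ness as a 2D Gaussian around the projected 3D center to reflect the 3D target geometry.
Achieve monocular 3D detection without 2D-3D priors while keeping training and inference efficient.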

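Proposed Method

Builds on FCOS, using a ResNet101 backbone and an FPN to produce multi-scale feature maps (P3–P7).
Projects each 3D target onto the image to obtain its 2.5D center, represented as a 2D offset (Δx, Δy) plus depth (d), and decouples the remaining 3D attributes into size (w, l, h), yaw angle θ, and velocity (vx, vy).
Predicts rotation as an angle component plus a 2-bin direction classification, which resolves the ambiguity between opposite headings (θ and θ + π).
Assigns targets to feature levels using 2D scale guidance and a foreground criterion based on the projected 3D center, and uses distance-based center sampling to reduce ambiguity when a location falls inside several objects.
Uses a center-ness score c modeled as a 2D Gaussian around the projected 3D center, trained with a BCE loss.
Trains with focal loss for classification, softmax/BCE losses for attribute and direction classification, and Smooth L1 loss with carefully tuned weights for the regression targets.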
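
The 2.5D center decomposition in the method list can be made concrete with a pinhole camera model. The sketch below is a minimal illustration under stated assumptions, not the mmdetection3d implementation: it assumes camera coordinates with z pointing forward, a zero-skew intrinsic matrix `K`, and offsets (Δx, Δy) expressed in raw pixels relative to a feature location. The function names `project_center`, `encode_2p5d_target`, and `decode_2p5d` are hypothetical.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def project_center(center_cam: Vec3, K: Matrix3) -> Vec3:
    """Project a 3D center in camera coordinates to (u, v, depth).

    Assumes a zero-skew pinhole intrinsic matrix K and z > 0 (in front of the camera).
    """
    x, y, z = center_cam
    if z <= 0:
        raise ValueError("the 3D center must lie in front of the camera (z > 0)")
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    return fx * x / z + cx, fy * y / z + cy, z


def encode_2p5d_target(center_cam: Vec3, K: Matrix3, location: Tuple[float, float]) -> Vec3:
    """Regression target (dx, dy, depth) for one feature location (px, py) in pixels."""
    u, v, depth = project_center(center_cam, K)
    px, py = location
    return u - px, v - py, depth


def decode_2p5d(pred: Vec3, K: Matrix3, location: Tuple[float, float]) -> Vec3:
    """Invert the encoding: recover the 3D center from (dx, dy, depth) predictions."""
    dx, dy, depth = pred
    px, py = location
    u, v = px + dx, py + dy
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


if __name__ == "__main__":
    # Illustrative intrinsics (not taken from any specific nuScenes camera).
    K = [[1266.4, 0.0, 816.3], [0.0, 1266.4, 491.5], [0.0, 0.0, 1.0]]
    target = encode_2p5d_target((2.0, 1.0, 20.0), K, (936.0, 552.0))
    print(target)
    print(decode_2p5d(target, K, (936.0, 552.0)))
```

Keeping depth as a separate output lets the 2D offset reuse FCOS-style dense regression, while the full 3D center is only recovered at decode time.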
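
The 2-bin rotation scheme can be illustrated as follows: the regression branch only needs to learn an angle that is periodic in π, so θ and θ + π produce the same regression target, while a separate classifier picks which of the two half-turns is correct. This is a minimal sketch; the bin convention (bin 1 covering [π, 2π)) and the function names are assumptions, not the exact parameterization used in mmdetection3d.

```python
import math
from typing import Tuple

TWO_PI = 2.0 * math.pi


def encode_yaw(theta: float) -> Tuple[float, int]:
    """Split a yaw angle into a pi-periodic residual and a 2-bin direction label."""
    t = theta % TWO_PI                          # wrap to [0, 2*pi)
    direction = 1 if t >= math.pi else 0        # which half-turn the heading lies in
    return t - direction * math.pi, direction   # residual lies in [0, pi)


def decode_yaw(residual: float, direction: int) -> float:
    """Recover the yaw in [0, 2*pi) from a regressed residual and a direction bin."""
    return residual % math.pi + direction * math.pi


if __name__ == "__main__":
    theta = 0.3
    # Opposite headings share the residual and differ only in the direction bin.
    print(encode_yaw(theta), encode_yaw(theta + math.pi))
```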
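
The 2D-scale guidance for distributing objects over P3–P7 follows FCOS, where each location is handled by the level whose regression range contains max(l, t, r, b). The sketch below uses the original FCOS ranges (64, 128, 256, 512 pixels); FCOS3D's actual thresholds may differ, and the 2D box is assumed to be the rectangle enclosing the projected 3D box. `assign_level` is a hypothetical helper.

```python
import math
from typing import List, Optional, Tuple

# (level, lower bound, upper bound) on max(l, t, r, b) in pixels; FCOS defaults.
LEVEL_RANGES: List[Tuple[str, float, float]] = [
    ("P3", 0.0, 64.0),
    ("P4", 64.0, 128.0),
    ("P5", 128.0, 256.0),
    ("P6", 256.0, 512.0),
    ("P7", 512.0, math.inf),
]


def assign_level(location: Tuple[float, float],
                 box2d: Tuple[float, float, float, float]) -> Optional[str]:
    """Return the FPN level responsible for this (location, box) pair, or None."""
    px, py = location
    x1, y1, x2, y2 = box2d
    l, t, r, b = px - x1, py - y1, x2 - px, y2 - py
    if min(l, t, r, b) <= 0:
        return None  # the location lies outside the 2D box
    m = max(l, t, r, b)
    for name, lo, hi in LEVEL_RANGES:
        if lo < m <= hi:
            return name
    return None
```

Large objects end up on coarse levels and small ones on fine levels, which reduces the number of overlapping objects competing for the same location at any single level.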
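
The 3D-center-based foreground criterion and distance-based tie-breaking can be sketched as below. Assumptions (not confirmed by this review): a location is foreground for an object only if it lies inside the object's 2D box and within a square of half-width `radius * stride` around the projected 3D center, with `radius = 1.5` as an illustrative default; when several objects qualify, the one whose projected 3D center is closest wins. `assign_object` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def assign_object(location: Point, stride: float,
                  projected_centers: Sequence[Point],
                  boxes2d: Sequence[Box],
                  radius: float = 1.5) -> Optional[int]:
    """Index of the object assigned to this location, or None for background."""
    px, py = location
    half = radius * stride  # half-width of the center-sampling square (assumed)
    best, best_dist = None, math.inf
    for i, ((cu, cv), (x1, y1, x2, y2)) in enumerate(zip(projected_centers, boxes2d)):
        inside_box = x1 < px < x2 and y1 < py < y2
        near_center = abs(px - cu) <= half and abs(py - cv) <= half
        if not (inside_box and near_center):
            continue
        dist = math.hypot(px - cu, py - cv)
        if dist < best_dist:  # ambiguity resolved by distance to the projected 3D center
            best, best_dist = i, dist
    return best
```

Plain FCOS resolves overlaps by picking the object with the smallest area; tying the choice to the projected 3D center instead matches the fact that the regression targets are defined relative to that center.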
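
A minimal sketch of the Gaussian center-ness target and its BCE loss follows, using a 2D Gaussian of the form c = exp(-α(Δx² + Δy²)). In this sketch the offsets are divided by the feature stride and α defaults to 2.5; both are assumptions rather than confirmed details of the official code, and the helper names are hypothetical.

```python
import math


def centerness_target(dx: float, dy: float, stride: float, alpha: float = 2.5) -> float:
    """2D-Gaussian center-ness around the projected 3D center.

    dx, dy: pixel offsets from the feature location to the projected 3D center.
    Dividing by the stride (an assumption of this sketch) keeps the decay
    comparable across FPN levels; alpha controls how fast c falls off.
    """
    nx, ny = dx / stride, dy / stride
    return math.exp(-alpha * (nx * nx + ny * ny))


def binary_cross_entropy(pred: float, target: float, eps: float = 1e-7) -> float:
    """BCE between a predicted probability and a soft target in [0, 1]."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


if __name__ == "__main__":
    stride = 8.0
    for dist in (0.0, 4.0, 8.0, 12.0):
        print(dist, round(centerness_target(dist, 0.0, stride), 4))
```

In FCOS the predicted center-ness is multiplied with the classification score at inference; assuming FCOS3D keeps this convention, boxes predicted far from the projected 3D center are down-weighted before NMS.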
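
The loss terms in the method list can be written per sample as in this sketch. The focal-loss defaults (α = 0.25, γ = 2) are the usual RetinaNet/FCOS values, while the Smooth L1 β and the per-target weights in `REG_WEIGHTS` are illustrative placeholders, not the paper's tuned settings.

```python
import math
from typing import Dict, Sequence


def sigmoid_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
                       eps: float = 1e-7) -> float:
    """Focal loss for one sigmoid output p and binary label y."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic below beta, linear above it."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy (e.g. for direction/attribute bins)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Hypothetical weights for illustration only; the low velocity weights just show
# how a hard-to-learn target could be down-weighted.
REG_WEIGHTS: Dict[str, float] = {
    "dx": 1.0, "dy": 1.0, "depth": 1.0,
    "w": 1.0, "l": 1.0, "h": 1.0, "theta": 1.0,
    "vx": 0.05, "vy": 0.05,
}


def weighted_regression_loss(pred: Dict[str, float], target: Dict[str, float],
                             weights: Dict[str, float] = REG_WEIGHTS) -> float:
    """Weighted sum of per-attribute Smooth L1 losses."""
    return sum(w * smooth_l1(pred[k], target[k]) for k, w in weights.items())
```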

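Experimental Results

Research Questions

RQ1: Can a simple anchor-free 2D detector be repurposed to predict 3D attributes from monocular images without 2D-3D priors?
RQ2: How should 3D targets be redefined and assigned to 2D feature levels to maximize recall and accuracy in monocular 3D detection?
RQ3: Does a center-ness based on a 2D Gaussian tied to the projected 3D center suppress low-quality predictions in the 3D setting better than the original FCOS center-ness?
RQ4: What effect do depth reparameterization and decoupled regression heads have on 3D orientation and overall detection scores on nuScenes?
RQ5: What performance benefits do depth-space loss reparameterization and distance-based target assignment bring for large objects?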

Key Results

Method	Split	Modality	mAP↑	mATE↓	mASE↓	mAOE↓	mAVE↓	mAAE↓	NDS↑
CenterFusion	test	Camera & Radar	0.326	0.631	0.261	0.516	0.614	0.115	0.449
PointPillars	test	LiDAR	0.305	0.517	0.290	0.500	0.316	0.368	0.453
MEGVII	test	LiDAR	0.528	0.300	0.247	0.379	0.245	0.140	0.633
LRM0	test	Camera	0.294	0.752	0.265	0.603	1.582	0.14	0.371
MonoDIS	test	Camera	0.304	0.738	0.263	0.546	1.553	0.134	0.384
CenterNet	test	Camera (HGLS)	0.338	0.658	0.255	0.629	1.629	0.142	0.4
Noah CV Lab	test	Camera	0.331	0.660	0.262	0.354	1.663	0.198	0.418
FCOS3D (Ours)	test	Camera	0.358	0.690	0.249	0.452	1.434	0.124	0.428
CenterNet	val	Camera (HGLS)	0.306	0.716	0.264	0.609	1.426	0.658	0.328
FCOS3D (Ours)	val	Camera	0.343	0.725	0.263	0.422	1.292	0.153	0.415
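
The NDS column can be reproduced from the other columns using the nuScenes detection score definition, NDS = (1/10)[5·mAP + Σ (1 − min(1, mTP))] over the five true-positive error metrics. The short check below (function name hypothetical) recomputes FCOS3D's test-set NDS from the table; it also shows why none of the camera-only methods gain anything from the velocity term, since every mAVE above 1 is clipped to a score of zero.

```python
from typing import Dict

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Dict[str, float]) -> float:
    """NDS: mAP weighted 5x, plus one clipped score per true-positive error metric."""
    tp_scores = [1.0 - min(1.0, tp_errors[k]) for k in TP_METRICS]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


fcos3d_test = {"mATE": 0.690, "mASE": 0.249, "mAOE": 0.452, "mAVE": 1.434, "mAAE": 0.124}
print(f"{nuscenes_detection_score(0.358, fcos3d_test):.4f}")  # 0.4275, reported as 0.428
```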

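On the nuScenes test set, FCOS3D reaches 0.358 mAP and 0.428 NDS, the highest among the camera-only methods in the table.
On the validation set, FCOS3D reaches 0.343 mAP and 0.415 NDS, a clear gain over CenterNet (0.306 mAP, 0.328 NDS).
Compared with LiDAR-based and multi-modal methods, FCOS3D with RGB input alone is competitive in mAP and orientation error: its test mAP (0.358) exceeds PointPillars (0.305) and CenterFusion (0.326), and its mAOE (0.452) is lower than both (0.500 and 0.516), though MEGVII remains well ahead (0.528 mAP, 0.633 NDS). The 2-bin direction encoding notably improves rotation handling.
Ablations show that computing the depth loss in the original depth space, distance-based target assignment, stronger backbones (ResNet101, DCN), and decoupled regression heads each bring substantial mAP and NDS gains.
The final model further benefits from test-time augmentation and longer training, reaching state-of-the-art performance among vision-only approaches on the nuScenes camera track.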
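
The ablation finding on depth can be illustrated with a toy comparison. Assuming the network outputs a raw value r and depth is decoded as d = exp(r) (an assumption about the parameterization), a Smooth L1 loss on r − log d* penalizes a fixed relative error equally at every distance, whereas the same loss on exp(r) − d* grows with distance. This is one way to see how the choice of loss space changes the weighting across distances; it is a sketch of the two loss spaces, not the paper's training code.

```python
import math
from typing import Tuple


def smooth_l1(diff: float, beta: float = 1.0) -> float:
    """Quadratic below beta, linear above it."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def depth_losses(raw: float, depth_gt: float) -> Tuple[float, float]:
    """Return (log-space loss, original-space loss) for one raw depth output."""
    log_space = smooth_l1(raw - math.log(depth_gt))
    original_space = smooth_l1(math.exp(raw) - depth_gt)  # assumes depth = exp(raw)
    return log_space, original_space


if __name__ == "__main__":
    for depth_gt in (5.0, 20.0, 50.0):
        raw = math.log(1.1 * depth_gt)  # the same 10% over-estimate at every distance
        log_l, orig_l = depth_losses(raw, depth_gt)
        print(f"gt={depth_gt:4.0f} m  log-space={log_l:.4f}  original-space={orig_l:.4f}")
```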

This review was generated by AI and reviewed by a human editor.