QUICK REVIEW

[Paper Review] 3DSSD: Point-based 3D Single Stage Object Detector

Zetong Yang, Yanan Sun | arXiv (Cornell University) | February 24, 2020

Advanced Neural Network Applications | References: 43 | Citations: 78

One-Line Summary

Introduces a lightweight, point-based 3D single-stage detector that removes upsampling and refinement stages, using a fusion sampling strategy and an anchor-free head to achieve fast, accurate 3D detection on KITTI and nuScenes.

ABSTRACT

Currently, there have been many kinds of voxel-based 3D single stage detectors, while point-based single stage methods are still underexplored. In this paper, we first present a lightweight and effective point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We novelly propose a fusion sampling strategy in downsampling process to make detection on less representative points feasible. A delicate box prediction network including a candidate generation layer, an anchor-free regression head with a 3D center-ness assignment strategy is designed to meet with our demand of accuracy and speed. Our paradigm is an elegant single stage anchor-free framework, showing great superiority to other existing methods. We evaluate 3DSSD on widely used KITTI dataset and more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well, with inference speed more than 25 FPS, 2x faster than former state-of-the-art point-based methods.

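Motivation and Objectives

Detect 3D objects efficiently and accurately directly from raw point clouds, without voxelization or a second-stage refinement.
Develop a lightweight, point-based single-stage framework that removes the feature propagation (FP) layers and the refinement module.
Propose fusion sampling, which preserves interior points during downsampling and so keeps detection robust.
Design an anchor-free regression head based on 3D center-ness, where classification is guided by whether a candidate point lies near the 3D center of an instance.
Demonstrate state-of-the-art or competitive performance on KITTI and nuScenes at high inference speed.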

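Proposed Method

The backbone stacks multiple set abstraction layers that use Fusion Sampling (FS), which preserves both positive interior points and representative negative points.
Feature-FPS (F-FPS) selects points using a combination of spatial distance and feature distance, mitigating the loss of foreground points during downsampling.
The Candidate Generation (CG) layer shifts the F-FPS points to produce candidate centers and gathers their surrounding points for feature extraction.
The anchor-free regression head predicts the 3D box offset, size, and orientation for each candidate point in a single stage.
The 3D center-ness assignment scores each candidate by its proximity to the instance center, using a 3D geometric center-ness formula to guide classification.
The loss combines classification, regression (distance, size, angle, and corner terms), and shift supervision for the CG layer.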
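The sampling steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the `(xyz, feature)` tuple layout, the choice of index 0 as the seed point, and the default balance factor `lam` are all assumptions made for the example. It shows greedy farthest point sampling under two metrics, a spatial one (D-FPS) and a combined spatial-plus-feature one (F-FPS), with fusion sampling drawing half of the budget from each.

```python
import math

def farthest_point_sampling(points, k, dist):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    n = len(points)
    k = min(k, n)
    if k == 0:
        return []
    selected = [0]  # assumption: seed with the first point
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [dist(points[0], p) for p in points]
    min_dist[0] = -1.0  # mark as already selected
    while len(selected) < k:
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], dist(points[nxt], points[i]))
        min_dist[nxt] = -1.0
    return selected

def spatial_distance(a, b):
    """D-FPS metric: Euclidean distance between XYZ coordinates."""
    return math.dist(a[0], b[0])

def feature_aware_distance(a, b, lam=1.0):
    """F-FPS metric: lam * spatial distance + feature distance (lam is assumed)."""
    return lam * math.dist(a[0], b[0]) + math.dist(a[1], b[1])

def fusion_sampling(points, k, lam=1.0):
    """Draw half of the k samples with F-FPS and the rest with D-FPS."""
    half = k // 2
    f_idx = farthest_point_sampling(
        points, half, lambda a, b: feature_aware_distance(a, b, lam))
    d_idx = farthest_point_sampling(points, k - half, spatial_distance)
    return f_idx, d_idx

# Toy input: each point is (xyz, feature). Point 1 is spatially close to the
# seed but has a very different feature, as a foreground point might.
toy_points = [
    ((0.0, 0.0, 0.0), (0.0,)),
    ((1.0, 0.0, 0.0), (100.0,)),
    ((2.0, 0.0, 0.0), (0.0,)),
    ((10.0, 0.0, 0.0), (0.0,)),
]
f_idx, d_idx = fusion_sampling(toy_points, 4)
print(f_idx, d_idx)
```

In this toy input, the spatial metric picks the distant point 3, while the feature-aware metric picks point 1 because of its distinct feature. The two index lists are kept separate here and may overlap.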

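Experimental Results

Research Questions

RQ1: Can a fully point-based 3D detector achieve competitive accuracy without FP layers or a refinement module?
RQ2: Does the fusion sampling strategy improve foreground point preservation and overall detection performance on challenging datasets?
RQ3: In a single-stage framework, is an anchor-free head guided by 3D center-ness sufficient for accurate 3D bounding box regression?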
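RQ3 depends on how the 3D center-ness label is defined. As a minimal sketch, assuming a candidate point that lies inside a ground-truth box and whose distances to the six box faces are already known, the label can be computed as the cube root of the product of three min/max ratios. The function name and argument order below are hypothetical; a candidate outside every instance would simply receive a label of 0 through a separate inside-the-box mask, which this sketch does not compute.

```python
def centerness_3d(front, back, left, right, top, bottom):
    """3D center-ness of a point inside a box.

    Arguments are the non-negative distances from the point to the six
    faces of the box. Returns a value in [0, 1]: 1.0 at the exact center,
    0.0 when the point touches a face.
    """
    ratio_fb = min(front, back) / max(front, back)
    ratio_lr = min(left, right) / max(left, right)
    ratio_td = min(top, bottom) / max(top, bottom)
    return (ratio_fb * ratio_lr * ratio_td) ** (1.0 / 3.0)

center_score = centerness_3d(2.0, 2.0, 1.0, 1.0, 0.8, 0.8)
offset_score = centerness_3d(1.0, 3.0, 2.0, 2.0, 1.0, 1.0)
face_score = centerness_3d(0.0, 4.0, 1.0, 1.0, 0.8, 0.8)
print(center_score, offset_score, face_score)
```

The score decreases as the candidate moves away from the center along any axis, which is what lets the classification branch favor candidates close to the instance center.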

Key Results

Dataset	Method	Easy AP	Moderate AP	Hard AP
KITTI val	VoxelNet [36]	81.97	65.46	62.85
KITTI val	SECOND [31]	87.43	76.48	69.10
KITTI val	PointPillars [13]	-	77.98	-
KITTI val	3DSSD (ours)	89.71	79.45	78.67

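On KITTI, 3DSSD outperforms state-of-the-art voxel-based single-stage detectors and is competitive with two-stage point-based methods, while running at 38 ms per frame on a Titan V.
Fusion sampling (FS), which combines F-FPS and D-FPS, preserves interior points and retains enough negative points for robust classification, improving AP over using D-FPS or F-FPS alone.
Anchor-free regression with 3D center-ness prioritizes candidates near the instance center, giving strong localization ability.
On KITTI val, 3DSSD reaches 89.71 Easy, 79.45 Moderate, and 78.67 Hard AP, surpassing SECOND and VoxelNet at every reported difficulty level and exceeding PointPillars on Moderate (79.45 vs. 77.98).
On nuScenes, 3DSSD performs better than voxel-based single-stage methods and comparably to two-stage point-based approaches, with strong results in speed and attribute prediction.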

This review was generated by AI and reviewed by a human editor.