QUICK REVIEW

[Paper Review] Complex-YOLO: Real-time 3D Object Detection on Point Clouds

Martín Simón, Stefan Milz | arXiv (Cornell University) | March 16, 2018

Advanced Neural Network Applications | References: 24 | Citations: 78

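One-Line Summary

Complex-YOLO introduces an Euler-Region-Proposal Network (E-RPN) to estimate oriented 3D boxes directly from LiDAR point clouds in real time, achieving high efficiency and multi-class detection without camera input.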

ABSTRACT

Lidar based 3D object detection is inevitable for autonomous driving, because it directly links to environmental understanding and therefore builds the base for prediction and motion planning. The capacity of inferencing highly sparse 3D data in real-time is an ill-posed problem for lots of other application areas besides automated vehicles, e.g. augmented reality, personal robotics or industrial automation. We introduce Complex-YOLO, a state of the art real-time 3D object detection network on point clouds only. In this work, we describe a network that expands YOLOv2, a fast 2D standard object detector for RGB images, by a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. Thus, we propose a specific Euler-Region-Proposal Network (E-RPN) to estimate the pose of the object by adding an imaginary and a real fraction to the regression network. This ends up in a closed complex space and avoids singularities, which occur by single angle estimations. The E-RPN supports to generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection specifically in terms of efficiency. We achieve state of the art results for cars, pedestrians and cyclists by being more than five times faster than the fastest competitor. Further, our model is capable of estimating all eight KITTI-classes, including Vans, Trucks or sitting pedestrians simultaneously with high accuracy.

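Motivation and Objectives

Achieve real-time 3D object detection for autonomous driving using LiDAR data only.
Develop a fast, end-to-end network that constructs 3D bounding boxes in Cartesian space from a bird's-eye-view (BEV) map.
Introduce a Euler regression approach (E-RPN) that estimates object orientation robustly while avoiding angular singularities.
Reach state-of-the-art efficiency while keeping accuracy on KITTI competitive across multiple classes.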

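Proposed Method

Preprocess the LiDAR point cloud into a single bird's-eye-view (BEV) RGB map (height, intensity, density) covering an 80 m x 40 m area.
Apply a lightweight YOLOv2-style CNN architecture that makes its predictions in a single pass over the BEV map.
Introduce the Euler-Region-Proposal Network (E-RPN), which regresses the 3D box parameters (x, y, w, l) and the orientation using complex-valued angle regression, b_phi = arctan2(t_im, t_re).
Use three anchor sizes and two heading directions to cover the KITTI object shapes, predicting five boxes per grid cell with associated scores.
Combine the YOLO-style loss with a new Euler regression loss so that the angle prediction is optimized in a singularity-free complex space.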
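
The preprocessing step listed first above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the review states only the 80 m x 40 m extent and the three channels, so the axis assignment, the height band `Z_RANGE`, the coarse cell size `CELL`, the use of per-cell maxima, and the log-based density normalization are all assumptions made for the example. The function name `make_bev_map` is hypothetical.

```python
import math

# All constants below are assumptions for illustration only.
X_RANGE = (0.0, 40.0)     # forward distance in metres (assumed axis assignment)
Y_RANGE = (-40.0, 40.0)   # lateral distance in metres (assumed axis assignment)
Z_RANGE = (-2.0, 1.25)    # height band in metres (assumed)
CELL = 1.0                # cell size in metres (assumed, coarse for readability)


def make_bev_map(points):
    """Rasterize (x, y, z, intensity) points into height, intensity and density grids."""
    rows = int((X_RANGE[1] - X_RANGE[0]) / CELL)
    cols = int((Y_RANGE[1] - Y_RANGE[0]) / CELL)
    height = [[0.0] * cols for _ in range(rows)]
    intensity = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]

    for x, y, z, refl in points:
        inside = (X_RANGE[0] <= x < X_RANGE[1]
                  and Y_RANGE[0] <= y < Y_RANGE[1]
                  and Z_RANGE[0] <= z <= Z_RANGE[1])
        if not inside:
            continue  # drop points outside the covered area
        r = int((x - X_RANGE[0]) / CELL)
        c = int((y - Y_RANGE[0]) / CELL)
        # Height channel: highest point in the cell, scaled to [0, 1] (assumed encoding).
        z_norm = (z - Z_RANGE[0]) / (Z_RANGE[1] - Z_RANGE[0])
        height[r][c] = max(height[r][c], z_norm)
        # Intensity channel: strongest return in the cell (assumed encoding).
        intensity[r][c] = max(intensity[r][c], refl)
        counts[r][c] += 1

    # Density channel: log-scaled point count, capped at 1 (assumed normalization).
    density = [[min(1.0, math.log(n + 1) / math.log(64)) for n in row]
               for row in counts]
    return height, intensity, density


if __name__ == "__main__":
    cloud = [
        (10.2, 0.3, -1.0, 0.4),
        (10.7, 0.9, 0.5, 0.9),
        (55.0, 0.0, 0.0, 0.5),  # beyond the covered range, so it is dropped
    ]
    h, i, d = make_bev_map(cloud)
    print(len(h), len(h[0]))            # grid shape: rows, cols
    print(h[10][40], i[10][40], d[10][40])
```

The three grids play the role of the three image channels, so a 2D detector such as the YOLOv2-style network can consume the result like an ordinary image. A real pipeline would use a much finer cell size and array operations instead of Python loops.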

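Experimental Results

Research Questions

RQ1: Can a real-time, LiDAR-only model produce accurate oriented 3D boxes for multiple KITTI classes?
RQ2: Does embedding angle regression in complex space (Euler regression) improve orientation stability and generalization?
RQ3: What is the trade-off between detection speed and accuracy when using a single BEV map and a single forward pass?
RQ4: Can a single network predict multiple classes simultaneously while maintaining real-time performance without camera input?
RQ5: How does the proposed method perform on the KITTI benchmark for BEV and 3D detection?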

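Key Findings

Achieves real-time performance (>50 fps) on a Titan X while maintaining competitive accuracy on KITTI BEV detection.
In terms of efficiency, it is at least 5x faster than leading LiDAR-based methods for BEV detection, and more than 10x faster in some comparisons.
Encodes orientation with complex angle regression (Euler regression) to avoid angular singularities and improve generalization.
Predicts all eight KITTI classes (including vans, trucks, and sitting pedestrians) from LiDAR input alone; no camera data is required.
A single end-to-end network handles all bounding boxes in one forward pass, which enables deployment on embedded platforms (e.g., TX2).
Demonstrates strong BEV and 3D detection performance with competitive AP values in the Car, Pedestrian, and Cyclist categories.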
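
The singularity argument behind the complex angle regression can be checked numerically. The sketch below decodes a heading with b_phi = arctan2(t_im, t_re), as stated in the method summary, and compares two error measures for headings of +179 and -179 degrees, which are only 2 degrees apart physically. The squared unit-circle distance used in `complex_sq_error` is an assumed form chosen for illustration; the review does not spell out the exact Euler regression loss, and all function names are hypothetical.

```python
import math


def decode_angle(t_im, t_re):
    """Recover the heading from imaginary and real parts: b_phi = arctan2(t_im, t_re)."""
    return math.atan2(t_im, t_re)


def naive_sq_error(phi_pred, phi_true):
    """Squared error on the raw angle; it jumps at the +/-180 degree wrap-around."""
    return (phi_pred - phi_true) ** 2


def complex_sq_error(phi_pred, phi_true):
    """Squared distance between the headings as points on the unit circle (assumed loss form)."""
    d_re = math.cos(phi_pred) - math.cos(phi_true)
    d_im = math.sin(phi_pred) - math.sin(phi_true)
    return d_re ** 2 + d_im ** 2


if __name__ == "__main__":
    a = math.radians(179.0)
    b = math.radians(-179.0)
    # The two headings differ by only 2 degrees on the circle.
    print("naive error:  ", naive_sq_error(a, b))    # large, despite near-identical headings
    print("complex error:", complex_sq_error(a, b))  # small, matches the true closeness
    # Round trip: encode a heading as (sin, cos) and decode it again.
    print("decoded deg:  ", math.degrees(decode_angle(math.sin(a), math.cos(a))))
```

The raw-angle error treats the two nearly identical headings as far apart, while the complex-plane error stays small. That is the practical meaning of the closed complex space described in the abstract.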

This review was generated by AI and checked by a human editor.