QUICK REVIEW

[Paper Review] Vehicle Detection from 3D Lidar Using Fully Convolutional Network

Bo Li, Tianlei Zhang|arXiv (Cornell University)|Aug 29, 2016

Advanced Optical Sensing Technologies|References: 21|Cited by: 275

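One-Sentence Summary

This paper proposes an end-to-end 2D FCN that projects Velodyne 64E range scans onto a 2D point map and predicts objectness together with 3D vehicle bounding boxes, reaching state-of-the-art performance among range-scan-based detectors on KITTI.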

ABSTRACT

Convolutional network techniques have recently achieved great success in vision-based detection tasks. This paper introduces the recent development of our research on transplanting the fully convolutional network technique to detection tasks on 3D range scan data. Specifically, the scenario is set as the vehicle detection task from the range data of a Velodyne 64E lidar. We propose to present the data in a 2D point map and use a single 2D end-to-end fully convolutional network to predict the objectness confidence and the bounding boxes simultaneously. By carefully designing the bounding box encoding, the network is able to predict full 3D bounding boxes even though it is a 2D convolutional network. Experiments on the KITTI dataset show the state-of-the-art performance of the proposed method.

Motivation and Objectives

Apply fully convolutional networks to vehicle detection on 3D LiDAR range scans.
Project 3D LiDAR points to a 2D point map to enable end-to-end 2D FCN processing.
Simultaneously predict objectness and full 3D bounding boxes within a unified framework.
Design a rotation-invariant bounding box encoding to handle viewpoint changes.
Achieve competitive or state-of-the-art performance on KITTI using end-to-end learning.

Proposed Method

Convert Velodyne 64E LiDAR points into a 2D point map with channels (d, z).
Use a 2D fully convolutional network with a shared trunk and two heads: objectness classification and 24D bounding box regression.
Encode each 3D bounding box by its 8 corners, with the corner coordinates transformed by a rotation R; 8 corners × 3 coordinates give the 24 regression targets.
Concatenate feature maps from multiple layers to improve small object and edge prediction.
Apply data augmentation via geometry-preserving 3D transforms and multi-task loss balancing for objectness and box regression.
Train with weighted losses to balance foreground/background and varying object sizes/distances.
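
The point-map projection and the corner encoding listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the angular resolutions, the offsets `theta_min` and `phi_min`, the (elevation, azimuth) image layout, keeping the nearest return when several points share a cell, and the composition R = R_z(θ) R_y(φ) with the sign conventions below are all choices made for this example. The function names are hypothetical.

```python
import numpy as np

def point_map(points, d_theta, d_phi, theta_min, phi_min, n_az, n_el):
    """Project (N, 3) LiDAR points to an (n_el, n_az, 2) map with channels (d, z).

    Assumes no point sits exactly at the sensor origin.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.hypot(x, y)                         # horizontal distance, channel 0
    rng = np.sqrt(x**2 + y**2 + z**2)          # full range to the sensor
    theta = np.arctan2(y, x)                   # azimuth angle
    phi = np.arcsin(z / rng)                   # elevation angle
    col = np.floor((theta - theta_min) / d_theta).astype(int)
    row = np.floor((phi - phi_min) / d_phi).astype(int)
    keep = (row >= 0) & (row < n_el) & (col >= 0) & (col < n_az)
    idx = np.flatnonzero(keep)
    idx = idx[np.argsort(rng[idx])]            # nearest points first
    lin = row[idx] * n_az + col[idx]           # linear cell index
    _, first = np.unique(lin, return_index=True)
    idx = idx[first]                           # one (nearest) point per cell
    pm = np.zeros((n_el, n_az, 2), dtype=np.float32)
    pm[row[idx], col[idx], 0] = d[idx]
    pm[row[idx], col[idx], 1] = z[idx]
    return pm

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def point_rotation(p):
    """Rotation built from the azimuth and elevation of point p (assumed convention)."""
    theta = np.arctan2(p[1], p[0])
    phi = np.arcsin(p[2] / np.linalg.norm(p))
    return rot_z(theta) @ rot_y(phi)

def encode_box(corners, p):
    """Encode (8, 3) box corners relative to point p as a 24-d vector R^T (c - p)."""
    R = point_rotation(p)
    return ((corners - p) @ R).reshape(-1)     # row-wise R^T (c - p)

def decode_box(encoding, p):
    """Invert encode_box: recover the (8, 3) corners from a 24-d vector."""
    R = point_rotation(p)
    return encoding.reshape(8, 3) @ R.T + p
```

Under this convention, rotating the point and its box together about the vertical axis leaves the 24-d encoding unchanged, which is one way to read the "rotation-invariant" objective above.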

Experimental Results

Research Questions

RQ1: Can a fully convolutional network operate directly on 2D projections of 3D LiDAR range scans to detect vehicles?
RQ2: How can 3D bounding boxes be encoded so that a 2D convolutional backbone can predict full 3D boxes?
RQ3: Does joint objectness and 3D box regression improve detection in LiDAR data compared to traditional proposal-based methods?
RQ4: Can evaluation on the KITTI dataset and protocol demonstrate state-of-the-art performance for LiDAR-based vehicle detection?
RQ5: How do data augmentation and loss balancing affect training on sparse LiDAR point clouds?

Key Findings

Difficulty	Image Space AP	Image Space AOS	World Space AP	World Space AOS
Easy	74.1%	73.9%	77.3%	77.2%
Moderate	71.0%	70.9%	72.4%	72.3%
Hard	70.0%	69.9%	69.4%	69.4%

The proposed FCN achieves high detection performance on KITTI range-scan data, with state-of-the-art offline world-space AP/AOS in Easy settings.
Offline world-space AP: Easy 77.3%, Moderate 72.4%, Hard 69.4%; Offline world-space AOS: Easy 77.2%, Moderate 72.3%, Hard 69.4%.
Image-space AP/AOS are lower than world-space in the Easy and Moderate settings but slightly higher in Hard (70.0% vs. 69.4% AP), reflecting differences between 2D projection overlap and 3D localization.
The method can predict complete 3D bounding boxes even for vehicles partly visible, aiding tracking and planning.
Compared to prior range-scan methods, the approach shows improved AP in Easy and competitive AP in Moderate/Hard tasks; it also achieves superior orientation estimation (AOS).
The network benefits from multi-layer feature concatenation to improve small object and edge predictions.
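
To make the AOS columns easier to read, here is a minimal sketch of the orientation similarity term, assuming the standard KITTI definition: at a given recall level, each detection contributes (1 + cos Δθ) / 2 if it is matched to a ground-truth box and 0 otherwise, averaged over all detections at that recall. The function name and its arguments are hypothetical, and the sketch covers only one recall level, not the full averaging over recall that AOS shares with AP.

```python
import numpy as np

def orientation_similarity(delta_theta, matched):
    """Mean of (1 + cos(delta_theta)) / 2 over detections, zeroed for unmatched ones.

    delta_theta: angle differences (radians) between predicted and true orientation.
    matched:     1 if the detection is assigned to a ground-truth box, else 0.
    """
    delta_theta = np.asarray(delta_theta, dtype=float)
    matched = np.asarray(matched, dtype=float)
    return float(np.mean((1.0 + np.cos(delta_theta)) / 2.0 * matched))
```

Because each term is at most 1 for a matched detection, this similarity never exceeds precision at the same recall, which is consistent with every AOS value in the table being at or just below the corresponding AP.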

This review was generated by AI and reviewed by human editors.