QUICK REVIEW

[Paper Review] PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud

Yuan Wang, Tianyue Shi | arXiv (Cornell University) | Jul 17, 2018

Remote Sensing and LiDAR Applications | 25 references | 105 citations

TL;DR

PointSeg transforms 3D LiDAR point clouds into dense spherical images and uses a lightweight SqueezeNet-based network with Squeeze-and-Excitation-inspired channel reweighting and dilated multi-scale features to achieve real-time semantic segmentation of road objects on a single GPU.

ABSTRACT

In this paper, we propose PointSeg, a real-time, end-to-end semantic segmentation method for road objects based on spherical images. We take the spherical image, which is transformed from the 3D LiDAR point cloud, as the input to convolutional neural networks (CNNs) to predict the point-wise semantic map. To make PointSeg applicable on a mobile system, we build the model on the lightweight network SqueezeNet, with several improvements. It maintains a good balance between memory cost and prediction performance. Our model is trained on spherical images and label masks projected from the KITTI 3D object detection dataset. Experiments show that PointSeg achieves competitive accuracy at 90 fps on a single GTX 1080 Ti GPU, which makes it well suited to autonomous driving applications.

Motivation & Objective

Enable real-time 3D semantic segmentation of road objects from LiDAR data.
Develop a lightweight, accurate network based on SqueezeNet suitable for embedded and onboard systems.
Leverage spherical projection to convert sparse 3D point clouds into dense 2D representations for CNN processing.
Incorporate attention-like channel reweighting and multi-scale context to boost segmentation accuracy.

Proposed method

Transform LiDAR point clouds into dense 64×512×5 spherical images (height × width × channels) using azimuth and zenith angle projections.
Build PointSeg on a lightweight Fire-module backbone inspired by SqueezeNet and SqueezeSeg.
Introduce a squeeze reweighting layer to model channel dependencies akin to SE blocks.
Employ an enlargement (dilated) layer to capture multi-scale context without excessive downsampling.
Use a deconvolution-based upsampling path with skip connections to recover point-wise segmentation maps.
Apply random sample consensus (RANSAC) as a post-process to refine back-projected segmentation results.
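The spherical projection step can be sketched as follows. This is a minimal sketch, not the authors' code: the five channels (x, y, z, intensity, range), the vertical field of view of +2.0° to -24.9°, the 90° front-facing horizontal window and the "keep the nearest point per pixel" rule are assumptions borrowed from SqueezeSeg-style setups, and `spherical_project` is a hypothetical name.

```python
import math


def spherical_project(points, height=64, width=512,
                      fov_up=2.0, fov_down=-24.9, fov_h=90.0):
    """Project (x, y, z, intensity) points into a height x width x 5 grid.

    Channels per pixel are x, y, z, intensity, range (an assumption, as in
    SqueezeSeg). Empty pixels stay all-zero. FOV values are in degrees.
    """
    image = [[[0.0] * 5 for _ in range(width)] for _ in range(height)]
    up, down = math.radians(fov_up), math.radians(fov_down)
    half_h = math.radians(fov_h) / 2.0
    for x, y, z, intensity in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue
        zenith = math.asin(z / rng)   # elevation angle -> image row
        azimuth = math.atan2(y, x)    # horizontal angle -> image column
        if not (down <= zenith <= up and -half_h <= azimuth <= half_h):
            continue  # outside the assumed front-view window
        row = int((up - zenith) / (up - down) * (height - 1) + 0.5)
        col = int((half_h - azimuth) / (2 * half_h) * (width - 1) + 0.5)
        pixel = image[row][col]
        # Keep the closest point when several fall into the same pixel.
        if pixel[4] == 0.0 or rng < pixel[4]:
            image[row][col] = [x, y, z, intensity, rng]
    return image
```

A point straight ahead, such as `(10.0, 0.0, 0.0, 0.5)`, lands in the centre column (256) near the top rows, because the assumed vertical window extends only 2° above the horizon.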
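The squeeze reweighting idea follows the Squeeze-and-Excitation pattern: pool each channel to a single number, pass the pooled vector through two small fully connected layers, and rescale every channel by a sigmoid weight. The sketch below assumes that standard SE form on a plain-Python C×H×W tensor; the function name, weight shapes and reduction size are hypothetical, and PointSeg's exact layer layout may differ.

```python
import math


def squeeze_reweight(feature, w1, w2):
    """SE-style channel reweighting (sketch).

    feature: list of C channels, each an H x W list of lists.
    w1: r x C matrix (squeeze FC); w2: C x r matrix (excitation FC).
    Returns (reweighted feature, per-channel scales).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives weights in (0, 1).
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Reweight: multiply every activation of channel c by scale[c].
    out = [[[v * s for v in row] for row in ch]
           for ch, s in zip(feature, scale)]
    return out, scale
```

Because the scales depend on global statistics of all channels, a channel that is informative for a given scene can be amplified relative to the others, which is the intuition behind the reported gains on small objects.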
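This review does not say which geometric model PointSeg's RANSAC step fits to the back-projected points, so the sketch below only illustrates the generic RANSAC loop (sample a minimal set, fit a model, count inliers within a threshold, keep the largest consensus set). The plane model is chosen purely for illustration, and `fit_plane`, `ransac_plane`, the threshold and the iteration count are hypothetical.

```python
import random


def fit_plane(p1, p2, p3):
    """Plane through three points as (unit normal n, d) with n . p + d = 0."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    if norm < 1e-9:
        return None  # collinear sample, no unique plane
    n = [c / norm for c in n]
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d


def ransac_plane(points, iterations=100, threshold=0.05, seed=0):
    """Return indices of the largest consensus set for a plane model."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [i for i, p in enumerate(points)
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d)
                   <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Points outside the winning consensus set are the ones a RANSAC-based refinement would treat as outliers; how PointSeg then relabels them is not described here.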

Experimental results

Research questions

RQ1: Can a lightweight CNN based on SqueezeNet achieve real-time 3D LiDAR semantic segmentation on a standard GPU?
RQ2: Does transforming 3D LiDAR point clouds into spherical images enable effective per-pixel semantic labels with competitive accuracy?
RQ3: Do channel-wise reweighting and multi-scale dilated context improve segmentation performance for road objects, including small ones like pedestrians?
RQ4: What is the runtime performance and memory footprint of PointSeg on typical onboard hardware (e.g., GTX 1080 Ti, Jetson TX2)?

Key findings

PointSeg runs in real time, at around 90 fps for the forward pass on a single GPU.
The proposed downsampling strategy (three downsampling steps) improves accuracy for pedestrians and cyclists without sacrificing accuracy on cars.
The enlargement layer, with dilated convolutions at rates 6, 9 and 12, provides multi-scale context while keeping memory usage reasonable.
Squeeze reweighting layers (SR1–SR3) make channel-wise features more robust, especially for smaller objects; the reweight-down configuration gives the best balance.
RANSAC post-processing improves back-projected segmentation accuracy, yielding notable gains over the base method.
Compared to SqueezeSeg, PointSeg achieves higher car and cyclist IoUs and competitive pedestrian results. It needs about 12 ms per frame on a GTX 1080 Ti (without CRF) and about 98 ms per frame on a Jetson TX2 when RANSAC post-processing is included.
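The three downsampling steps and the deconvolution path with skip connections can be illustrated with simple shape bookkeeping. The sketch assumes stride-2 downsampling along the width only (the 64-row height is already small), as in SqueezeSeg; whether PointSeg downsamples exactly this way is an assumption, and `encoder_decoder_shapes` is a hypothetical helper.

```python
def encoder_decoder_shapes(height=64, width=512, steps=3):
    """Feature-map sizes for an assumed width-only encoder/decoder."""
    encoder = [(height, width)]
    for _ in range(steps):
        h, w = encoder[-1]
        encoder.append((h, w // 2))   # stride-2 along width only (assumption)
    decoder = [encoder[-1]]
    for _ in range(steps):
        h, w = decoder[-1]
        decoder.append((h, w * 2))    # each deconvolution doubles the width
    # Skip connections pair each decoder map with the encoder map of equal size.
    skips = list(zip(decoder[1:], reversed(encoder[:-1])))
    return encoder, decoder, skips
```

Under these assumptions the bottleneck is 64×64, and the final decoder output returns to the full 64×512 grid, so every projected point receives a label.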
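The enlargement layer's dilated branches can be sketched on a single-channel map. A 3×3 kernel with dilation rate r spans 3 + 2(r - 1) = 2r + 1 pixels per side while still holding only 9 weights, so rates 6, 9 and 12 cover 13, 19 and 25 pixels without adding parameters. The parallel-branch layout, zero padding and all names here are assumptions for illustration; the real layer works on multi-channel tensors and may include other branches.

```python
def effective_kernel(k, rate):
    """Side length covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(x, kernel, rate):
    """Single-channel 'same'-size dilated cross-correlation, zero padding."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i + (a - half) * rate   # taps are `rate` pixels apart
                    jj = j + (b - half) * rate
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def enlargement(x, kernels, rates=(6, 9, 12)):
    """Hypothetical enlargement layer: parallel dilated branches as channels."""
    return [dilated_conv2d(x, kern, r) for kern, r in zip(kernels, rates)]
```

Feeding a single impulse through one branch shows the sparse 3×3 footprint spread `rate` pixels apart, which is how the layer gathers wider context at no extra weight cost.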
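The timing figures translate into frame rates with simple arithmetic: about 12 ms per frame corresponds to roughly 83 fps, 90 fps corresponds to about 11.1 ms, and about 98 ms corresponds to roughly 10.2 fps. The budget check below assumes a 10 Hz LiDAR (the rotation rate of the Velodyne HDL-64E used in KITTI), so one sweep arrives every 100 ms; the helper names are hypothetical.

```python
def fps_from_ms(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


def fits_budget(latency_ms, sensor_hz=10.0):
    """True if processing finishes before the next sweep (assumed 10 Hz)."""
    return latency_ms <= 1000.0 / sensor_hz
```

Under this assumption both reported configurations keep up with the sensor, but the TX2 figure leaves very little headroom.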


This review was created by AI and reviewed by human editors.