QUICK REVIEW

[Paper Review] Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Xinge Zhu, Hui Zhou | arXiv (Cornell University) | Nov 19, 2020

Advanced Neural Network Applications | References: 53 | Citations: 34

One-Sentence Summary

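This paper applies Cylindrical Partition and Asymmetrical 3D Convolution Networks (CyAs) to semantic segmentation of outdoor LiDAR data. It improves 3D geometry modeling on outdoor point clouds that are sparse and vary in density, achieves state-of-the-art results on SemanticKITTI and nuScenes, and generalizes well to panoptic segmentation and 3D detection.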

ABSTRACT

State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution. Although this cooperation shows competitiveness on point clouds, it inevitably alters and abandons the 3D topology and geometric relations. A natural remedy is to utilize 3D voxelization and 3D convolution networks. However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited. An important reason is the property of the outdoor point cloud, namely sparsity and varying density. Motivated by this investigation, we propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern while maintaining these inherent properties. Moreover, a point-wise refinement module is introduced to alleviate the interference of lossy voxel-based label encoding. We evaluate the proposed model on two large-scale datasets, i.e., SemanticKITTI and nuScenes. Our method achieves 1st place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin, about 4%. Furthermore, the proposed 3D framework also generalizes well to LiDAR panoptic segmentation and LiDAR 3D detection.

Motivation and Objectives

Preserve 3D geometry in outdoor LiDAR segmentation rather than relying on 2D projections.
Address sparsity and varying density in outdoor point clouds with a cylindrical partition strategy.
Enhance 3D feature learning using asymmetrical 3D convolutions tailored to driving-scene object shapes.
Mitigate information loss from voxel-based encoding with a point-wise refinement module.
Demonstrate strong generalization to LiDAR panoptic segmentation and 3D detection.

Proposed Method

Cylindrical partition that converts Cartesian coordinates to cylindrical coordinates and assigns point-based MLP features to a 3D cylindrical grid, yielding a balanced 3D representation (radius, azimuth, height).
Asymmetrical 3D convolution networks that emphasize horizontal and vertical kernels to match driving-scene object distributions, plus asymmetrical residual/downsample/upsample blocks.
Dimension-Decomposition based Context Modeling (DDCM) to construct high-rank global context from low-rank components.
Point-wise refinement module that fuses voxel-wise outputs with point-wise features to mitigate label-encoding loss from voxelization.
Joint voxel-wise and point-wise objective with weighted cross-entropy and Lovasz-Softmax for voxel output and weighted cross-entropy for point refinement.
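
The cylindrical partition step in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid size, the radius and height ranges, and the function names (`cylindrical_cell`, `cylindrical_partition`) are assumptions chosen for the example, and the point-based MLP features are left out so that only the coordinate conversion and cell assignment are shown.

```python
import math
from collections import defaultdict

# Illustrative settings only; the review does not state grid sizes or ranges.
GRID = (480, 360, 32)            # cells along (radius, azimuth, height)
RHO_RANGE = (0.0, 50.0)          # metres from the sensor (assumed)
PHI_RANGE = (-math.pi, math.pi)  # full circle, as returned by atan2
Z_RANGE = (-4.0, 2.0)            # metres (assumed)


def _bin(value, lo, hi, n_cells):
    """Clamp value to [lo, hi] and return its cell index in 0..n_cells-1."""
    value = min(max(value, lo), hi)
    idx = int((value - lo) / (hi - lo) * n_cells)
    return min(idx, n_cells - 1)


def cylindrical_cell(x, y, z):
    """Map one Cartesian point to (radius, azimuth, height) cell indices."""
    rho = math.hypot(x, y)   # distance from the vertical axis
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]
    return (
        _bin(rho, *RHO_RANGE, GRID[0]),
        _bin(phi, *PHI_RANGE, GRID[1]),
        _bin(z, *Z_RANGE, GRID[2]),
    )


def cylindrical_partition(points):
    """Group point indices by the cylindrical cell they fall into."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        cells[cylindrical_cell(x, y, z)].append(i)
    return dict(cells)
```

Because every cell spans the same azimuth angle, cells farther from the sensor cover a larger area than cells close to it. That is the sense in which a cylindrical grid can balance the number of points per cell when density falls off with distance.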

Experimental Results

Research Questions

RQ1: Can cylindrical partition preserve 3D geometric structure and achieve balanced point distribution in outdoor LiDAR data?
RQ2: Do asymmetrical horizontal/vertical kernels improve learning on driving-scene object shapes under sparse outdoor data?
RQ3: Does a point-wise refinement step reduce information loss from voxel-based encoding and improve final segmentation quality?
RQ4: How well does the CyAs framework generalize to LiDAR panoptic segmentation and 3D detection beyond semantic segmentation?
RQ5: What is the impact of each component (cylindrical partition, asymmetrical CNNs, DDCM, and PR) on performance?

Key Findings

Method	mIoU	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	motorcyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign
Ours	67.8	97.1	67.6	64.0	59.0	58.6	73.9	67.9	36.0	91.4	65.1	75.5	32.3	91.0	66.5	85.4	71.8	68.5	62.6	65.6
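
For readers unfamiliar with the metric in the table above, per-class IoU is TP / (TP + FP + FN) and mIoU is its mean over classes. The sketch below computes both from a confusion matrix. The function name `mean_iou` and the choice to skip classes that appear in neither the labels nor the predictions are assumptions of this example, not details taken from the paper or the benchmark tooling.

```python
def mean_iou(confusion):
    """Return (mIoU, per-class IoU list) from a square confusion matrix.

    confusion[i][j] is the number of points whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:            # skip classes absent from labels and predictions
            ious.append(tp / denom)
    miou = sum(ious) / len(ious) if ious else 0.0
    return miou, ious


# Toy two-class example with made-up counts.
miou, ious = mean_iou([[8, 2],
                       [1, 9]])
print(round(miou, 4), [round(v, 4) for v in ious])
```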

Achieves state-of-the-art mIoU on SemanticKITTI (Ours: 67.8) versus prior methods.
Outperforms projection-based and several 3D voxel-based methods on SemanticKITTI (e.g., gains of 8–17% mIoU over projection-based methods).
On nuScenes validation, the method achieves superior mIoU and per-class results, with notable gains for sparse classes such as bicycles and pedestrians.
Ablations show cylindrical partition and asymmetrical CNNs each contribute ~3% mIoU gains; DDCM adds ~1.4%; point-wise refinement adds ~0.7%.
Asymmetrical residual blocks that strengthen horizontal and vertical kernels yield up to ~3% mIoU improvement, with larger gains on truck, person, and motorcycle classes.
Panoptic segmentation and 3D detection experiments show that CyAs improves PQ and mAP/NDS over baselines (e.g., roughly +4.7% to more than +5% PQ for panoptic segmentation, and about +5–6% mAP/NDS for detection).

This review was generated by AI and checked by a human editor.