QUICK REVIEW

[Paper Review] PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

Sicheng Zuo, Wenzhao Zheng | arXiv (Cornell University) | Aug 31, 2023

3D Shape Modeling and Analysis | Cited by 8

One-Sentence Summary

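PointOcc introduces the Cylindrical Tri-Perspective View (Cylindrical TPV) and processes LiDAR point clouds with 2D image backbones to predict dense 3D semantic occupancy, achieving state-of-the-art results at faster speed while using LiDAR only.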

ABSTRACT

Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
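The abstract's case for cylindrical coordinates can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names, the uniform binning scheme, and the range and grid values are all hypothetical. It converts a Cartesian LiDAR point to cylindrical coordinates and maps it to an integer cell.

```python
import math

def cartesian_to_cylindrical(x, y, z):
    """Convert a Cartesian point to cylindrical coordinates (rho, phi, z)."""
    rho = math.hypot(x, y)   # radial distance from the vertical sensor axis
    phi = math.atan2(y, x)   # azimuth angle in [-pi, pi]
    return rho, phi, z

def cylindrical_voxel_index(point, rho_max, z_range, grid):
    """Map a point to an integer (rho, phi, z) cell, or None if out of range.

    grid = (n_rho, n_phi, n_z). Uniform bins along each cylindrical axis are
    an assumption made for this sketch.
    """
    rho, phi, z = cartesian_to_cylindrical(*point)
    z_min, z_max = z_range
    if rho >= rho_max or not (z_min <= z < z_max):
        return None
    n_rho, n_phi, n_z = grid
    i = int(rho / rho_max * n_rho)
    # The modulo folds phi == pi back onto the first angular bin.
    j = int((phi + math.pi) / (2 * math.pi) * n_phi) % n_phi
    k = int((z - z_min) / (z_max - z_min) * n_z)
    return i, j, k
```

With equal angular bins, the arc width of a cell at radius rho is rho * 2π / n_phi, so cells close to the sensor cover less area than cells far away. That is one way to read the abstract's "more fine-grained modeling of nearer areas".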

Motivation and Objectives

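Advance dense 3D semantic occupancy prediction by overcoming the information loss caused by 2D projection.
Propose the Cylindrical TPV so that the representation better matches the density distribution of LiDAR points.
Achieve efficient processing by using 2D backbones and a shared TPV encoder-decoder.
Provide a post-processing-free framework that produces high-resolution 3D occupancy and LiDAR segmentation results.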

Proposed Method

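Convert LiDAR points into Cylindrical TPV planes through cylindrical partitioning and spatial group pooling, preserving 3D structure.
Encode each TPV plane with a shared 2D backbone and an FPN to extract multi-scale features.
Query the feature of each point by projecting the point or voxel onto the three TPV planes and summing the interpolated features.
Apply a simple two-layer MLP head for semantic occupancy prediction or segmentation, with no post-processing.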
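Two of the steps above, spatial group pooling and the per-point TPV query, can be sketched in a few lines. This is a simplified illustration and not the PointOcc code: features are single scalars rather than channel vectors, the helper names are hypothetical, and the grouping scheme (contiguous groups, each max-pooled) is an assumption about how group pooling might keep more structure than one global max along the projection axis.

```python
def group_max_pool(column, num_groups):
    """Pool a 1D column of features along the projection axis.

    The column is split into num_groups contiguous groups and each group is
    max-pooled, so num_groups values survive instead of a single one.
    Assumes len(column) is divisible by num_groups.
    """
    size = len(column) // num_groups
    return [max(column[g * size:(g + 1) * size]) for g in range(num_groups)]

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate a 2D grid of scalars at continuous coords (u, v)."""
    rows, cols = len(plane), len(plane[0])
    u = min(max(u, 0.0), rows - 1.0)   # clamp to the valid index range
    v = min(max(v, 0.0), cols - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, rows - 1), min(v0 + 1, cols - 1)
    du, dv = u - u0, v - v0
    top = plane[u0][v0] * (1 - dv) + plane[u0][v1] * dv
    bottom = plane[u1][v0] * (1 - dv) + plane[u1][v1] * dv
    return top * (1 - du) + bottom * du

def query_tpv(planes, h, w, d):
    """Sum the interpolated features of the HW, WD and DH planes at (h, w, d)."""
    return (bilinear_sample(planes["hw"], h, w)
            + bilinear_sample(planes["wd"], w, d)
            + bilinear_sample(planes["dh"], d, h))
```

For example, `group_max_pool([1, 5, 2, 0, 7, 3, 4, 4], 4)` keeps one maximum per pair of entries, whereas a plain max over the whole column would keep only the value 7. In a real model the summed feature from `query_tpv` would then be passed to the classification head.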

Experimental Results

Research Questions

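RQ1: How can LiDAR point clouds be represented effectively for dense 3D semantic occupancy prediction without heavy 3D convolutions?
RQ2: Does the Cylindrical TPV capture near-field detail and overall 3D structure better than a Cartesian TPV or single-view projections?
RQ3: Can 2D backbones pretrained on images, given TPV features as input, efficiently produce high-quality 3D semantic predictions?
RQ4: How do TPV resolution and group size trade off accuracy against computational cost?
RQ5: How does PointOcc perform on the OpenOccupancy and LiDAR segmentation benchmarks compared with voxel-based methods and other 2D-projection-based methods?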

Key Findings

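Using LiDAR alone, PointOcc achieves state-of-the-art performance on OpenOccupancy and surpasses multi-modal methods by a clear margin, reaching an mIoU of 23.9 and an IoU of 34.1 on the validation set.
On nuScenes LiDAR segmentation, PointOcc outperforms all 2D-projection-based methods and is competitive with voxel-based methods (for example, an mIoU of 77.9 with an ImageNet-1K-pretrained ViT backbone).
Combining all three TPV planes (HW, WD, DH) gives the best results, indicating that the planes carry complementary information.
Higher TPV resolution yields better performance, and spatial group pooling (K=16) preserves structural detail at a manageable cost.
Using ViT backbones pretrained on ImageNet-1K/21K improves performance, and freezing part of the ViT weights for LiDAR segmentation improves stability while maintaining high accuracy.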
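The findings above are reported as IoU and mIoU. As a reminder of what these numbers measure, here is a minimal sketch over flat label lists. It assumes that label 0 means "empty", that IoU is the class-agnostic occupied-versus-empty score, and that mIoU averages the per-class IoU over the semantic classes. The benchmark's official evaluation code may differ in details such as ignored labels, and the function names here are hypothetical.

```python
def class_iou(pred, target, cls):
    """IoU of one class; returns None if the class appears in neither list."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else None

def mean_iou(pred, target, classes):
    """Average per-class IoU over the classes that occur at least once."""
    ious = [class_iou(pred, target, c) for c in classes]
    ious = [v for v in ious if v is not None]
    return sum(ious) / len(ious) if ious else 0.0

def occupancy_iou(pred, target, empty=0):
    """Geometric IoU: occupied versus empty, ignoring the semantic class."""
    inter = sum(1 for p, t in zip(pred, target) if p != empty and t != empty)
    union = sum(1 for p, t in zip(pred, target) if p != empty or t != empty)
    return inter / union if union else 0.0
```

A voxel that is predicted as occupied but given the wrong class still counts toward the geometric IoU while lowering the mIoU, which is consistent with the reported IoU (34.1) being higher than the mIoU (23.9).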


This review was generated by AI and reviewed by a human editor.