QUICK REVIEW

[Paper Review] RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Qingyong Hu, Bo Yang|arXiv (Cornell University)|Nov 25, 2019

3D Shape Modeling and Analysis | References: 68 | Cited by: 133

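One-Sentence Summary

RandLA-Net performs per-point semantic segmentation of large-scale 3D point clouds using random sampling plus lightweight local feature aggregation, achieving state-of-the-art results on Semantic3D and SemanticKITTI while being faster and more memory-efficient than existing methods.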

ABSTRACT

We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches are only able to be trained and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass with up to 200X faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks Semantic3D and SemanticKITTI.

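Motivation and Objectives

Achieve efficient semantic segmentation of large-scale, irregular 3D point clouds without heavy pre- or post-processing.
Show that random sampling can be effective when paired with a robust local feature aggregator.
Propose local spatial encoding (LocSE) and adaptive aggregation to preserve geometric information during downsampling.
Demonstrate that RandLA-Net runs significantly faster and uses less memory on benchmarks while matching or exceeding prior accuracy.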

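Proposed Method

Downsample large-scale point clouds in a single pass with random sampling, avoiding expensive methods based on farthest point sampling (FPS) or inverse density importance sampling (IDIS).
Introduce a Local Spatial Encoding (LocSE) unit that explicitly embeds relative neighborhood geometry.
Apply attentive pooling to adaptively weight and combine neighboring features.
Stack LocSE and attentive pooling into dilated residual blocks that progressively enlarge the receptive field.
Build the network from lightweight shared MLPs, avoiding graph construction and voxelization steps.
Train end to end with Adam on fixed-size point subsets (~1e5 points), and test on full point clouds without pre- or post-processing.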
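The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 10-value relative encoding (center, neighbor, offset, distance), the single linear layer standing in for each shared MLP, the brute-force neighbor search, and all function and variable names are illustrative choices, and the random weights stand in for learned parameters.

```python
import numpy as np

def random_sample(points, feats, n_keep, rng):
    """Keep n_keep points chosen uniformly at random, without replacement."""
    idx = rng.choice(points.shape[0], size=n_keep, replace=False)
    return points[idx], feats[idx]

def knn_indices(points, k):
    """Brute-force k nearest neighbours, O(N^2); fine for a toy example only."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]          # (N, k), includes the point itself

def locse(points, feats, nbr_idx, w_pos):
    """Encode relative neighbour geometry and append it to the neighbour features."""
    k = nbr_idx.shape[1]
    center = np.repeat(points[:, None, :], k, axis=1)            # (N, k, 3)
    nbr = points[nbr_idx]                                        # (N, k, 3)
    offset = center - nbr                                        # (N, k, 3)
    dist = np.linalg.norm(offset, axis=-1, keepdims=True)        # (N, k, 1)
    geo = np.concatenate([center, nbr, offset, dist], axis=-1)   # (N, k, 10)
    encoded = np.maximum(geo @ w_pos, 0.0)   # assumed: one shared linear layer + ReLU
    return np.concatenate([encoded, feats[nbr_idx]], axis=-1)

def attentive_pool(nbr_feats, w_att):
    """Softmax attention over the k neighbours, followed by a weighted sum."""
    scores = nbr_feats @ w_att
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # normalise over neighbours
    return (weights * nbr_feats).sum(axis=1)

rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))        # toy point cloud
feats = rng.normal(size=(256, 8))         # toy per-point features
nbr_idx = knn_indices(points, k=16)
w_pos = rng.normal(size=(10, 8)) * 0.1    # stand-in for learned weights
w_att = rng.normal(size=(16, 16)) * 0.1   # stand-in for learned weights

encoded = locse(points, feats, nbr_idx, w_pos)       # (256, 16, 16)
aggregated = attentive_pool(encoded, w_att)          # (256, 16)
sub_points, sub_feats = random_sample(points, aggregated, 64, rng)  # 4x downsample
```

Aggregating neighbor information before dropping points is what lets a cheap random sampler be usable here: each surviving point already carries a summary of its neighborhood.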

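Experimental Results

Research Questions

RQ1: Without heavy preprocessing, can random sampling enable real-time or near-real-time semantic segmentation of point clouds with millions of points?
RQ2: How can local geometric information and features be preserved when random sampling downsamples the data?
RQ3: Does the dilated residual structure built from LocSE and attentive pooling effectively enlarge the receptive field on large-scale point clouds?
RQ4: How does RandLA-Net's efficiency-accuracy trade-off compare with state-of-the-art methods on Semantic3D and SemanticKITTI?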
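To make the receptive-field question (RQ3) concrete, the toy sketch below counts how many points can influence one point after stacking neighbor aggregations. It assumes a brute-force k-nearest-neighbor graph on random points; `knn_indices` and `receptive_field` are hypothetical helper names, and the sketch only illustrates the growth effect, not the network's actual block.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k nearest neighbours (each point is its own first neighbour)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def receptive_field(nbr_idx, start, hops):
    """Set of point indices that can reach `start` after `hops` aggregations."""
    seen = {start}
    for _ in range(hops):
        seen = seen | {int(j) for i in seen for j in nbr_idx[i]}
    return seen

rng = np.random.default_rng(0)
points = rng.uniform(size=(500, 3))
k = 8
nbr_idx = knn_indices(points, k)

rf1 = receptive_field(nbr_idx, start=0, hops=1)   # exactly k points
rf2 = receptive_field(nbr_idx, start=0, hops=2)   # at most k * k points
print(len(rf1), len(rf2))
```

One aggregation exposes a point to its k neighbors; a second aggregation can expose it to as many as k * k points, which is the sense in which stacking aggregation units enlarges the receptive field.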

Key Findings

Method	Total time (s)	Parameters (M)	Max input points (M)
PointNet (Vanilla)	192	0.8	0.49
PointNet++ (SSG)	9831	0.97	0.98
PointCNN	8142	11	0.05
SPG	43584	0.25	-
KPConv	717	14.9	0.54
RandLA-Net (Ours)	185	1.24	1.03
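The timing column above can be turned into relative speedups with simple arithmetic. The snippet below only divides the table's numbers by RandLA-Net's total time; it is a reading aid and makes no claim about how the paper derived its headline speedup figure.

```python
# Total time in seconds, copied from the table above.
total_time_s = {
    "PointNet (Vanilla)": 192,
    "PointNet++ (SSG)": 9831,
    "PointCNN": 8142,
    "SPG": 43584,
    "KPConv": 717,
    "RandLA-Net (Ours)": 185,
}

ours = total_time_s["RandLA-Net (Ours)"]

# Ratio of each baseline's total time to RandLA-Net's total time.
speedup = {
    name: seconds / ours
    for name, seconds in total_time_s.items()
    if name != "RandLA-Net (Ours)"
}

for name, ratio in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {ratio:6.1f}x the total time of RandLA-Net")
```

By this measure the largest gap in the table is against SPG and the smallest is against vanilla PointNet.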

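RandLA-Net can process up to 1 million points in a single pass and is up to 200× faster than existing methods on large-scale point clouds.
It processes large-scale point clouds (up to 1e6 points) directly, without pre- or post-processing, and achieves state-of-the-art results on Semantic3D and SemanticKITTI.
In a demonstration on a 1e5-point cloud the network takes 0.04 s, and it runs at 22 FPS on Sequence 08 of SemanticKITTI, indicating strong real-time capability.
RandLA-Net reaches 77.4 mIoU on Semantic3D and 53.9 mIoU on SemanticKITTI (50k input points), surpassing many baselines while using fewer parameters.
Ablation studies show that removing LocSE or the attention module significantly degrades performance, confirming the value of local geometric encoding and adaptive feature weighting.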
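For readers unfamiliar with the mIoU metric quoted above, here is a minimal sketch of mean intersection-over-union on a toy label list. It assumes classes absent from both prediction and ground truth are skipped; `mean_iou` is an illustrative helper, not the official evaluation code of either benchmark.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from pred and target
            ious.append(tp / denom)
    return sum(ious) / len(ious)

pred = [0, 0, 1, 1, 2, 2]     # toy per-point predictions
target = [0, 1, 1, 1, 2, 0]   # toy per-point ground truth

# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is 0.5.
print(mean_iou(pred, target, num_classes=3))
```

Because every class counts equally in the mean, a method must do well on rare classes as well as common ones to score highly.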

This review was generated by AI and reviewed by a human editor.