QUICK REVIEW

[Paper Review] Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes

Haiyan Wang, Xuejian Rong|arXiv (Cornell University)|Apr 26, 2020

3D Shape Modeling and Analysis | Cited by 5

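One-Sentence Summary

The paper proposes a weakly supervised 3D semantic segmentation framework that is trained on large-scale point clouds of wild scenes using only 2D supervision. It combines a Graph-based Pyramid Feature Network with an Observability Network and uses 2D-3D joint optimization together with perspective rendering. On the SUNCG and S3DIS datasets it achieves results comparable to state-of-the-art approaches trained with full 3D labels.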

ABSTRACT

The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation, especially for scenes in the wild with varieties of different objects. To alleviate this issue, we propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision. Different from numerous preceding multi-view supervised approaches focusing on single object point clouds, we argue that 2D supervision is capable of providing sufficient guidance information for training 3D semantic segmentation models of natural scene point clouds while not explicitly capturing their inherent structures, even with only a single view per training sample. Specifically, a Graph-based Pyramid Feature Network (GPFN) is designed to implicitly infer both global and local features of point sets and an Observability Network (OBSNet) is introduced to further solve the object occlusion problem caused by complicated spatial relations of objects in 3D scenes. During the projection process, perspective rendering and semantic fusion modules are proposed to provide refined 2D supervision signals for training along with a 2D-3D joint optimization strategy. Extensive experimental results demonstrate the effectiveness of our 2D supervised framework, which achieves comparable results with the state-of-the-art approaches trained with full 3D labels, for semantic point cloud segmentation on the popular SUNCG synthetic dataset and S3DIS real-world dataset.
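To make the idea of 2D supervision for 3D points concrete, the sketch below projects camera-frame points through a pinhole camera, keeps only the nearest point per pixel with a hard z-buffer, and scores per-point class probabilities against a 2D label map for the visible points only. This is an illustrative assumption, not the paper's implementation: OBSNet learns observability rather than using a hard z-buffer, and the perspective rendering and semantic fusion modules are not reproduced here. The function names, the focal length `f`, the principal point `cx`, `cy`, and the `label_map[v][u]` layout are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]


def project(point: Sequence[float], f: float, cx: float, cy: float) -> Tuple[int, int, float]:
    """Pinhole projection of a camera-frame point (x, y, z), z > 0, to pixel (u, v) plus depth z."""
    x, y, z = point
    u = math.floor(f * x / z + cx)
    v = math.floor(f * y / z + cy)
    return u, v, z


def zbuffer_visibility(points, f, cx, cy, width, height):
    """Hard z-buffer (an assumption, not OBSNet): a point is visible if it is nearest on its pixel."""
    nearest: Dict[Pixel, Tuple[float, int]] = {}  # pixel -> (depth, point index)
    pixels: List[Optional[Pixel]] = []
    for i, p in enumerate(points):
        if p[2] <= 0:  # behind the camera: not projected
            pixels.append(None)
            continue
        u, v, z = project(p, f, cx, cy)
        if not (0 <= u < width and 0 <= v < height):  # falls outside the image
            pixels.append(None)
            continue
        pixels.append((u, v))
        if (u, v) not in nearest or z < nearest[(u, v)][0]:
            nearest[(u, v)] = (z, i)
    visible = [px is not None and nearest[px][1] == i for i, px in enumerate(pixels)]
    return visible, pixels


def masked_cross_entropy(probs, pixels, visible, label_map):
    """Mean -log p(label) over visible points; label is the 2D label at the point's pixel."""
    losses = []
    for p, px, vis in zip(probs, pixels, visible):
        if vis:
            u, v = px
            losses.append(-math.log(p[label_map[v][u]]))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Hypothetical toy scene: point 1 lies behind point 0 on the same camera ray.
    pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
    vis, pix = zbuffer_visibility(pts, f=2.0, cx=2.0, cy=2.0, width=4, height=4)
    labels = [[0] * 4 for _ in range(4)]
    labels[2][3] = 1  # pixel (u=3, v=2) carries class 1
    probs = [[0.5, 0.5], [0.9, 0.1], [0.25, 0.75], [0.5, 0.5]]
    print(vis, pix)
    print(masked_cross_entropy(probs, pix, vis, labels))
```

In this toy scene the occluded point and the point behind the camera contribute nothing to the loss, which is exactly the gap a learned visibility model such as OBSNet is meant to address.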

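Research Motivation and Goals

Address the scarcity of 3D semantic segmentation labels in complex real-world scenes.
Achieve effective 3D semantic segmentation using only 2D supervision.
Implicitly model the structure of 3D point clouds without explicit 3D supervision.
Mitigate the challenges posed by object occlusion and complex spatial relations in 3D scenes.
Develop a scalable framework for large-scale semantic segmentation of natural-scene point clouds.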

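Proposed Method

Designs a Graph-based Pyramid Feature Network (GPFN) that extracts hierarchical global and local features from 3D point clouds.
Introduces an Observability Network (OBSNet) that models visibility to handle object occlusion in 3D scenes.
Uses perspective rendering and semantic fusion modules during the 3D-to-2D projection to produce refined 2D supervision signals.
Adopts a 2D-3D joint optimization strategy that aligns 2D supervision with 3D predictions during training.
Guides 3D segmentation with 2D annotations from a single view per training sample, without relying on 3D bounding boxes or instance-level labels.
Implicitly captures 3D geometric structure through graph convolutions over the point cloud topology.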
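The graph-convolution step described above can be illustrated with a k-nearest-neighbour graph and an EdgeConv-style layer (a technique borrowed from DGCNN, not taken from this paper), followed by a global max-pool whose result is appended to every point's local feature. The GPFN's actual layers and pyramid levels are not described in this summary, so the function names, the single linear map `weights`, and the choice of `k` are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each point, return the indices of its k nearest neighbours (excluding itself); k >= 1."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def edge_conv(features, graph, weights) -> List[Vector]:
    """EdgeConv-style layer: h_i = max over neighbours j of ReLU(W [x_i, x_j - x_i])."""
    out = []
    for i, nbrs in enumerate(graph):
        xi = list(features[i])
        best = None
        for j in nbrs:
            # Edge feature: the centre point concatenated with the offset to its neighbour.
            edge = xi + [a - b for a, b in zip(features[j], xi)]
            h = [max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weights]
            # Channel-wise max aggregation over the neighbourhood.
            best = h if best is None else [max(a, b) for a, b in zip(best, h)]
        out.append(best)
    return out


def local_global_features(points, k, weights) -> List[Vector]:
    """Concatenate each point's local edge feature with a global max-pooled scene feature."""
    local = edge_conv(points, knn_graph(points, k), weights)
    global_feat = [max(col) for col in zip(*local)]
    return [f + global_feat for f in local]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # hypothetical 2x4 weight matrix
    for feat in local_global_features(pts, k=2, weights=W):
        print(feat)
```

A multi-scale ("pyramid") variant could repeat the local step with several values of `k` and concatenate the results; whether GPFN does exactly this is not stated in this review.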

Experimental Results

Research Questions

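RQ1: Can 2D supervision alone provide sufficiently accurate guidance for 3D semantic segmentation in complex real-world scenes?
RQ2: How can the structure of 3D point clouds be modeled effectively without explicit 3D annotations?
RQ3: To what extent can 2D supervision combined with visibility modeling mitigate object occlusion in 3D scenes?
RQ4: Does the joint 2D-3D optimization strategy improve segmentation performance compared with training on 2D or 3D supervision alone?
RQ5: How does the method compare with fully supervised state-of-the-art methods in accuracy and scalability?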

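Key Findings

On the synthetic SUNCG dataset, the proposed method performs comparably to state-of-the-art fully supervised methods.
On the real-world S3DIS dataset, the model achieves results comparable to fully supervised state-of-the-art methods despite using only 2D supervision.
Through graph convolutions, the Graph-based Pyramid Feature Network effectively captures both local and global context in 3D point clouds.
By modeling visibility and spatial relations, the Observability Network significantly improves segmentation accuracy in occluded regions.
The 2D-3D joint optimization strategy strengthens feature alignment and improves generalization across diverse scene layouts.
The framework shows strong scalability and robustness on large-scale, complex point clouds of natural scenes.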

This review was generated by AI and reviewed by human editors.