QUICK REVIEW

[Paper Digest] GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

Haibo Qiu, Baosheng Yu | arXiv (Cornell University) | Jul 6, 2022

3D Surveying and Cultural Heritage | Cited by 25

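One-Sentence Summary

GFNet learns a bidirectional geometric flow between range-view and BEV projections and uses it to fuse multi-view features, improving 3D point cloud semantic segmentation and achieving state-of-the-art results among projection-based models on SemanticKITTI and nuScenes.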

ABSTRACT

Point cloud semantic segmentation from projected views, such as range-view (RV) and bird's-eye-view (BEV), has been intensively investigated. Different views capture different information of point clouds and thus are complementary to each other. However, recent projection-based methods for point cloud semantic segmentation usually utilize a vanilla late fusion strategy for the predictions of different views, failing to explore the complementary information from a geometric perspective during the representation learning. In this paper, we introduce a geometric flow network (GFNet) to explore the geometric correspondence between different views in an align-before-fuse manner. Specifically, we devise a novel geometric flow module (GFM) to bidirectionally align and propagate the complementary information across different views according to geometric relationships under the end-to-end learning scheme. We perform extensive experiments on two widely used benchmark datasets, SemanticKITTI and nuScenes, to demonstrate the effectiveness of our GFNet for projection-based point cloud semantic segmentation. Concretely, GFNet not only significantly boosts the performance of each individual view but also achieves state-of-the-art results over all existing projection-based models. Code is available at https://github.com/haibo-qiu/GFNet.

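Motivation and Objectives

Motivate improving projection-based point cloud segmentation by exploiting the geometric correspondence between RV and BEV rather than relying on simple late fusion.
Propose GFNet, whose geometric flow module propagates information bidirectionally between RV and BEV within an end-to-end framework.
Enable end-to-end training by replacing KNN post-processing with KPConv in the two-branch RV/BEV architecture.
Demonstrate effectiveness on the large-scale SemanticKITTI and nuScenes benchmarks, reaching state-of-the-art results among projection-based models.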

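Proposed Method

A two-branch network processes the RV and BEV inputs with encoder-decoder backbones.
The Geometric Flow Module (GFM) geometrically aligns RV and BEV features using the transformations between the two views.
The GFM includes an attention fusion step that combines the aligned features with the target-view features through self-attention and a residual connection.
Geometric alignment uses the original point cloud as a bridge to compute the cross-view transformation matrices.
KPConv replaces KNN post-processing on top of GFNet, making the whole pipeline end-to-end trainable.
The loss combines 2D and 3D supervision, with Lovasz-Softmax and cross-entropy terms, so that all components are trained end to end.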
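
To make the align-before-fuse idea concrete, here is a minimal NumPy sketch of one flow direction (RV to BEV). It is an illustration under stated assumptions, not the authors' implementation: the function names (rv_pixels, bev_cells, flow_rv_to_bev, softmax_fuse), the field-of-view values, the Cartesian BEV grid, the per-cell averaging, and the weight matrix w standing in for a learned layer are all hypothetical choices made for this example. Each 3D point is projected into both views, so it acts as the bridge that tells us which RV pixel feeds which BEV cell.

```python
import numpy as np

# Sensor settings and grid sizes are illustrative assumptions, not values from the paper.

def rv_pixels(points, H, W, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection: (row, col) of each point in an H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1) + 1e-8
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    col = 0.5 * (1.0 - yaw / np.pi) * W
    row = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H
    col = np.clip(np.floor(col), 0, W - 1).astype(np.int64)
    row = np.clip(np.floor(row), 0, H - 1).astype(np.int64)
    return row, col

def bev_cells(points, grid, extent):
    """Top-down binning of x, y over [-extent, extent) into a Cartesian grid (assumed)."""
    gh, gw = grid
    i = np.floor((points[:, 0] + extent) / (2.0 * extent) * gh)
    j = np.floor((points[:, 1] + extent) / (2.0 * extent) * gw)
    i = np.clip(i, 0, gh - 1).astype(np.int64)
    j = np.clip(j, 0, gw - 1).astype(np.int64)
    return i, j

def flow_rv_to_bev(rv_feat, points, grid, extent):
    """Align RV features (C, H, W) to the BEV grid, using the points as the bridge.

    Every point carries the feature of its RV pixel into its BEV cell;
    features landing in the same cell are averaged, empty cells stay zero.
    """
    C, H, W = rv_feat.shape
    gh, gw = grid
    row, col = rv_pixels(points, H, W)
    i, j = bev_cells(points, grid, extent)
    flat = i * gw + j                              # flattened BEV cell index per point
    per_point = rv_feat[:, row, col].T             # (N, C) feature carried by each point
    summed = np.zeros((gh * gw, C))
    np.add.at(summed, flat, per_point)             # scatter-add, safe for repeated cells
    counts = np.bincount(flat, minlength=gh * gw)
    mean = summed / np.maximum(counts, 1)[:, None]
    return mean.T.reshape(C, gh, gw)

def softmax_fuse(target, aligned, w):
    """Softmax-weighted fusion with a residual connection (a stand-in for the GFM fusion).

    w has shape (2, 2C) and plays the role of a learned 1x1 convolution (hypothetical).
    """
    stacked = np.concatenate([target, aligned], axis=0)        # (2C, h, w)
    logits = np.einsum("kc,chw->khw", w, stacked)              # (2, h, w)
    logits = logits - logits.max(axis=0, keepdims=True)        # numerical stability
    attn = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return target + attn[0] * target + attn[1] * aligned       # residual + weighted sum
```

The BEV-to-RV direction would follow the same pattern with the roles of the two index sets swapped. In a trainable network the scatter and the fusion weights would live inside an autodiff framework; NumPy is used here only to keep the index bookkeeping visible.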

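Experimental Results

Research Questions

RQ1: Can the geometric correspondence between RV and BEV be exploited to improve cross-view information propagation for point cloud segmentation?
RQ2: Compared with vanilla late fusion, does bidirectional geometric flow between RV and BEV improve both each view's representation and the overall fusion?
RQ3: How does attention-based fusion in the GFM affect segmentation performance?
RQ4: How does GFNet perform on the large-scale SemanticKITTI and nuScenes benchmarks compared with existing projection-based methods?
RQ5: Is end-to-end training with KPConv effective for multi-view projection-based segmentation?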

Key Findings

Method	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign	mIoU
RV-Single	93.7	48.7	57.7	32.4	40.5	69.2	79.9	95.9	53.4	83.9	0.1	89.2	59.0	87.8	66.1	75.3	64.0	45.2	60.1
RV-Flow	93.8	45.0	58.8	69.9	31.6	63.6	73.8	95.6	52.9	83.6	0.3	90.3	62.1	88.0	64.3	75.8	63.2	47.4	61.1
BEV-Single	93.6	29.9	42.4	64.8	26.8	48.1	74.0	94.0	45.9	80.7	1.4	89.2	46.5	86.9	61.4	74.9	56.8	41.6	55.7
BEV-Flow	93.7	43.7	61.2	74.0	31.0	61.6	80.6	95.3	53.1	82.8	0.2	90.8	61.4	88.0	63.1	75.6	58.9	43.1	61.0
GFNet	94.2	49.7	63.2	74.9	32.1	69.3	83.2	95.7	53.8	83.8	0.2	91.2	62.9	88.5	66.1	76.2	64.1	48.3	63.0

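GFNet reaches 63.0 mIoU on the SemanticKITTI validation set, higher than all compared projection-based models.
Adding the GFM improves both single-view branches: RV rises from 60.1 to 61.1 mIoU (RV-Single vs. RV-Flow) and BEV from 55.7 to 61.0 mIoU (BEV-Single vs. BEV-Flow), so the gain from cross-view flow is largest for BEV.
RV-Flow and BEV-Flow both show cross-view gains, and combining them with KPConv gives the best result, the full GFNet.
Softmax attention in the GFM gives a marginal gain over sigmoid and improves fusion.
Ablations show that jointly using 2D and 3D supervision (the λ configuration) gives the best results and that end-to-end optimization improves performance.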
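
For readers less familiar with the metric behind these numbers, the sketch below shows how per-class IoU and mIoU are commonly computed from predicted and ground-truth labels. It is a generic illustration, not the benchmark's official evaluation code; the function names and the choice to score a class with no predictions and no ground truth as 0 are assumptions of this example.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Count (target, pred) label pairs; rows are ground truth, columns are predictions."""
    idx = target.astype(np.int64) * num_classes + pred.astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_per_class(cm):
    """IoU = TP / (TP + FP + FN) for each class."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp      # predicted as the class but labelled otherwise
    fn = cm.sum(axis=1) - tp      # labelled as the class but predicted otherwise
    union = tp + fp + fn
    # Assumption of this sketch: a class that never appears scores 0.
    return np.where(union > 0, tp / np.maximum(union, 1.0), 0.0)

def mean_iou(pred, target, num_classes):
    """mIoU is the unweighted mean of the per-class IoU values."""
    return iou_per_class(confusion_matrix(pred, target, num_classes)).mean()

# Tiny usage example with three classes.
pred = np.array([0, 0, 1, 1, 2])
target = np.array([0, 1, 1, 1, 2])
print(iou_per_class(confusion_matrix(pred, target, 3)), mean_iou(pred, target, 3))
```

Because mIoU averages over classes rather than points, rare classes such as bicycle or other-ground weigh as much as road or building, which is why per-class columns matter when comparing the variants in the table.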


This digest was generated by AI and reviewed by a human editor.