QUICK REVIEW

[Paper Review] GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

Haibo Qiu, Baosheng Yu|arXiv (Cornell University)|Jul 6, 2022

3D Surveying and Cultural Heritage|Cited by 25

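One-Line Summary

GFNet learns a bidirectional geometric flow between range-view and BEV projections and fuses the multi-view features to improve 3D point cloud semantic segmentation, achieving state-of-the-art results among projection-based models on SemanticKITTI and nuScenes.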

ABSTRACT

Point cloud semantic segmentation from projected views, such as range-view (RV) and bird's-eye-view (BEV), has been intensively investigated. Different views capture different information of point clouds and thus are complementary to each other. However, recent projection-based methods for point cloud semantic segmentation usually utilize a vanilla late fusion strategy for the predictions of different views, failing to explore the complementary information from a geometric perspective during the representation learning. In this paper, we introduce a geometric flow network (GFNet) to explore the geometric correspondence between different views in an align-before-fuse manner. Specifically, we devise a novel geometric flow module (GFM) to bidirectionally align and propagate the complementary information across different views according to geometric relationships under the end-to-end learning scheme. We perform extensive experiments on two widely used benchmark datasets, SemanticKITTI and nuScenes, to demonstrate the effectiveness of our GFNet for projection-based point cloud semantic segmentation. Concretely, GFNet not only significantly boosts the performance of each individual view but also achieves state-of-the-art results over all existing projection-based models. Code is available at https://github.com/haibo-qiu/GFNet.

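Motivation and Objectives

Improve projection-based point cloud segmentation by exploiting the geometric correspondence between RV and BEV rather than relying on vanilla late fusion.
Propose GFNet, an end-to-end framework with a geometric flow module that propagates information bidirectionally between RV and BEV.
Replace KNN post-processing with KPConv in the two-branch RV/BEV architecture to enable end-to-end training.
Demonstrate effectiveness on the large-scale benchmarks SemanticKITTI and nuScenes, achieving state-of-the-art results among projection-based models.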

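Proposed Method

A two-branch network processes the RV and BEV inputs with encoder-decoder backbones.
The Geometric Flow Module (GFM) performs geometric alignment using view-to-view transformations between RV and BEV.
The GFM includes an attention fusion step that combines the aligned features with the target-view features via self-attention and a residual connection.
Geometric alignment uses the original point cloud as a bridge to compute the transformation matrices between views.
KPConv is applied on top of GFNet in place of KNN post-processing, enabling end-to-end training.
The loss combines 2D and 3D supervision using Lovász-Softmax and cross-entropy terms, so all components are trained end to end.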
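
The alignment idea in the list above, where the original point cloud bridges the two views, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a spherical range-view projection and a plain Cartesian BEV grid (the paper's BEV partition may differ), and every function name, grid size, and field-of-view value below is hypothetical. Each point is projected into both views, so the point itself links an RV pixel to a BEV cell, and BEV features can then be gathered into the RV layout.

```python
import math

def clamp(value, low, high):
    return max(low, min(high, value))

def rv_pixel(point, height=64, width=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of one (x, y, z) point to a range-view (row, col)."""
    x, y, z = point
    depth = max(math.sqrt(x * x + y * y + z * z), 1e-8)  # avoid division by zero
    yaw = math.atan2(y, x)
    pitch = math.asin(clamp(z / depth, -1.0, 1.0))
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    col = 0.5 * (1.0 - yaw / math.pi) * width
    row = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    return clamp(int(row), 0, height - 1), clamp(int(col), 0, width - 1)

def bev_cell(point, grid=480, extent=50.0):
    """Top-down projection of one point to a Cartesian BEV (row, col) cell.

    Assumption: a square grid covering [-extent, extent] metres in x and y.
    """
    x, y, _ = point
    col = int((x + extent) / (2.0 * extent) * grid)
    row = int((y + extent) / (2.0 * extent) * grid)
    return clamp(row, 0, grid - 1), clamp(col, 0, grid - 1)

def align_bev_to_rv(points, bev_features, rv_kwargs=None, bev_kwargs=None):
    """Gather BEV features into the RV layout, using the points as the bridge.

    bev_features maps a BEV (row, col) cell to its feature. The result maps
    each occupied RV pixel to the feature of the BEV cell that the same point
    falls into. If several points share an RV pixel, the last one wins
    (a simplification made for this sketch).
    """
    rv_kwargs = rv_kwargs or {}
    bev_kwargs = bev_kwargs or {}
    aligned = {}
    for point in points:
        cell = bev_cell(point, **bev_kwargs)
        if cell in bev_features:
            aligned[rv_pixel(point, **rv_kwargs)] = bev_features[cell]
    return aligned

# Usage: two points, a coarse 10x10 BEV grid, and placeholder features.
points = [(12.0, -3.0, 0.5), (-23.0, 8.0, -1.0)]
bev_kwargs = {"grid": 10, "extent": 50.0}
bev_features = {bev_cell(p, **bev_kwargs): i for i, p in enumerate(points)}
aligned = align_bev_to_rv(points, bev_features, bev_kwargs=bev_kwargs)
```

The reverse direction (RV to BEV) would follow the same pattern with the roles of the two projections swapped, which is what makes the flow bidirectional.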

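Experimental Results

Research Questions

RQ1: Can the geometric correspondence between RV and BEV be exploited to improve cross-view information propagation for point cloud segmentation?
RQ2: Does bidirectional geometric flow between RV and BEV improve each view's representation and the overall fusion compared with vanilla late fusion?
RQ3: How does attention-based fusion in the GFM affect segmentation performance?
RQ4: How does GFNet compare with existing projection-based methods on the large-scale benchmarks SemanticKITTI and nuScenes?
RQ5: Is end-to-end training with KPConv effective for multi-view projection-based segmentation?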

Key Findings

Method	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign	mIoU
RV-Single	93.7	48.7	57.7	32.4	40.5	69.2	79.9	95.9	53.4	83.9	0.1	89.2	59.0	87.8	66.1	75.3	64.0	45.2	60.1
RV-Flow	93.8	45.0	58.8	69.9	31.6	63.6	73.8	95.6	52.9	83.6	0.3	90.3	62.1	88.0	64.3	75.8	63.2	47.4	61.1
BEV-Single	93.6	29.9	42.4	64.8	26.8	48.1	74.0	94.0	45.9	80.7	1.4	89.2	46.5	86.9	61.4	74.9	56.8	41.6	55.7
BEV-Flow	93.7	43.7	61.2	74.0	31.0	61.6	80.6	95.3	53.1	82.8	0.2	90.8	61.4	88.0	63.1	75.6	58.9	43.1	61.0
GFNet	94.2	49.7	63.2	74.9	32.1	69.3	83.2	95.7	53.8	83.8	0.2	91.2	62.9	88.5	66.1	76.2	64.1	48.3	63.0
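
To read the mIoU column at a glance, the differences between variants can be worked out directly. The snippet below is only arithmetic on the values in the table above; the dictionary and the `gain` helper are illustrative names, not part of the paper's code.

```python
# mIoU values copied from the table above.
miou = {
    "RV-Single": 60.1,
    "RV-Flow": 61.1,
    "BEV-Single": 55.7,
    "BEV-Flow": 61.0,
    "GFNet": 63.0,
}

def gain(after, before):
    """mIoU difference in percentage points, rounded to one decimal."""
    return round(miou[after] - miou[before], 1)

rv_gain = gain("RV-Flow", "RV-Single")      # effect of the flow on the RV branch
bev_gain = gain("BEV-Flow", "BEV-Single")   # effect of the flow on the BEV branch
full_gain = gain("GFNet", "RV-Flow")        # full model vs. the stronger flow branch
```

With these numbers the flow adds 1.0 points to the RV branch and 5.3 points to the BEV branch, and the full GFNet is a further 1.9 points above RV-Flow.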

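On the SemanticKITTI validation set, GFNet achieves a higher mIoU (63.0) than all compared projection-based models.
Both the RV-Single and BEV-Single branches benefit from adding the GFM: allowing flow between views raises mIoU from 60.1 to 61.1 for RV and from 55.7 to 61.0 for BEV.
RV-Flow and BEV-Flow both improve on their single-view counterparts, and combining them with KPConv gives GFNet its best result.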
Softmax attention in the GFM provides marginal gains over sigmoid attention.
Ablations over the loss weights (λ configuration) show that joint training with 2D and 3D supervision yields the best results, and that end-to-end optimization further improves performance.
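
The difference between softmax and sigmoid attention in the fusion step can be illustrated with a small sketch. This is not the paper's exact formulation: in the network the attention scores would come from learned layers, whereas here they are passed in as plain numbers, and the function names and the per-channel weighting scheme are assumptions made for illustration. Softmax makes the target and aligned features compete for a total weight of 1, while sigmoid gates each source independently; in both cases a residual connection keeps the original target feature.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def fuse(target, aligned, scores, mode="softmax"):
    """Fuse per-channel target and aligned features with attention weights.

    target, aligned: lists of floats, one value per channel.
    scores: list of (score_target, score_aligned) pairs, one per channel
            (hypothetical stand-ins for learned attention logits).
    """
    fused = []
    for t, a, (s_t, s_a) in zip(target, aligned, scores):
        if mode == "softmax":
            w_t, w_a = softmax([s_t, s_a])      # weights sum to 1
        elif mode == "sigmoid":
            w_t, w_a = sigmoid(s_t), sigmoid(s_a)  # independent gates in (0, 1)
        else:
            raise ValueError(f"unknown mode: {mode}")
        fused.append(t + w_t * t + w_a * a)     # residual connection keeps t
    return fused

# Usage: one channel, identical scores for both sources.
soft = fuse([1.0], [3.0], [(2.0, 2.0)], mode="softmax")
sig = fuse([1.0], [3.0], [(2.0, 2.0)], mode="sigmoid")
```

With equal scores, softmax splits the weight evenly between the two sources, whereas sigmoid lets both pass almost fully, so the two variants scale the fused feature differently even when the scores are the same.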

This review was generated by AI and checked by a human editor.