QUICK REVIEW

[Paper Review] (AF)2-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network

Ran Cheng, Ryan Razani|arXiv (Cornell University)|Feb 8, 2021

3D Shape Modeling and Analysis|References: 39|Citations: 27

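One-Sentence Summary

This paper proposes (AF)²-S3Net, an end-to-end 3D sparse CNN equipped with an Attentive Feature Fusion Module (AF2M) and an Adaptive Feature Selection Module (AFSM). It improves LiDAR semantic segmentation, achieves state-of-the-art results on SemanticKITTI, and generalizes strongly to nuScenes-lidarseg.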

ABSTRACT

Autonomous robotic systems and self-driving cars rely on accurate perception of their surroundings as the safety of the passengers and pedestrians is the top priority. Semantic segmentation is one of the essential components of environmental perception that provides semantic information of the scene. Recently, several methods have been introduced for 3D LiDAR semantic segmentation. While they can lead to improved performance, they either suffer from high computational complexity and are therefore inefficient, or lack fine details of smaller instances. To alleviate this problem, we propose AF2-S3Net, an end-to-end encoder-decoder CNN network for 3D LiDAR semantic segmentation. We present a novel multi-branch attentive feature fusion module in the encoder and a unique adaptive feature selection module with feature map re-weighting in the decoder. Our AF2-S3Net fuses the voxel-based learning and point-based learning into a single framework to effectively process the large 3D scene. Our experimental results show that the proposed method outperforms the state-of-the-art approaches on the large-scale SemanticKITTI benchmark, ranking 1st on the competitive public leaderboard competition upon publication.

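Motivation and Objectives

Motivate accurate 3D LiDAR semantic segmentation on sparse point clouds for autonomous driving systems.
Introduce a dual-attention architecture that integrates multi-scale features and emphasizes fine details.
Provide a unified voxel-point processing framework that fuses voxel-based and point-based learning for large-scale scenes.
Demonstrate state-of-the-art performance on SemanticKITTI and validate generalization on nuScenes-lidarseg and ModelNet40.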

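Proposed Method

Proposes AF2M, which fuses multi-branch features spanning small, medium, and large kernels so that they represent both point-based and voxel-based context.
Introduces AFSM in the decoder to adaptively re-weight and select multi-scale features.
Builds an end-to-end 3D sparse CNN with a residual backbone on top of Minkowski Engine.
Uses a composite loss combining Exponential-Log, geo-aware anisotropic, and Lovász losses with weights 1, 1.5, and 1.5.
Represents each LiDAR frame as a sparse tensor of coordinates C and features F (x, y, z, intensity, normals) for per-point segmentation.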
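
The sparse-tensor input and the weighted loss described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `voxelize` and `composite_loss` are hypothetical, averaging the features of points that fall into the same voxel is an assumed pooling choice, and the mapping of the weights 1, 1.5, 1.5 to the three losses simply follows the order in which they are listed. The real network uses Minkowski Engine, which is not used here.

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group points into voxels (hypothetical helper).

    points: iterable of (x, y, z, *extra_features) tuples.
    Returns sorted integer voxel coordinates C and, per voxel, the mean
    of all input columns as features F (mean pooling is an assumption).
    """
    buckets = defaultdict(list)
    for x, y, z, *feats in points:
        # Floor division maps each coordinate to its integer voxel index.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append([x, y, z, *feats])
    coords = sorted(buckets)
    features = [
        [sum(col) / len(buckets[c]) for col in zip(*buckets[c])]
        for c in coords
    ]
    return coords, features

def composite_loss(exp_log, geo_aware, lovasz, weights=(1.0, 1.5, 1.5)):
    """Weighted sum of three already-computed scalar loss values.

    The weight order (Exponential-Log, geo-aware, Lovasz) is assumed
    from the order in which the review lists the losses.
    """
    w_exp, w_geo, w_lov = weights
    return w_exp * exp_log + w_geo * geo_aware + w_lov * lovasz

# Three points with one extra feature (intensity); the first two share a voxel.
C, F = voxelize(
    [(0.1, 0.1, 0.1, 0.5), (0.2, 0.2, 0.2, 0.7), (1.3, 0.0, -0.1, 0.9)],
    voxel_size=0.5,
)
print(C)  # two occupied voxels
print(composite_loss(0.2, 0.4, 0.6))
```

Only occupied voxels are stored, which is what makes the representation sparse: empty space in a large LiDAR scene costs no memory or computation.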

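Experimental Results

Research Questions

RQ1: Can AF2M capture both local fine details and global context in sparse 3D LiDAR data?
RQ2: Does the Adaptive Feature Selection Module improve generalization by re-weighting multi-scale features in the decoder?
RQ3: How does (AF)²-S3Net perform against recent methods on SemanticKITTI, nuScenes-lidarseg, and ModelNet40?
RQ4: How does the combined loss (Exponential-Log, geo-aware, Lovász) affect segmentation accuracy?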

Key Findings

Method	Mean IoU	Car	Bicycle	Motorcycle	Truck	Other-vehicle	Person	Bicyclist	Motorcyclist	Road	Parking	Sidewalk	Other-ground	Building	Fence	Vegetation	Trunk	Terrain	Pole	Traffic-sign
(AF)²-S3Net [Ours]	69.7	94.5	65.4	86.8	39.2	41.1	80.7	80.4	74.3	91.3	68.8	72.5	53.5	87.9	63.2	70.2	68.5	53.7	61.5	71.0
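
As a quick consistency check on the row above, the reported Mean IoU should equal the unweighted average of the 19 per-class IoU values. The short sketch below copies the values from the table; the variable names are illustrative only.

```python
# Per-class IoU values (%) copied from the (AF)²-S3Net row of the table.
per_class_iou = {
    "Car": 94.5, "Bicycle": 65.4, "Motorcycle": 86.8, "Truck": 39.2,
    "Other-vehicle": 41.1, "Person": 80.7, "Bicyclist": 80.4,
    "Motorcyclist": 74.3, "Road": 91.3, "Parking": 68.8, "Sidewalk": 72.5,
    "Other-ground": 53.5, "Building": 87.9, "Fence": 63.2,
    "Vegetation": 70.2, "Trunk": 68.5, "Terrain": 53.7, "Pole": 61.5,
    "Traffic-sign": 71.0,
}

# Mean IoU is the plain average over classes, with no frequency weighting.
mean_iou = sum(per_class_iou.values()) / len(per_class_iou)
print(round(mean_iou, 1))  # 69.7, matching the Mean IoU column
```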

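(AF)²-S3Net achieves a state-of-the-art mean IoU of 69.7 on the SemanticKITTI test set, outperforming the SPVNAS and MinkNet42 baselines.
The gains are most pronounced on small-object classes, notably Bicycle (65.4), Motorcycle (86.8), and Pole (61.5).
On the nuScenes-lidarseg validation set, (AF)²-S3Net reaches a mean IoU of 62.2 and an FW IoU of 83.0, outperforming the MinkNet42 and SalsaNext baselines.
The ablation study shows that combining AF2M and AFSM yields 68.6 mIoU, and adding the Lovász and geo-aware losses reaches 74.2 mIoU on the SemanticKITTI validation set.
For ModelNet40 classification, using AF2M achieves an overall accuracy of 93.16%, on par with state-of-the-art point-based methods.
Qualitative results show improved capture of fine details (e.g., vehicles, vegetation) and improved distance-dependent performance.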
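
The two segmentation metrics quoted above, mean IoU and FW IoU, can both be derived from a confusion matrix. The sketch below uses the standard definitions (per-class IoU = TP / (TP + FP + FN); frequency-weighted IoU weights each class by its share of ground-truth points). It assumes the benchmark follows these common definitions; the function name `iou_metrics` and the toy matrix are hypothetical and are not data from the paper.

```python
def iou_metrics(conf):
    """Return (mean IoU, frequency-weighted IoU) from a confusion matrix.

    conf[i][j] = number of points whose true class is i and predicted class is j.
    Classes that never appear in ground truth or prediction are skipped.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, fw_iou = [], 0.0
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true c, predicted other
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true other
        denom = tp + fp + fn
        if denom == 0:
            continue
        iou = tp / denom
        ious.append(iou)
        fw_iou += (sum(conf[c]) / total) * iou       # weight by class frequency
    return sum(ious) / len(ious), fw_iou

# Toy example: one frequent, well-segmented class and one rare, hard class.
toy = [[90, 10],
       [5, 5]]
miou, fwiou = iou_metrics(toy)
print(f"mIoU={miou:.3f}  FW IoU={fwiou:.3f}")
```

In the toy example the frequency-weighted score is well above the mean IoU because the frequent class dominates the weighting. A similar effect is one plausible reason a method can report an FW IoU much higher than its mean IoU when large classes such as road surfaces are segmented well.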


This review was generated by AI and checked by a human editor.