QUICK REVIEW

[Paper Review] SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation

Junjie Zhou, Yongping Xiong|arXiv (Cornell University)|Jan 17, 2023

3D Surveying and Cultural Heritage|Cited by 7

One-Line Summary

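SAT performs 3D point cloud segmentation with size-aware feature learning. It combines Multi-Granularity Attention with a Re-Attention module and reports state-of-the-art results on S3DIS and ScanNetV2.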

ABSTRACT

Transformer models have achieved promising performances in point cloud segmentation. However, most existing attention schemes provide the same feature learning paradigm for all points equally and overlook the enormous difference in size among scene objects. In this paper, we propose the Size-Aware Transformer (SAT) that can tailor effective receptive fields for objects of different sizes. Our SAT achieves size-aware learning via two steps: introduce multi-scale features to each attention layer and allow each point to choose its attentive fields adaptively. It contains two key designs: the Multi-Granularity Attention (MGA) scheme and the Re-Attention module. The MGA addresses two challenges: efficiently aggregating tokens from distant areas and preserving multi-scale features within one attention layer. Specifically, point-voxel cross attention is proposed to address the first challenge, and the shunted strategy based on the standard multi-head self attention is applied to solve the second. The Re-Attention module dynamically adjusts the attention scores to the fine- and coarse-grained features output by MGA for each point. Extensive experimental results demonstrate that SAT achieves state-of-the-art performances on S3DIS and ScanNetV2 datasets. Our SAT also achieves the most balanced performance on categories among all referred methods, which illustrates the superiority of modelling categories of different sizes. Our code and model will be released after the acceptance of this paper.

Motivation and Objectives

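Address semantic segmentation of 3D point clouds whose scenes contain objects of widely varying sizes.
Develop a transformer block that learns multi-scale, size-aware features.
Adapt each point's receptive field to the size of the object it belongs to.
Preserve both fine- and coarse-grained features without devoxelization loss.
Demonstrate state-of-the-art performance on challenging indoor datasets.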

Proposed Method

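Introduce Multi-Granularity Attention (MGA), which produces fine- and coarse-grained features within each attention layer.
Implement Point-Voxel Cross Attention (PVCA), which computes attention directly between point tokens and voxel tokens.
Use a point-voxel shunted strategy to keep the multi-scale features in MGA separate.
Add a Re-Attention module that dynamically reweights attention heads according to object size.
Stack these blocks to build the Size-Aware Transformer (SAT) for end-to-end segmentation.
Provide architectural details: window-based self-attention, multi-scale receptive fields, and hierarchical stages.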
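
The interplay of the fine branch, the point-voxel branch, and per-point reweighting can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`voxel_tokens`, `attend`, `size_aware_block`), the mean-pooling voxel tokeniser, the single-head global attention (the paper uses window-based multi-head attention), and the softmax gate standing in for Re-Attention are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def voxel_tokens(xyz, feats, voxel_size):
    """Mean-pool point features per occupied voxel (hypothetical tokeniser)."""
    cells = np.floor(xyz / voxel_size).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_vox = int(inverse.max()) + 1
    sums = np.zeros((n_vox, feats.shape[1]))
    np.add.at(sums, inverse, feats)  # accumulate features per voxel
    counts = np.bincount(inverse, minlength=n_vox)[:, None]
    return sums / counts

def attend(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention."""
    q, k, v = queries @ w_q, keys_values @ w_k, keys_values @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]))
    return weights @ v

def size_aware_block(xyz, feats, params, voxel_size=0.5):
    vox = voxel_tokens(xyz, feats, voxel_size)
    fine = attend(feats, feats, *params["fine"])    # point-to-point branch
    coarse = attend(feats, vox, *params["coarse"])  # point-to-voxel branch
    # Assumed stand-in for Re-Attention: each point weights the two branches.
    gate = softmax(feats @ params["gate"])          # shape (N, 2)
    out = gate[:, :1] * fine + gate[:, 1:] * coarse
    return out, gate

rng = np.random.default_rng(0)
n_points, dim = 64, 16
xyz = rng.uniform(0.0, 2.0, size=(n_points, 3))
feats = rng.normal(size=(n_points, dim))
params = {
    "fine": [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)],
    "coarse": [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)],
    "gate": rng.normal(scale=0.1, size=(dim, 2)),
}
out, gate = size_aware_block(xyz, feats, params)
```

Because the coarse branch queries voxel tokens directly from point tokens, the output stays at point resolution, so no separate devoxelization step is needed in this sketch. Each row of `gate` sums to 1 and indicates how much a point relies on the fine versus the coarse branch.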

Experimental Results

Research Questions

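RQ1 Does size-aware learning improve segmentation accuracy across objects of different sizes in 3D point clouds?
RQ2 Do MGA and PVCA enable effective multi-scale feature integration without devoxelization loss?
RQ3 Does the Re-Attention module adjust attention appropriately to object size at inference time?
RQ4 How does SAT compare with prior methods on standard indoor benchmarks (S3DIS, ScanNetV2)?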

Key Findings

Methods	mIoU (%)	mAcc (%)	Ceil.	Floor	Wall	Beam	Col.	Wind.	Door	Table	Chair	Sofa	Book.	Board	Clut.
PointNet	41.1	66.2	88.8	97.3	69.8	1.0	3.9	46.3	10.8	59.0	52.6	5.9	40.3	26.4	33.2
RSNet	51.9	59.4	93.3	98.3	79.2	0.0	15.8	45.4	50.1	67.9	65.5	52.5	22.5	41.0	43.6
PointCNN	57.3	63.9	92.3	98.2	79.4	0.0	17.6	22.8	62.1	74.4	80.6	31.7	66.7	62.1	56.7
SPGraph	58.0	66.5	89.4	96.9	78.1	0.0	42.8	48.9	61.6	84.7	75.4	69.8	52.6	2.1	52.2
PCCN	58.3	67.0	92.3	96.2	75.9	3.0	6.0	69.5	63.5	66.9	65.6	47.3	68.9	59.1	46.2
PointWeb	60.3	66.6	92.0	98.5	79.4	0.0	21.1	59.7	34.8	76.3	88.3	46.9	69.3	64.9	52.5
MinkowskiNet	65.4	71.7	91.8	98.7	86.2	0.0	34.1	48.9	62.4	81.6	89.8	47.2	74.9	74.4	58.6
KPConv	67.1	72.8	92.8	97.3	82.4	0.0	23.9	58.0	69.0	81.5	91.0	75.4	75.3	66.7	58.9
ASSANet-L	66.8	-	-	-	-	-	-	-	-	-	-	-	-	-	-
RepSurf-U	68.9	76.0	-	-	-	-	-	-	-	-	-	-	-	-	-
CBL	69.4	75.2	93.9	98.4	84.2	0.0	37.0	57.7	71.9	91.7	81.8	77.8	75.6	69.1	62.9
PatchFormer	68.1	-	-	-	-	-	-	-	-	-	-	-	-	-	-
Fast PT.	70.1	77.4	-	-	-	-	-	-	-	-	-	-	-	-	-
Point Transformer	70.4	76.5	94.0	98.5	86.3	0.0	38.0	63.4	74.3	82.4	89.1	80.2	74.3	76.0	59.3
PointNeXt-XL	70.8	77.5	94.2	98.5	84.4	0.0	37.7	59.3	74.0	83.1	91.6	77.4	76.7	78.8	60.6
SAT	72.6	78.8	93.6	98.5	87.2	0.0	49.3	61.1	73.6	83.7	91.8	81.7	77.9	82.3	63.4

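SAT achieves state-of-the-art mIoU and mAcc on S3DIS Area 5, with well-balanced performance across categories.
SAT achieves 74.4% val mIoU and 74.2% test mIoU on ScanNetV2, outperforming prior methods.
Ablations show that both Re-Attention and MGA are essential to the performance gains, particularly for small classes.
PVCA-based MGA attains a larger receptive field without feature devoxelization loss.
On S3DIS, the model shows the most balanced per-category performance among the referenced methods (smallest variance in IoU).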
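
The balance claim can be spot-checked against the table with a short script. The sketch below copies the per-class IoU values for two rows (SAT and Point Transformer) and computes their mean and spread. Using the population standard deviation over the 13 classes is an assumption; this review does not state how the paper measures variance, and two rows cannot confirm the claim for all referenced methods.

```python
import statistics

# Per-class IoU (%) copied from the S3DIS table:
# Ceil., Floor, Wall, Beam, Col., Wind., Door, Table, Chair, Sofa, Book., Board, Clut.
per_class_iou = {
    "Point Transformer": [94.0, 98.5, 86.3, 0.0, 38.0, 63.4, 74.3,
                          82.4, 89.1, 80.2, 74.3, 76.0, 59.3],
    "SAT": [93.6, 98.5, 87.2, 0.0, 49.3, 61.1, 73.6,
            83.7, 91.8, 81.7, 77.9, 82.3, 63.4],
}

def summarize(ious):
    """Return (mean IoU, population standard deviation) over classes."""
    return statistics.fmean(ious), statistics.pstdev(ious)

summary = {name: summarize(values) for name, values in per_class_iou.items()}
for name, (mean_iou, spread) in summary.items():
    print(f"{name:18s} mean IoU = {mean_iou:.1f}  std = {spread:.1f}")
```

For these two rows, the mean of the per-class values reproduces the reported mIoU column to one decimal place, and SAT shows the smaller spread. The Beam class, at 0.0 for nearly every method in the table, contributes heavily to the spread of each row.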

This review was generated by AI and checked by a human editor.