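[Paper Summary] SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation
SAT combines Multi-Granularity Attention with a Re-Attention module to achieve size-aware learning for 3D point cloud segmentation, reaching state-of-the-art results on S3DIS and ScanNetV2.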
Transformer models have achieved promising performances in point cloud segmentation. However, most existing attention schemes provide the same feature learning paradigm for all points equally and overlook the enormous difference in size among scene objects. In this paper, we propose the Size-Aware Transformer (SAT) that can tailor effective receptive fields for objects of different sizes. Our SAT achieves size-aware learning via two steps: introduce multi-scale features to each attention layer and allow each point to choose its attentive fields adaptively. It contains two key designs: the Multi-Granularity Attention (MGA) scheme and the Re-Attention module. The MGA addresses two challenges: efficiently aggregating tokens from distant areas and preserving multi-scale features within one attention layer. Specifically, point-voxel cross attention is proposed to address the first challenge, and the shunted strategy based on the standard multi-head self attention is applied to solve the second. The Re-Attention module dynamically adjusts the attention scores to the fine- and coarse-grained features output by MGA for each point. Extensive experimental results demonstrate that SAT achieves state-of-the-art performances on S3DIS and ScanNetV2 datasets. Our SAT also achieves the most balanced performance on categories among all referred methods, which illustrates the superiority of modelling categories of different sizes. Our code and model will be released after the acceptance of this paper.
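Research Motivation and Goals
- Motivate the work with the difficulty of semantically segmenting scene objects that differ greatly in size.
- Develop a transformer module that learns multi-scale, size-aware features.
- Let each point's receptive field adapt to the size of the object it belongs to.
- Preserve both fine- and coarse-grained features without incurring devoxelization loss.
- Demonstrate state-of-the-art performance on challenging indoor datasets.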
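Proposed Method
- Propose Multi-Granularity Attention (MGA), which produces fine- and coarse-grained features within each attention layer.
- Implement Point-Voxel Cross Attention (PVCA) to compute attention directly between point tokens and voxel tokens.
- Use a point-voxel shunted strategy to decouple multi-scale features inside MGA.
- Add a Re-Attention module that dynamically weights attention heads according to object size.
- Stack SAT blocks to form the Size-Aware Transformer (SAT) for end-to-end segmentation.
- Provide architectural details, including window-based self-attention with multi-scale receptive fields and hierarchical stages.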
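The interplay of the components listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the mean-pooled voxel tokens, the identity query/key/value projections, the single shared window, and the per-point softmax gate `gate_w` are all simplifying assumptions made for illustration, and the names `voxelize`, `attend`, and `size_aware_block` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def voxelize(xyz, feats, voxel_size):
    # Assumption: a voxel token is the mean feature of the points in that voxel.
    keys = np.floor(xyz / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_vox = int(inverse.max()) + 1
    sums = np.zeros((n_vox, feats.shape[1]))
    np.add.at(sums, inverse, feats)
    counts = np.bincount(inverse, minlength=n_vox)[:, None]
    return sums / counts

def attend(q, k, v):
    # Scaled dot-product attention with identity projections (assumption).
    scores = q @ k.T / np.sqrt(q.shape[1])
    return softmax(scores) @ v

def size_aware_block(xyz, feats, voxel_size, gate_w):
    voxel_tokens = voxelize(xyz, feats, voxel_size)
    # Fine branch: points attend to points (all points share one window here).
    fine = attend(feats, feats, feats)
    # Coarse branch: point queries attend directly to voxel tokens,
    # so the output stays per-point and no devoxelization step is needed.
    coarse = attend(feats, voxel_tokens, voxel_tokens)
    # Re-attention stand-in: each point weights the two granularities itself.
    gate = softmax(feats @ gate_w)  # shape (N, 2)
    out = gate[:, :1] * fine + gate[:, 1:] * coarse
    return out, gate

rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 2.0, size=(64, 3))   # toy point coordinates
feats = rng.normal(size=(64, 16))           # toy per-point features
gate_w = rng.normal(size=(16, 2))           # hypothetical gate weights
out, gate = size_aware_block(xyz, feats, voxel_size=0.5, gate_w=gate_w)
print(out.shape, gate.sum(axis=1)[:3])
```

In this sketch, a point on a large object could learn a gate that favors the coarse branch, while a point on a small object could favor the fine branch; the paper's actual Re-Attention operates on attention heads and may differ in detail.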
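Experimental Results
Research Questions
- RQ1: Does size-aware learning improve segmentation accuracy for objects of different sizes in 3D point clouds?
- RQ2: Can MGA and PVCA integrate multi-scale features effectively without devoxelization loss?
- RQ3: Does the Re-Attention module effectively tailor attention to object size at inference time?
- RQ4: How does SAT perform on standard indoor benchmarks (S3DIS, ScanNetV2) compared with prior methods?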
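Key Findings
- SAT achieves state-of-the-art mIoU and mAcc on S3DIS Area 5, with more balanced per-category performance.
- SAT reaches 74.4% val mIoU and 74.2% test mIoU on ScanNetV2, surpassing prior methods.
- Ablations show that both Re-Attention and MGA are critical to the gains, especially for small categories.
- PVCA-based MGA yields a larger receptive field while avoiding the devoxelization loss of features.
- Among the referenced methods, the model has the most balanced per-category performance on S3DIS (lowest IoU variance).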
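The metrics named in these findings (per-category IoU, mIoU, mAcc, and IoU variance as a measure of balance) can be computed from a confusion matrix. The sketch below uses a made-up three-class toy labeling, not data or results from the paper, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    # Rows index ground-truth classes, columns index predicted classes.
    idx = gt * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    # Classes absent from both labels and predictions get NaN.
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)

def per_class_acc(cm):
    tp = np.diag(cm).astype(float)
    total = cm.sum(axis=1)
    return np.where(total > 0, tp / np.maximum(total, 1), np.nan)

# Toy labels for illustration only.
gt = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(gt, pred, num_classes=3)
iou = per_class_iou(cm)
miou = np.nanmean(iou)               # mean IoU over categories
macc = np.nanmean(per_class_acc(cm)) # mean per-category accuracy
iou_var = np.nanvar(iou)             # lower variance = more balanced categories
print(iou, miou, macc, iou_var)
```

A method can raise mIoU while leaving small categories behind; comparing the IoU variance alongside mIoU is one way to check the balance claim.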
This summary was generated by AI and reviewed by human editors.