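[Paper Digest] LaSSM: Efficient Semantic-Spatial Query Decoding via Local Aggregation and State Space Models for 3D Instance Segmentation
LaSSM proposes a hierarchical semantic-spatial query initializer and a coordinate-guided state space model decoder. Together they reduce FLOPs while delivering efficient, accurate 3D instance segmentation, and the method achieves state-of-the-art results on ScanNet++ V2.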
Query-based 3D scene instance segmentation from point clouds has attained notable performance. However, existing methods suffer from the query initialization dilemma due to the sparse nature of point clouds and rely on computationally intensive attention mechanisms in query decoders. We accordingly introduce LaSSM, prioritizing simplicity and efficiency while maintaining competitive performance. Specifically, we propose a hierarchical semantic-spatial query initializer to derive the query set from superpoints by considering both semantic cues and spatial distribution, achieving comprehensive scene coverage and accelerated convergence. We further present a coordinate-guided state space model (SSM) decoder that progressively refines queries. The novel decoder features a local aggregation scheme that restricts the model to focus on geometrically coherent regions and a spatial dual-path SSM block to capture underlying dependencies within the query set by integrating the associated coordinate information. Our design enables efficient instance prediction, avoiding the incorporation of noisy information and reducing redundant computation. LaSSM ranks first on the latest ScanNet++ V2 leaderboard, outperforming the previous best method by 2.5% mAP with only 1/3 of the FLOPs, demonstrating its superiority in challenging large-scale scene instance segmentation. LaSSM also achieves competitive performance on ScanNet, ScanNet200, S3DIS and ScanNet++ V1 benchmarks with less computational cost. Extensive ablation studies and qualitative results validate the effectiveness of our design. The code and weights are available at https://github.com/RayYoh/LaSSM.
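The hierarchical initializer described above (semantic selection first, spatial spreading second) can be sketched in NumPy as follows. This is a minimal sketch, not the LaSSM implementation: scoring each superpoint by its maximum class probability, the candidate budget `num_candidates`, and the helper names `farthest_point_sampling` and `init_queries` are all assumptions made for illustration.

```python
import numpy as np


def farthest_point_sampling(xyz: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: repeatedly pick the point farthest from everything chosen so far."""
    k = min(k, xyz.shape[0])
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = start
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(xyz - xyz[start], axis=1)
    for i in range(1, k):
        chosen[i] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(xyz - xyz[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return chosen


def init_queries(sp_xyz: np.ndarray, sp_sem_prob: np.ndarray,
                 num_candidates: int, num_queries: int) -> np.ndarray:
    """Hypothetical semantic-then-spatial query initializer.

    sp_xyz:      (N, 3) superpoint centers.
    sp_sem_prob: (N, C) per-superpoint semantic probabilities.
    Returns indices of the superpoints used as initial queries.
    """
    # Stage 1 (semantic): keep the most confident superpoints.
    # Assumed score: the maximum class probability of each superpoint.
    confidence = sp_sem_prob.max(axis=1)
    candidates = np.argsort(-confidence)[:num_candidates]
    # Stage 2 (spatial): spread the queries out with FPS, starting from the
    # most confident candidate (position 0 after the descending sort).
    picked = farthest_point_sampling(sp_xyz[candidates], num_queries, start=0)
    return candidates[picked]
```

The returned indices give initial query coordinates directly (`sp_xyz[idx]`, playing the role of the coordinate embedding); the content embedding would come from projecting the selected superpoint features, which this sketch omits.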
Research Motivation and Goals
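- Solve the query initialization problem in sparse 3D scenes with a semantic-spatial initializer that ensures comprehensive scene coverage and fast convergence.
- Develop a computationally efficient query decoder that refines queries with minimal redundant computation.
- Incorporate positional information into decoding to improve instance localization without expensive attention mechanisms.
- Demonstrate state-of-the-art performance on large-scale indoor datasets while reducing computational cost.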
Proposed Method
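- Introduces a hierarchical semantic-spatial query initializer: it selects the top-scoring superpoints according to semantic confidence and spatial distribution, samples q queries from them with farthest point sampling (FPS), and projects these into the embeddings Q and Qc.
- Implements a coordinate-guided state space model (SSM) decoder with a local aggregation module that restricts interactions to geometrically nearby superpoints, reducing complexity.
- Uses a spatial dual-path SSM block that serializes the queries along a Hilbert curve to preserve positional information, enabling SSM-based communication among queries without full self-attention.
- Adds a center regression head that refines query coordinates alongside query content, and applies a bipartite matching loss (Hungarian algorithm) across decoder layers for end-to-end set prediction.
- Trains with semantically supervised superpoints, a multi-objective loss combining mask, classification, and center terms, and a standard semantic cross-entropy loss.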
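The local aggregation idea can be sketched as kNN-restricted pooling: each query gathers features only from its `k` nearest superpoints. This is a hypothetical sketch; the softmax-over-dot-product weighting, the value of `k`, and the function names are assumptions, not the paper's design. The point it illustrates is that feature interaction costs O(q·k·d) instead of O(q·N·d).

```python
import numpy as np


def knn_indices(query_xyz: np.ndarray, sp_xyz: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest superpoints for each query (brute-force search)."""
    d2 = ((query_xyz[:, None, :] - sp_xyz[None, :, :]) ** 2).sum(-1)  # (q, N)
    return np.argsort(d2, axis=1)[:, :k]                              # (q, k)


def local_aggregate(q_feat, q_xyz, sp_feat, sp_xyz, k):
    """Update each query using only its k geometrically nearest superpoints.

    q_feat: (q, d) query features, q_xyz: (q, 3) query coordinates,
    sp_feat: (N, d) superpoint features, sp_xyz: (N, 3) superpoint centers.
    """
    idx = knn_indices(q_xyz, sp_xyz, k)
    neigh = sp_feat[idx]                                              # (q, k, d)
    # Softmax weights over the k neighbors only (assumed weighting scheme).
    logits = np.einsum("qd,qkd->qk", q_feat, neigh) / np.sqrt(q_feat.shape[1])
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Residual update with the weighted average of neighbor features.
    return q_feat + np.einsum("qk,qkd->qd", w, neigh), idx
```

The brute-force neighbor search here still computes q×N scalar distances; a practical pipeline would typically use a spatial index, which this sketch does not attempt.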
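Serializing queries along a Hilbert curve amounts to quantizing each query coordinate to an integer grid, mapping it to its Hilbert index, and sorting by that index, so that consecutive queries in the 1D order are spatially close. The sketch below computes the index with Skilling's transpose algorithm ("Programming the Hilbert curve", 2004); the grid resolution `bits` and the min-max quantization are assumptions, not details from the paper.

```python
import numpy as np


def hilbert_index_3d(x: int, y: int, z: int, bits: int) -> int:
    """Hilbert-curve index of an integer point in [0, 2**bits)^3 (Skilling's method)."""
    X = [x, y, z]
    n = 3
    M = 1 << (bits - 1)
    # Inverse undo: invert or exchange low bits level by level.
    Q = M
    while Q > 1:
        P = Q - 1
        for i in range(n):
            if X[i] & Q:
                X[0] ^= P                  # invert low bits of X[0]
            else:
                t = (X[0] ^ X[i]) & P      # exchange low bits of X[0] and X[i]
                X[0] ^= t
                X[i] ^= t
        Q >>= 1
    # Gray encode.
    for i in range(1, n):
        X[i] ^= X[i - 1]
    t = 0
    Q = M
    while Q > 1:
        if X[n - 1] & Q:
            t ^= Q - 1
        Q >>= 1
    for i in range(n):
        X[i] ^= t
    # Interleave the transposed bits, most significant level first.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((X[i] >> b) & 1)
    return h


def hilbert_order(xyz: np.ndarray, bits: int = 10) -> np.ndarray:
    """Permutation that sorts float 3D points along a Hilbert curve."""
    lo = xyz.min(axis=0)
    span = np.maximum(xyz.max(axis=0) - lo, 1e-9)
    grid = ((xyz - lo) / span * ((1 << bits) - 1)).round().astype(np.int64)
    keys = [hilbert_index_3d(int(a), int(b), int(c), bits) for a, b, c in grid]
    return np.argsort(np.array(keys), kind="stable")
```

A Z-order (Morton) key would be cheaper to compute but has weaker locality, since consecutive Morton cells can be far apart in space.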
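Once the queries are in a serialized order, an SSM can pass information along the sequence in linear time. The sketch below uses a plain diagonal linear recurrence with fixed parameters (h_t = a·h_{t-1} + b·x_t, y_t = c·h_t) and interprets "dual-path" as one forward and one backward scan whose outputs are summed. Both choices are assumptions for illustration: selective SSMs such as Mamba make a, b, c input-dependent, and LaSSM's block may combine its paths differently.

```python
import numpy as np


def ssm_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Diagonal linear SSM over a sequence x of shape (L, d).

    h_t = a * h_{t-1} + b * x_t and y_t = c * h_t, elementwise per channel.
    """
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def dual_path_ssm(q_feat, order, a, b, c):
    """Scan queries forward and backward along `order`, then undo the permutation."""
    seq = q_feat[order]                       # queries in serialized (e.g. Hilbert) order
    fwd = ssm_scan(seq, a, b, c)              # path 1: first-to-last
    bwd = ssm_scan(seq[::-1], a, b, c)[::-1]  # path 2: last-to-first, re-reversed
    out = np.empty_like(seq)
    out[order] = fwd + bwd                    # scatter back to original query slots
    return q_feat + out                       # residual connection
```

Here `order` would typically be the Hilbert permutation of the query coordinates; the backward path lets each query also receive information from queries that come later in that order.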
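Bipartite matching between predicted queries and ground-truth instances can be sketched with SciPy's `linear_sum_assignment`, which solves the (possibly rectangular) assignment problem. The cost below combines a classification term, a Dice mask term, and an L1 center-distance term to mirror the mask, classification, and center components listed above; the weights `w_cls`, `w_mask`, `w_center` and the exact form of each term are hypothetical, and in a multi-layer decoder the matching would be repeated for each layer's predictions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dice_cost(pred_masks: np.ndarray, gt_masks: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """Pairwise 1 - Dice between soft predicted masks (P, N) and binary GT masks (G, N)."""
    inter = pred_masks @ gt_masks.T                                    # (P, G)
    denom = pred_masks.sum(1)[:, None] + gt_masks.sum(1)[None, :]
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def hungarian_match(pred_logits, pred_masks, pred_centers,
                    gt_labels, gt_masks, gt_centers,
                    w_cls=0.5, w_mask=1.0, w_center=1.0):
    """Return (pred_idx, gt_idx) that minimize the total matching cost."""
    prob = np.exp(pred_logits - pred_logits.max(1, keepdims=True))
    prob /= prob.sum(1, keepdims=True)                                  # softmax over classes
    cost_cls = -prob[:, gt_labels]                                      # confident = cheap
    cost_mask = dice_cost(pred_masks, gt_masks)
    # L1 distance between predicted and GT instance centers, shape (P, G).
    cost_center = np.abs(pred_centers[:, None, :] - gt_centers[None, :, :]).sum(-1)
    cost = w_cls * cost_cls + w_mask * cost_mask + w_center * cost_center
    return linear_sum_assignment(cost)
```

Queries left unmatched are usually supervised toward a "no object" class, which is what the loss sketch for this training setup assumes.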
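After matching, the multi-objective instance loss can be assembled as below. This is a hedged sketch: binary cross-entropy plus Dice for masks, an L1 center loss, a dedicated "no object" class index `no_obj` for unmatched queries, and all weights are assumptions modeled on common query-based segmentation setups, not values taken from LaSSM.

```python
import numpy as np


def softmax_ce(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy for integer labels."""
    z = logits - logits.max(1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))
    return float(-logp[np.arange(len(labels)), labels].mean())


def bce(p: np.ndarray, t: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between mask probabilities p and targets t."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(t * np.log(p) + (1 - t) * np.log(1 - p)).mean())


def dice_loss(p: np.ndarray, t: np.ndarray, eps: float = 1.0) -> float:
    """Mean (1 - Dice) over matched mask pairs."""
    inter = (p * t).sum(1)
    return float((1 - (2 * inter + eps) / (p.sum(1) + t.sum(1) + eps)).mean())


def instance_loss(pred_logits, pred_masks, pred_centers,
                  gt_labels, gt_masks, gt_centers, rows, cols, no_obj,
                  w_cls=0.5, w_mask=1.0, w_dice=1.0, w_center=1.0):
    """Weighted sum of class, mask (BCE + Dice) and center terms for one layer."""
    # Matched queries learn their GT class; all other queries learn "no object".
    targets = np.full(len(pred_logits), no_obj)
    targets[rows] = gt_labels[cols]
    l_cls = softmax_ce(pred_logits, targets)
    pm, tm = pred_masks[rows], gt_masks[cols]
    l_center = float(np.abs(pred_centers[rows] - gt_centers[cols]).sum(1).mean())
    return (w_cls * l_cls + w_mask * bce(pm, tm)
            + w_dice * dice_loss(pm, tm) + w_center * l_center)
```

The superpoint-level semantic term would be added separately, for example `softmax_ce(sp_sem_logits, sp_sem_labels)` with hypothetical per-superpoint logits and labels.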
![Figure 1: Query distribution and performance comparison. (a) We compare query distributions of farthest point sampling (FPS) [schult2023mask3d], semantic confidence-based selection (Semantic) [he2023fastinst], and our method on different scenes. (b) Compared to SPFormer [sun2023spformer], O](https://ar5iv.labs.arxiv.org/html/2602.11007/assets/x1.png)
Experimental Results
Research Questions
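- RQ1: How can query initialization in sparse 3D point clouds be made adaptive and efficient without sacrificing scene coverage?
- RQ2: Can a low-complexity, coordinate-guided decoder refine instance queries effectively while preserving positional information?
- RQ3: How do local aggregation and Hilbert-curve ordering affect the accuracy and efficiency of 3D instance segmentation?
- RQ4: Does integrating state space models into 3D query decoding offer advantages in accuracy and FLOPs over Transformer-based decoders?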
Key Findings
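- LaSSM achieves state-of-the-art results on ScanNet++ V2, ranking first on the leaderboard and outperforming the previous best method by 2.5% mAP and 2.3% AP50 while using only 1/3 of its FLOPs.
- On ScanNet V2, ScanNet200, S3DIS, and ScanNet++ V1, LaSSM delivers competitive results at a substantially lower computational cost.
- The hierarchical initializer speeds up convergence and ensures comprehensive scene coverage by prioritizing superpoints that are semantically confident and spatially well distributed.
- The coordinate-guided decoder, with its local aggregation and dual-path SSM blocks, refines queries more efficiently and avoids large-scale cross-attention between every query and all superpoints.
- A hybrid variant that uses masked cross-attention in the first few layers and local aggregation in later layers strikes a balance between accuracy and efficiency.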
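The efficiency argument behind the hybrid variant can be made concrete by counting query-superpoint pairs per decoder layer. The layer count, query count, superpoint count, and `k` below are hypothetical, and pair counts are only a rough proxy for FLOPs: they ignore feature dimensions, the SSM blocks, and the prediction heads.

```python
def decoder_pair_counts(num_layers, num_masked_layers, num_queries, num_superpoints, k):
    """Query-superpoint pairs evaluated in each layer of a hypothetical hybrid decoder.

    Layers [0, num_masked_layers) use masked cross-attention, which still scores every
    query against every superpoint before masking; later layers use local aggregation
    over only the k nearest superpoints.
    """
    return [
        num_queries * (num_superpoints if layer < num_masked_layers else k)
        for layer in range(num_layers)
    ]


if __name__ == "__main__":
    # Hypothetical scene: 400 queries, 5000 superpoints, 16 neighbors, 6 decoder layers.
    full = decoder_pair_counts(6, 6, 400, 5000, 16)    # cross-attention in every layer
    hybrid = decoder_pair_counts(6, 1, 400, 5000, 16)  # one masked layer, then local
    local = decoder_pair_counts(6, 0, 400, 5000, 16)   # local aggregation in every layer
    print(sum(full), sum(hybrid), sum(local))  # 12000000 2032000 38400
```

Under these made-up sizes, a single full-attention layer dominates the hybrid's pair count, which is why the number of masked cross-attention layers is the main knob in the accuracy/efficiency trade-off.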

This digest was generated by AI and reviewed by human editors.