QUICK REVIEW

[Paper Review] Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

Guowen Zhang, Lue Fan | arXiv (Cornell University) | Jun 15, 2024

3D Surveying and Cultural Heritage | Cited by 7

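One-Sentence Summary

Voxel Mamba introduces a group-free, voxel-based backbone that serializes all voxels into a single sequence and models it with State Space Models, using Dual-scale SSM Blocks and Implicit Window Partition to preserve spatial proximity and improve the efficiency of 3D object detection.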

ABSTRACT

Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before inputting to Transformers, have demonstrated their effectiveness in 3D object detection. However, serializing 3D voxels into 1D sequences will inevitably sacrifice the voxel spatial proximity. Such an issue is hard to be addressed by enlarging the group size with existing serialization-based methods due to the quadratic complexity of Transformers with feature sizes. Inspired by the recent advances of state space models (SSMs), we present a Voxel SSM, termed as Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence. The linear complexity of SSMs encourages our group-free design, alleviating the loss of spatial proximity of voxels. To further enhance the spatial proximity, we propose a Dual-scale SSM Block to establish a hierarchical structure, enabling a larger receptive field in the 1D serialization curve, as well as more complete local regions in 3D space. Moreover, we implicitly apply window partition under the group-free framework by positional encoding, which further enhances spatial proximity by encoding voxel positional information. Our experiments on Waymo Open Dataset and nuScenes dataset show that Voxel Mamba not only achieves higher accuracy than state-of-the-art methods, but also demonstrates significant advantages in computational efficiency.
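
The complexity argument in the abstract can be made concrete with a rough operation count. The sketch below is only an illustration, not a benchmark: it counts token-pair interactions for attention computed inside fixed-size groups and compares that with the number of steps in one recurrent SSM scan. The function names and the voxel count are hypothetical, and constant factors such as feature and state dimensions are ignored.

```python
def grouped_attention_pairs(num_voxels: int, group_size: int) -> int:
    """Token-pair interactions when attention runs inside groups of group_size."""
    full_groups, remainder = divmod(num_voxels, group_size)
    # Each full group costs group_size**2 pairs; a trailing partial group costs remainder**2.
    return full_groups * group_size ** 2 + remainder ** 2


def ssm_scan_steps(num_voxels: int) -> int:
    """A recurrent SSM scan visits each token once per scan direction."""
    return num_voxels


N = 100_000  # hypothetical number of non-empty voxels in a scene
for g in (256, 1024, N):  # the last entry is the group-free case (one group)
    print(f"group={g:>6}  attention pairs={grouped_attention_pairs(N, g):>14,}  "
          f"ssm steps={ssm_scan_steps(N):,}")
```

Under these assumptions the attention cost grows roughly as N times the group size, so removing groups entirely makes it quadratic in N, while the scan stays linear in N.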

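Motivation and Objectives

Reduce the loss of spatial proximity in serialization-based 3D detection by avoiding voxel grouping.
Propose a group-free Voxel SSM backbone that processes all voxels as a single sequence.
Enhance spatial proximity and the receptive field through the Dual-scale SSM Block and implicit positional encoding.
Demonstrate state-of-the-art accuracy and efficiency on the Waymo Open Dataset and nuScenes.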

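Proposed Method

Serialize all voxels into a single sequence with a Hilbert Input Layer to preserve spatial locality.
Model voxel interactions with the Dual-scale SSM Block, which uses a forward (high-resolution) branch and a backward (downsampled) branch to enlarge the effective receptive field.
Introduce implicit window partition through the Implicit Window Embedding, encoding 3D positional information without explicit windows.
Adopt a group-free backbone compatible with existing voxel-based detectors and BEV backbones.
Train and evaluate on the Waymo Open Dataset and nuScenes for comparison with state-of-the-art methods.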
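
The Hilbert serialization step can be illustrated with a minimal sketch. It is a simplification, not the paper's implementation: it uses the standard 2D Hilbert index on a bird's-eye-view grid rather than a 3D curve, and the names `hilbert_index_2d` and `serialize_voxels`, the grid size, and the sample coordinates are all hypothetical.

```python
def hilbert_index_2d(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def serialize_voxels(coords, order):
    """Return voxel indices sorted by Hilbert key: one sequence, no groups."""
    keys = [hilbert_index_2d(order, x, y) for x, y in coords]
    return sorted(range(len(coords)), key=keys.__getitem__)


# Hypothetical sparse set of occupied cells on an 8 x 8 grid.
coords = [(5, 1), (0, 0), (7, 7), (1, 0), (2, 3), (0, 1)]
sequence = [coords[i] for i in serialize_voxels(coords, order=3)]
print(sequence)
```

Consecutive positions on the curve are always neighbouring grid cells, which is why sorting by this key tends to keep spatially close voxels close in the 1D sequence. The reverse is not guaranteed: some neighbouring cells still end up far apart along the curve.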

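Experimental Results

Research Questions

RQ1: Can a group-free state space backbone outperform group-based serialization methods in voxel-based 3D detection?
RQ2: Do the Dual-scale SSM Block and the Implicit Window Embedding improve 3D spatial proximity and the receptive field in serialized voxel sequences?
RQ3: How much does Voxel Mamba improve accuracy and efficiency on Waymo and nuScenes compared with previous backbones?
RQ4: How does Hilbert-based voxel ordering affect model performance and memory usage?

Key Findings

Voxel Mamba reaches 79.6/73.4 L1/L2 mAPH on the Waymo validation set, surpassing the DSVT-Voxel baseline.
On the Waymo test set, Voxel Mamba reaches 79.6/74.3 L1/L2 mAPH, surpassing several window-based and curve-based grouping methods.
On the nuScenes validation set, Voxel Mamba obtains 71.9 NDS and 67.5 mAP, exceeding the previous best by 0.5 NDS and 0.8 mAP.
On the nuScenes test set, Voxel Mamba achieves 73.0 NDS and 69.0 mAP, leading contemporary detectors on several metrics.
Voxel Mamba consumes less memory than group-based Transformers while delivering higher accuracy and faster inference than some baseline methods.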


This review was generated by AI and reviewed by a human editor.