QUICK REVIEW

[Paper Review] Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

Guowen Zhang, Lue Fan | arXiv (Cornell University) | 2024-06-15

3D Surveying and Cultural Heritage | Citations: 7

One-Line Summary

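Voxel Mamba is a group-free voxel backbone that serializes all voxels into a single sequence and models it with State Space Models. Its Dual-scale SSM Blocks and Implicit Window Partition preserve spatial proximity and improve the efficiency of 3D object detection.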

ABSTRACT

Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before inputting to Transformers, have demonstrated their effectiveness in 3D object detection. However, serializing 3D voxels into 1D sequences will inevitably sacrifice the voxel spatial proximity. Such an issue is hard to address by enlarging the group size with existing serialization-based methods due to the quadratic complexity of Transformers with feature sizes. Inspired by the recent advances of state space models (SSMs), we present a Voxel SSM, termed Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence. The linear complexity of SSMs encourages our group-free design, alleviating the loss of spatial proximity of voxels. To further enhance the spatial proximity, we propose a Dual-scale SSM Block to establish a hierarchical structure, enabling a larger receptive field in the 1D serialization curve, as well as more complete local regions in 3D space. Moreover, we implicitly apply window partition under the group-free framework by positional encoding, which further enhances spatial proximity by encoding voxel positional information. Our experiments on Waymo Open Dataset and nuScenes dataset show that Voxel Mamba not only achieves higher accuracy than state-of-the-art methods, but also demonstrates significant advantages in computational efficiency.
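The abstract rests on two mechanisms. The first is an SSM scan whose cost grows linearly with sequence length, which is what makes one long group-free sequence affordable. The second is a Dual-scale SSM Block that pairs a forward scan at full resolution with a backward scan over a downsampled sequence. The toy sketch below illustrates both under heavy simplifying assumptions, so it is not the authors' code:

- Features are scalars.
- The SSM parameters are fixed scalars, whereas Mamba makes them input-dependent.
- Average pooling along the 1D sequence stands in for whatever downsampling the paper uses.
- The branches are fused with the input by plain addition.

All function names (`linear_scan`, `downsample`, `upsample`, `dual_scale_block`) are hypothetical.

```python
from typing import List


def linear_scan(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    A single left-to-right pass, so the cost grows linearly with len(xs).
    Fixed scalar a, b, c are a simplification; Mamba makes them input-dependent.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def downsample(xs: List[float], stride: int) -> List[float]:
    """Average-pool consecutive chunks of the 1D sequence (assumed pooling scheme)."""
    chunks = [xs[i:i + stride] for i in range(0, len(xs), stride)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def upsample(xs: List[float], stride: int, length: int) -> List[float]:
    """Repeat each coarse value `stride` times, then trim to the original length."""
    return [v for v in xs for _ in range(stride)][:length]


def dual_scale_block(xs: List[float], stride: int = 2) -> List[float]:
    """Hypothetical dual-scale block.

    Forward branch: scan the full-resolution sequence left to right.
    Backward branch: downsample, scan right to left, upsample back.
    Both branches are added to the input (an assumed fusion rule).
    """
    forward = linear_scan(xs)
    coarse = downsample(xs, stride)
    # Reverse, scan, reverse back: each coarse token now summarizes itself and what follows it.
    backward = linear_scan(coarse[::-1])[::-1]
    backward_full = upsample(backward, stride, len(xs))
    return [x + f + g for x, f, g in zip(xs, forward, backward_full)]


if __name__ == "__main__":
    # An impulse at position 1 reaches position 0 only through the backward branch.
    print(dual_scale_block([0.0, 1.0, 0.0, 0.0]))
```

In this toy, a forward-only scan could never move information from later tokens to earlier ones. The coarse backward branch does, and it spans several fine positions per step. That is the receptive-field widening the block is meant to provide.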

Motivation and Goals

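Aims to reduce the loss of spatial proximity in serialization-based 3D detectors by removing voxel grouping.
Proposes a group-free Voxel SSM backbone that processes all voxels as a single sequence.
Improves spatial proximity and the receptive field through Dual-scale SSM Blocks and implicit positional encoding.
Demonstrates state-of-the-art accuracy and efficiency on the Waymo Open and nuScenes datasets.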

Proposed Method

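Serializes all voxels into a single sequence with a Hilbert Input Layer, preserving spatial locality.
Models voxel interactions and enlarges the effective receptive field with a Dual-scale SSM Block that combines a forward (high-resolution) branch and a backward (downsampled) branch.
Introduces Implicit Window Partition, which encodes 3D positional information through Implicit Window Embedding without explicit windowing.
Designs the group-free backbone to be compatible with existing voxel-based detectors and BEV backbones.
Trains and evaluates on the Waymo Open Dataset and nuScenes, comparing against state-of-the-art methods.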
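The Hilbert Input Layer orders voxels along a space-filling curve, so voxels that are adjacent in the sequence tend to be close in space. The sketch below is a simplified illustration, not the paper's implementation. It applies the classic 2D Hilbert `xy2d` algorithm to the (x, y) grid and breaks ties by z, whereas the method serializes 3D voxels.

It also includes a window-relative coordinate feature as one plausible reading of Implicit Window Embedding. That feature and the window sizes are assumptions. The function names (`hilbert_index_2d`, `serialize_voxels`, `in_window_position`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def hilbert_index_2d(x: int, y: int, n: int) -> int:
    """Position of cell (x, y) along a 2D Hilbert curve on an n x n grid.

    Classic xy2d algorithm; n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next, finer level is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def serialize_voxels(coords: Sequence[Tuple[int, int, int]], grid_size: int) -> List[int]:
    """Order of voxel indices in the single group-free sequence.

    Simplification: Hilbert index on (x, y), ties broken by z.
    """
    return sorted(
        range(len(coords)),
        key=lambda i: (hilbert_index_2d(coords[i][0], coords[i][1], grid_size), coords[i][2]),
    )


def in_window_position(coord: Tuple[int, int, int], window: Tuple[int, int, int]) -> Tuple[float, ...]:
    """Hypothetical implicit-window feature: offset of a voxel inside the
    window it would occupy, normalized to [0, 1). No voxels are regrouped."""
    return tuple((c % w) / w for c, w in zip(coord, window))


if __name__ == "__main__":
    voxels = [(1, 1, 0), (0, 0, 5), (0, 0, 1)]
    print(serialize_voxels(voxels, grid_size=4))  # [2, 1, 0]
    print(in_window_position((5, 8, 3), (4, 4, 4)))  # (0.25, 0.0, 0.75)
```

On an 8 x 8 grid, consecutive indices from `hilbert_index_2d` always land on neighboring cells. This is the locality property that a single serialized sequence relies on.

The window feature only tags each voxel with its position inside a notional window. The sequence itself is never split into groups.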

Experimental Results

Research Questions

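RQ1: Can a group-free state space backbone outperform grouping-based serialization methods in voxel-based 3D detection?
RQ2: Do Dual-scale SSM Blocks and Implicit Window Embedding improve 3D spatial proximity and the receptive field in serialized voxel sequences?
RQ3: How much does Voxel Mamba improve accuracy and efficiency over existing backbones on Waymo and nuScenes?
RQ4: How does Hilbert-based voxel ordering affect model performance and memory usage?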

Key Results

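On the Waymo validation set, Voxel Mamba achieves 79.6/73.4 L1/L2 mAPH, outperforming the DSVT-Voxel baseline.
On the Waymo test set, Voxel Mamba reaches 79.6/74.3 L1/L2 mAPH, outperforming several window-based and curve-based grouping methods.
On nuScenes validation, Voxel Mamba achieves 71.9 NDS and 67.5 mAP, exceeding the previous best by 0.5 NDS and 0.8 mAP.
On nuScenes test, Voxel Mamba achieves 73.0 NDS and 69.0 mAP, leading contemporary detectors on several metrics.
Voxel Mamba uses less memory than group-based Transformers while achieving higher accuracy. Its inference is also faster than some of the baselines.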
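The nuScenes validation margins imply the previous best scores. Below is a quick arithmetic check. The inputs are the review's rounded numbers, so the implied values are approximate.

```python
# nuScenes validation numbers as reported in this review.
voxel_mamba = {"NDS": 71.9, "mAP": 67.5}
margin_over_previous_best = {"NDS": 0.5, "mAP": 0.8}

# Implied previous best = Voxel Mamba score minus the reported margin,
# rounded to one decimal to match the reported precision.
previous_best = {
    k: round(voxel_mamba[k] - margin_over_previous_best[k], 1) for k in voxel_mamba
}
print(previous_best)  # {'NDS': 71.4, 'mAP': 66.7}
```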

This review was generated by AI and reviewed by a human editor.