QUICK REVIEW

[Paper Review] SF-Mamba: Rethinking State Space Model for Vision

Masakazu Yoshimura, Takashi Hayashi|arXiv (Cornell University)|Mar 17, 2026

Advanced Neural Network Applications | Cited by 0

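One-Sentence Summary

SF-Mamba introduces auxiliary token swapping and batch folding with periodic state reset to enable efficient unidirectional scanning in vision Mamba, achieving a better accuracy–throughput trade-off on classification, detection, and segmentation.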

ABSTRACT

The realm of Mamba for vision has advanced in recent years as an alternative to Vision Transformers (ViTs), which suffer from quadratic complexity. While the recurrent scanning mechanism of Mamba offers computational efficiency, it inherently limits non-causal interactions between image patches. Prior works have attempted to address this limitation through various multi-scan strategies; however, these approaches suffer from inefficiencies due to suboptimal scan designs and frequent data rearrangement. Moreover, Mamba is relatively slow at the short token lengths commonly used in visual tasks. In pursuit of a truly efficient vision encoder, we rethink the scan operation for vision and the computational efficiency of Mamba. To this end, we propose SF-Mamba, a novel visual Mamba with two key proposals: auxiliary patch swapping, which encodes bidirectional information flow under a unidirectional scan, and batch folding with periodic state reset for better GPU parallelism. Extensive experiments on image classification, object detection, and instance and semantic segmentation consistently demonstrate that our proposed SF-Mamba significantly outperforms state-of-the-art baselines while improving throughput across different model sizes. We will release the source code after publication.

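Motivation and Goals

Build an efficient vision encoder by addressing the causality and speed limitations of existing vision Mamba models.
Develop a unidirectional scan with minimal overhead that still supports future-to-past information flow.
Improve GPU parallelism on short-sequence vision tasks through batch folding with periodic state reset.
Demonstrate the effectiveness of SF-Mamba on image classification, object detection, and semantic/instance segmentation.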

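Proposed Method

Propose auxiliary patch swapping, which achieves bidirectional information flow within a unidirectional scan using two auxiliary tokens and a lightweight, parameter-free swap operation.
Introduce batch folding with periodic state reset, which merges the batch and sequence dimensions to maximize GPU utilization, while controlled state resets every T steps keep the samples independent.
Build on the MambaVision hybrid architecture with a unidirectional scan and selective SSM blocks, augmented with auxiliary tokens for future-to-past routing.
Provide a depthwise 1D convolution implementation that supports batch-folded data and handles sample boundaries to preserve correctness.
Precompute an adaptive B1/B ratio in a lookup table (LUT) to optimize batch folding across batch sizes and sequence lengths.
Evaluate on ImageNet-1K classification, ADE20K segmentation with UperNet, and an object detection pipeline (described in the appendix).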
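A minimal pure-Python sketch of batch folding with periodic state reset, assuming a scalar linear recurrence h_t = a_t * h_{t-1} + b_t as a stand-in for the selective SSM (in the real model the state is a vector and a_t, b_t come from input-dependent discretization). The function names `scan` and `folded_scan` are hypothetical. B sequences of length T are concatenated into one sequence of length B * T, and the decay is zeroed at every T-th position so that no state crosses a sample boundary; the folded result then matches scanning each sample separately.

```python
from typing import List

def scan(a: List[float], b: List[float]) -> List[float]:
    """Sequential linear recurrence h_t = a_t * h_{t-1} + b_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

def folded_scan(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Scan B sequences of length T as one folded sequence of length B * T."""
    num_seqs, seg_len = len(a), len(a[0])
    # Fold (B, T) -> (B * T,): concatenate samples along the sequence axis.
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    # Periodic state reset: a zero decay at every T-th step discards the state
    # carried over from the previous sample, keeping samples independent.
    flat_a = [0.0 if i % seg_len == 0 else v for i, v in enumerate(flat_a)]
    flat_out = scan(flat_a, flat_b)
    # Unfold (B * T,) -> (B, T).
    return [flat_out[k * seg_len:(k + 1) * seg_len] for k in range(num_seqs)]

if __name__ == "__main__":
    a = [[0.5, 0.5, 0.5], [0.9, 0.9, 0.9]]
    b = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
    # Folding with resets reproduces the per-sample scans exactly: prints True.
    print(folded_scan(a, b) == [scan(a[0], b[0]), scan(a[1], b[1])])
```

Encoding the reset as a zero decay is one plausible way to keep the folded input a single uninterrupted recurrence that a kernel can process as one long sequence; the throughput gain itself depends on the GPU kernel and is not modeled by this sketch.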
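Mamba-style blocks typically apply a short depthwise 1D convolution before the SSM, and on batch-folded data such a convolution would otherwise read the tail of the previous sample. The sketch below masks those taps; the causal (left) padding, the channel-major list layout, and the function names are assumptions for illustration, not SF-Mamba's actual kernel. A symmetric-padding variant would need the same masking at both ends of each sample.

```python
from typing import List

Signal = List[List[float]]  # channel-major layout: [channels][length]

def causal_depthwise_conv1d(x: Signal, w: Signal) -> Signal:
    """Reference: per-channel causal conv with K - 1 zeros of left padding."""
    out = []
    for xc, wc in zip(x, w):
        k = len(wc)
        padded = [0.0] * (k - 1) + xc
        out.append([sum(wc[j] * padded[t + j] for j in range(k)) for t in range(len(xc))])
    return out

def folded_causal_depthwise_conv1d(x: Signal, w: Signal, seg_len: int) -> Signal:
    """Same conv on a batch-folded sequence whose samples are each seg_len long.
    Taps that would reach into the previous sample are skipped (treated as zero)."""
    out = []
    for xc, wc in zip(x, w):
        k = len(wc)
        yc = []
        for t in range(len(xc)):
            start = (t // seg_len) * seg_len  # first position of t's sample
            acc = 0.0
            for j in range(k):
                src = t - (k - 1) + j
                if src >= start:  # boundary handling: stay inside the sample
                    acc += wc[j] * xc[src]
            yc.append(acc)
        out.append(yc)
    return out

if __name__ == "__main__":
    w = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
    s0 = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
    s1 = [[5.0, 6.0, 7.0, 8.0], [-1.0, 0.0, 1.0, 2.0]]
    folded = [s0[c] + s1[c] for c in range(2)]  # fold two samples per channel
    got = folded_causal_depthwise_conv1d(folded, w, seg_len=4)
    ref = [r0 + r1 for r0, r1 in zip(causal_depthwise_conv1d(s0, w),
                                     causal_depthwise_conv1d(s1, w))]
    print(got == ref)  # True: no leakage across the sample boundary
```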

Figure 1 : Top-1 accuracy and throughput on ImageNet-1K classification. SF-Mamba offers superior accuracy–throughput trade-offs compared to state-of-the-art architectures.

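Experimental Results

Research Questions

RQ1: Can a unidirectional Mamba match the expressiveness of bidirectional scanning through auxiliary token swapping?
RQ2: Does batch folding with periodic state reset substantially speed up SF-Mamba on short sequences without sacrificing accuracy?
RQ3: How do auxiliary tokens affect future-to-past information flow and overall representation quality on vision tasks?
RQ4: How does SF-Mamba's throughput–accuracy trade-off compare with state-of-the-art CNN, Transformer, and hybrid backbones?
RQ5: How does SF-Mamba perform on segmentation and detection relative to classification?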

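Key Findings

SF-Mamba achieves a better accuracy–throughput trade-off than state-of-the-art baselines across model sizes (T/S/B).
Batch folding speeds up the SSM kernel by roughly 110% to 180%, with the clearest gains on short sequences.
Ablations show that auxiliary token swapping contributes to the performance gains on ImageNet-1K (IN1K) and ADE20K with limited impact on speed.
Compared with a unidirectional-scan baseline, auxiliary token swapping provides bidirectional information flow and improves accuracy.
Compared with various bidirectional scan designs, SF-Mamba's unidirectional scan with swapping achieves competitive accuracy at lower overhead.
The SF-Mamba-S and SF-Mamba-T variants perform strongly within the Pareto-efficient region on both classification and ADE20K segmentation.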

Figure 2 : Future-to-Past Information Routing via Auxiliary Token Swapping. The left figure illustrates why the commonly used multi-directional scan in visual Mamba fails to achieve high speed, while the right figure presents our proposed solution. We prepend/append learnable auxiliary tokens to the
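To illustrate how a parameter-free swap can route information from the end of a sequence back to its start, the sketch below uses our reading of the description: one auxiliary token is prepended and one appended, each layer is a causal (unidirectional) mixer, and after each layer the two auxiliary tokens trade places. The running-mean mixer stands in for the selective SSM, and the names `causal_mix`, `swap_aux`, and `encode` are hypothetical; SF-Mamba's actual token placement and swap schedule may differ.

```python
from typing import List, Sequence

def causal_mix(seq: List[float]) -> List[float]:
    """Stand-in for a unidirectional scan: position t sees only positions <= t
    (a running mean here; the real block uses a selective SSM)."""
    out, total = [], 0.0
    for t, x in enumerate(seq):
        total += x
        out.append(total / (t + 1))
    return out

def swap_aux(seq: List[float]) -> List[float]:
    """Parameter-free swap: exchange the head and tail auxiliary tokens."""
    return [seq[-1]] + seq[1:-1] + [seq[0]]

def encode(patches: Sequence[float], layers: int = 2, swap: bool = True) -> List[float]:
    seq = [0.0] + list(patches) + [0.0]  # prepend/append auxiliary tokens
    for _ in range(layers):
        seq = causal_mix(seq)
        if swap:
            # The tail token has seen every patch; moving it to the head lets
            # the next causal layer deliver that summary to the earliest patches.
            seq = swap_aux(seq)
    return seq[1:-1]  # drop auxiliary tokens, keep patch outputs

if __name__ == "__main__":
    # With swapping, the first patch's output depends on the last patch (True);
    # without swapping, a purely causal stack cannot see the future (True).
    print(encode([1.0, 2.0, 3.0])[0] != encode([1.0, 2.0, 30.0])[0])
    print(encode([1.0, 2.0, 3.0], swap=False)[0] == encode([1.0, 2.0, 30.0], swap=False)[0])
```

The swap is only a permutation of two positions, so it adds no parameters and no full-sequence rearrangement, which is consistent with the low overhead the summary attributes to this design.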


This review was generated by AI and reviewed by human editors.