QUICK REVIEW

[Paper Review] SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series

Badri N. Patro, Vijay Srinivas Agneeswaran|arXiv (Cornell University)|Mar 22, 2024

Cited by 27

One-Sentence Summary

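SiMBA combines Mamba-based sequence modeling with EinFFT, a novel channel-mixing method that operates in the Fourier domain. The result is a stable, scalable state-space architecture that narrows the performance gap with Transformers on vision and multivariate time-series tasks and achieves strong results on ImageNet and time-series benchmarks.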

ABSTRACT

Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, playing a pivotal role in achieving breakthroughs across domains. However, recent literature highlights issues with attention networks, including low inductive bias and quadratic complexity in the input sequence length. State Space Models (SSMs) such as S4 and others (HiPPO, Global Convolutions, Liquid S4, LRU, Mega, and Mamba) have emerged to address these issues and to handle longer sequences. Mamba, while being the state-of-the-art SSM, has a stability issue when scaled to large networks for computer vision datasets. We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling. Extensive performance studies across image and time-series benchmarks demonstrate that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers. Notably, SiMBA establishes itself as the new state-of-the-art SSM on ImageNet, on transfer learning benchmarks such as Stanford Car and Flower, on task learning benchmarks, and on seven time series benchmark datasets. The project page is available at https://github.com/badripatro/Simba.

Research Motivation and Objectives

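Motivation: long-range dependencies in vision and time series call for sequence models that are both stable and scalable.
Introduce a stable channel-mixing mechanism (EinFFT) to address Mamba's instability.
Propose SiMBA, a simplified architecture that combines Mamba sequence modeling with EinFFT channel mixing.
Demonstrate SiMBA's performance gains over existing SSMs and its competitiveness with state-of-the-art Transformers across multiple datasets.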

Proposed Method

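Introduces EinFFT, a frequency-domain channel-mixing technique that applies complex-valued Einstein matrix multiplication to the Fourier-transformed features.
Embeds EinFFT into a Mamba-based sequence model to form SiMBA, enabling stable training and efficient long-sequence processing.
Uses residual connections with dropout, together with normalization, to improve training stability.
Offers a competitive pyramid-style (hierarchical) architecture that pairs sequence modeling (Mamba) with channel mixing (EinFFT).
Validates the design through extensive experiments on ImageNet-1K, transfer-learning benchmarks (CIFAR, Stanford Car, Flowers), and time-series datasets.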
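The method steps above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions:

- The FFT is taken along the token axis.
- Channels are split into equal blocks, with one complex weight matrix per block.
- The two Einstein matrix multiplications (EMM) are separated by a ReLU applied separately to the real and imaginary parts.
- Biases and any sparsity step are omitted.
- The block uses a pre-norm residual layout with dropout.

All function names are hypothetical. `causal_mean` is a toy placeholder for the Mamba sequence mixer, not Mamba itself.

```python
import numpy as np


def emm(x, w):
    # Einstein matrix multiplication: one (block_size x block_size) matrix per channel block.
    # x: (..., num_blocks, block_size), w: (num_blocks, block_size, block_size)
    return np.einsum("...bd,bdk->...bk", x, w)


def split_relu(z):
    # Assumed activation: ReLU applied separately to the real and imaginary parts.
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)


def einfft_channel_mix(x, w1, w2):
    """Hypothetical EinFFT-style channel mixing. x: real array of shape (tokens, channels)."""
    n, c = x.shape
    num_blocks, block_size, _ = w1.shape
    assert num_blocks * block_size == c, "channels must split evenly into blocks"
    xf = np.fft.fft(x, axis=0, norm="ortho")          # FFT along the token axis (assumption)
    xf = xf.reshape(n, num_blocks, block_size)         # group channels into blocks
    h = split_relu(emm(xf, w1))                        # first complex EMM + nonlinearity
    h = emm(h, w2).reshape(n, c)                       # second complex EMM, merge blocks
    return np.fft.ifft(h, axis=0, norm="ortho").real   # back to token domain, keep real part


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channels (no learned scale/shift, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def dropout(x, p, rng, training):
    # Inverted dropout; a no-op at inference time.
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def simba_block(x, seq_mixer, w1, w2, p=0.1, rng=None, training=False):
    # Hypothetical pre-norm residual block: sequence mixing, then EinFFT channel mixing.
    if rng is None:
        rng = np.random.default_rng(0)
    x = x + dropout(seq_mixer(layer_norm(x)), p, rng, training)
    x = x + dropout(einfft_channel_mix(layer_norm(x), w1, w2), p, rng, training)
    return x


def causal_mean(h):
    # Toy stand-in for the Mamba block (NOT Mamba): running mean over previous tokens.
    return np.cumsum(h, axis=0) / np.arange(1, h.shape[0] + 1)[:, None]


rng = np.random.default_rng(0)
tokens, channels, blocks = 16, 8, 2
bs = channels // blocks
w1 = 0.1 * (rng.standard_normal((blocks, bs, bs)) + 1j * rng.standard_normal((blocks, bs, bs)))
w2 = 0.1 * (rng.standard_normal((blocks, bs, bs)) + 1j * rng.standard_normal((blocks, bs, bs)))
x = rng.standard_normal((tokens, channels))
y = simba_block(x, causal_mean, w1, w2)
print(y.shape)  # (16, 8)
```

The sequence mixer is passed in as a function so the sketch runs without a Mamba implementation. A real SiMBA block would plug a Mamba (selective state-space) layer into that slot.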

Experimental Results

Research Questions

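RQ1: Does EinFFT stabilize Mamba when it is scaled to large networks for vision tasks?
RQ2: Does SiMBA close the performance gap between state-space models and Transformers on ImageNet and time-series benchmarks?
RQ3: How does SiMBA perform in transfer learning and downstream tasks such as instance segmentation?
RQ4: How do architectural components (residual connections, dropout) affect SiMBA's stability and performance?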

Key Findings

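SiMBA achieves strong performance on vision and time-series benchmarks and resolves the instability observed when Mamba is scaled up.
EinFFT provides a stable and efficient channel-mixing mechanism that, combined with Mamba, yields state-of-the-art results among SSMs on ImageNet and several time-series datasets.
SiMBA outperforms competing SSMs and substantially narrows the gap with state-of-the-art Transformers in the reported settings.
SiMBA transfers effectively to CIFAR, Stanford Car, and Flowers, and is applicable to instance segmentation.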


This review was generated by AI and reviewed by human editors.