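[Paper Explained] State Space Model for New-Generation Network Alternative to Transformers: A Survey
This article surveys architectures based on state space models (SSMs) as efficient alternatives to the Transformer, outlining their origins and variants, their applications to NLP, CV, graph, time-series, and multimodal data, and experimental comparisons across diverse downstream tasks.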
In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List.
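Research Motivation and Objectives
- Introduce the principles of state space models (SSMs) and their use as an alternative to self-attention in Transformers.
- Systematically review existing SSM variants and architectures (e.g., Mamba, S4, S4ND, DSS) and their applications across multiple domains.
- Provide experimental comparisons and analysis that highlight the performance and efficiency trade-offs of SSMs on downstream tasks.
- Discuss research directions for advancing SSM theory and applications, and share a GitHub resource that collects related work.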
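Proposed Method
- Describe the mathematical formulation of discrete-time SSMs and its relationship to the Kalman filter.
- Explain Mamba's enhancements: the selective scan operator and a hardware-aware algorithm for efficient computation.
- Summarize and categorize existing SSM-based models and architectures from the literature (e.g., S4, S4ND, HiPPO, DSS) by application domain.
- Run experimental comparisons on downstream tasks to evaluate the effectiveness and efficiency of SSM-based models.
- Provide a structured overview of SSM applications in NLP, computer vision, graphs, time series, and multimodal data.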
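To make the discrete-time formulation concrete, here is a minimal sketch of a linear SSM with a diagonal state matrix. It is not taken from the survey: the function names, the zero-order-hold discretization, and the restriction to a diagonal, nonzero `a_diag` are illustrative assumptions. The sketch shows the recurrence h_k = A_bar h_{k-1} + B_bar u_k, y_k = C h_k, the equivalent convolution with kernel K_j = C A_bar^j B_bar, and a naive input-dependent ("selective") variant in which the step size and the B and C vectors are produced per time step by a hypothetical `params_fn` hook.

```python
import math


def discretize_zoh(a_diag, b, delta):
    """Zero-order-hold discretization for a diagonal A (assumes every a != 0).

    A_bar = exp(delta * a),  B_bar = (A_bar - 1) / a * b
    """
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(ab - 1.0) / a * bi for ab, a, bi in zip(a_bar, a_diag, b)]
    return a_bar, b_bar


def ssm_recurrent(a_bar, b_bar, c, u):
    """Run h_k = A_bar h_{k-1} + B_bar u_k, y_k = C h_k from a zero state."""
    h = [0.0] * len(a_bar)
    ys = []
    for u_k in u:
        h = [ab * hi + bb * u_k for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_kernel(a_bar, b_bar, c, length):
    """Convolution kernel K_j = C A_bar^j B_bar for j = 0 .. length-1."""
    return [
        sum(ci * ab ** j * bb for ci, ab, bb in zip(c, a_bar, b_bar))
        for j in range(length)
    ]


def ssm_convolutional(a_bar, b_bar, c, u):
    """Same output as ssm_recurrent, computed as a causal convolution."""
    k = ssm_kernel(a_bar, b_bar, c, len(u))
    return [sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


def selective_scan(a_diag, u, params_fn):
    """Naive sequential scan with input-dependent parameters.

    params_fn(u_k) -> (delta_k, b_k, c_k) is a hypothetical hook standing in
    for learned projections; this is not a hardware-aware implementation.
    """
    h = [0.0] * len(a_diag)
    ys = []
    for u_k in u:
        delta_k, b_k, c_k = params_fn(u_k)
        a_bar, b_bar = discretize_zoh(a_diag, b_k, delta_k)
        h = [ab * hi + bb * u_k for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c_k, h)))
    return ys


if __name__ == "__main__":
    a_diag, b, c = [-1.0, -0.5], [1.0, 1.0], [0.5, 1.5]
    u = [1.0, 0.0, 2.0, -1.0]
    a_bar, b_bar = discretize_zoh(a_diag, b, delta=0.1)
    print(ssm_recurrent(a_bar, b_bar, c, u))
    print(ssm_convolutional(a_bar, b_bar, c, u))
```

With fixed parameters the recurrent and convolutional forms agree, which is what lets a time-invariant SSM be trained as a convolution and run as a recurrence. Once the parameters depend on the input, as in `selective_scan`, there is no longer a single fixed kernel, so the output has to come from a scan over the sequence; making that scan efficient on hardware is the part this sketch does not attempt.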
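Experimental Results
Research Questions
- RQ1: What are the core principles and formulations of state space models used for sequence modeling in deep learning?
- RQ2: How do SSM-based architectures (e.g., Mamba, S4, DSS) compare with Transformers and other attention-based models in performance and efficiency across tasks?
- RQ3: Which domains and data modalities benefit most from SSMs, and what are the practical limitations of these models?
- RQ4: Which future research directions could advance SSM theory and broaden its applications?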
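Key Findings
- SSMs offer a viable and often more computationally efficient alternative to self-attention in Transformers for modeling long-range dependencies.
- Mamba-style enhancements improve information filtering and parallel computation, which in turn improves efficiency and scalability.
- A broad range of SSM-based models show strong performance on NLP, CV, graph, time-series, and multimodal tasks, with several reports of favorable accuracy and memory usage.
- The survey provides extensive experimental comparisons on downstream tasks, including classification, object tracking, segmentation, image-to-text generation, and re-identification, demonstrating the practical effectiveness of SSMs.
- A GitHub repository collects related SSM papers and developments to support ongoing research.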
This summary was generated by AI and reviewed by a human editor.