QUICK REVIEW

[Paper Review] Spike-driven Transformer

Man Yao, Jiakui Hu|arXiv (Cornell University)|Jul 4, 2023

Advanced Memory and Neural Computing | Cited by 35

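One-Sentence Summary

Introduces the Spike-driven Self-Attention (SDSA) module and rearranges the residual connections around spikes, turning Transformer operations into sparse additions. The result is energy-efficient self-attention with linear complexity and competitive accuracy on ImageNet and neuromorphic datasets.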

ABSTRACT

Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.
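Property 2 in the abstract can be made concrete with a small sketch. When the input to a weight matrix is a binary spike vector, each 1 simply selects a weight row to add and each 0 triggers nothing, so the product needs additions only. The Python below is a minimal illustration under that assumption; `spike_matvec` and `dense_matvec` are hypothetical helper names, not functions from the authors' code.

```python
from typing import List


def dense_matvec(x: List[float], w: List[List[float]]) -> List[float]:
    """Reference: ordinary vector-matrix product using multiply-accumulate."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def spike_matvec(spikes: List[int], w: List[List[float]]) -> List[float]:
    """Same product for a binary spike vector, using additions only.

    Each 1 selects a row of w to accumulate; each 0 triggers no work at all,
    which mirrors the event-driven behaviour described in the abstract.
    """
    out = [0.0] * len(w[0])
    for i, s in enumerate(spikes):
        if s:  # zero input: nothing is computed for this row
            for j, weight in enumerate(w[i]):
                out[j] += weight
    return out


spikes = [1, 0, 1]
w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(spike_matvec(spikes, w))  # [6.0, 8.0]
```

The result matches `dense_matvec(spikes, w)`, but the work done scales with the number of spikes rather than the input length, which is where the sparsity saving comes from.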

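Research Motivation and Goals

Advance energy-efficient deep learning by combining Spiking Neural Networks (SNNs) with the Transformer architecture.
Design a fully spike-driven Transformer in which the key operations are carried out with sparse additions on binary spikes.
Rearrange the residual connections to ensure binary spike communication throughout the network.
Demonstrate the energy efficiency and competitive accuracy of the proposed model on both static and neuromorphic datasets.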
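The residual rearrangement can be illustrated with a toy comparison, assuming a single time step and a plain threshold in place of the paper's spiking neuron model. In a SEW-style shortcut, spike outputs are added to spike inputs, so values of 2 can appear (multi-bit spikes). In a membrane-potential shortcut, the residual is added to real-valued membrane values and each block starts with a spiking neuron, so only binary spikes enter the weight layers. `sn`, `layer`, `sew_block`, and `ms_block` are hypothetical names, and `layer` is an arbitrary stand-in for a convolution or linear layer.

```python
from typing import Callable, List

Vector = List[float]


def sn(u: Vector, threshold: float = 1.0) -> List[int]:
    """Single-step stand-in for a spiking neuron: emit 1 where u >= threshold."""
    return [1 if x >= threshold else 0 for x in u]


def layer(s: List[int]) -> Vector:
    """Hypothetical stand-in for a conv/linear layer acting on spikes."""
    return [1.5 * x for x in s]


def sew_block(s_in: List[int], f: Callable[[List[int]], Vector]) -> List[int]:
    """SEW-style shortcut: add the block's output spikes to its input spikes."""
    return [a + b for a, b in zip(sn(f(s_in)), s_in)]  # can produce 2: multi-bit


def ms_block(u_in: Vector, f: Callable[[List[int]], Vector]) -> Vector:
    """Membrane shortcut: fire first, then add the residual to membrane values."""
    s = sn(u_in)  # the only signal sent into f is binary
    # Real-valued output; the next block's sn() turns it back into binary spikes
    return [a + b for a, b in zip(f(s), u_in)]


print(sew_block([1, 0, 1], layer))         # [2, 0, 2]
print(ms_block([1.25, 0.25, 1.0], layer))  # [2.75, 0.25, 2.5]
```

In the SEW case the value 2 would have to be transmitted to the next layer, which reintroduces multiplication; in the membrane-shortcut case the real-valued sum stays inside the neuron and only its binary firing is passed on.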

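Proposed Method

Develop Spike-driven Self-Attention (SDSA), which uses only mask and sparse addition operations and avoids both multiplication and softmax.
Replace the multiplications among Q, K, and V with a Hadamard mask and a column-wise sum followed by a spiking neuron layer, giving linear complexity in both the token and channel dimensions.
Rearrange the residual connections so that only binary spike signals propagate, avoiding multi-bit spike outputs.
Process image inputs through a pipeline of Spiking Patch Splitting, SDSA blocks, MLP blocks, and a spike-based linear classification head.
Provide a theoretical energy analysis showing substantial energy savings, both in self-attention and across the spike-driven components as a whole.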
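The core of SDSA can be sketched on binary N×D spike matrices. This sketch assumes the form Q ⊗ SN(SUM_c(K ⊗ V)), where ⊗ is the Hadamard product, SUM_c sums each column over the N tokens, and SN is replaced by a single-step threshold; the projection and normalization layers around this core are omitted. Because the Hadamard product of binary matrices is a logical AND and the column sum uses additions only, no N×N attention matrix is ever formed and the cost is O(N·D). The names `sdsa` and `spiking_neuron` are hypothetical.

```python
from typing import List

SpikeMatrix = List[List[int]]  # N tokens x D channels, entries in {0, 1}


def spiking_neuron(x: List[int], threshold: int) -> List[int]:
    """Single-time-step stand-in for the SN layer: fire 1 if x >= threshold."""
    return [1 if value >= threshold else 0 for value in x]


def sdsa(q: SpikeMatrix, k: SpikeMatrix, v: SpikeMatrix, threshold: int = 1) -> SpikeMatrix:
    """Sketch of Q ⊗ SN(SUM_c(K ⊗ V)) on binary spike matrices."""
    n, d = len(q), len(q[0])
    # Hadamard product of two binary matrices is a mask (logical AND), no multiply
    kv = [[k[i][j] & v[i][j] for j in range(d)] for i in range(n)]
    # Column-wise sum over the N tokens: only additions, O(N * D) work
    col_sum = [sum(kv[i][j] for i in range(n)) for j in range(d)]
    # Spiking neuron converts the D channel sums back into a binary vector
    g = spiking_neuron(col_sum, threshold)
    # Use g as a channel mask on every row of Q: again O(N * D), no N x N matrix
    return [[q[i][j] & g[j] for j in range(d)] for i in range(n)]


q = [[1, 1, 0], [0, 1, 1]]
k = [[1, 0, 1], [1, 1, 0]]
v = [[1, 1, 1], [1, 0, 1]]
print(sdsa(q, k, v, threshold=2))  # [[1, 0, 0], [0, 0, 0]]
```

Raising `threshold` makes the channel mask `g` sparser, which zeroes more of Q; in this sketch that threshold is the only place where the "attention" decides which channels pass through.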

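Experimental Results

Research Questions

RQ1: Can Spike-driven Self-Attention (SDSA) replace conventional self-attention without sacrificing accuracy?
RQ2: Compared with the vanilla Transformer and existing spiking Transformers, what energy and computational benefits does a fully spike-driven Transformer offer?
RQ3: How do spike-driven residual connections affect network dynamics and task performance?
RQ4: How does the Spike-driven Transformer perform on ImageNet and neuromorphic datasets compared with state-of-the-art SNNs?
RQ5: Does the SDSA approach scale along both the token and channel dimensions?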

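Key Findings

The Spike-driven Transformer reaches 77.1% top-1 accuracy on ImageNet-1K with 288×288 inputs, D=768, and L=8, the state of the art in the SNN field.
By replacing multiplication and softmax with mask and addition operations, SDSA lowers self-attention energy by up to 87.2× relative to vanilla self-attention.
The energy analysis shows that spike-driven self-attention uses far less energy than ANN self-attention across model sizes (e.g., an 87.2× gap for the 8-768 configuration, i.e., L=8 and D=768).
The residual connections are redesigned as membrane-potential shortcuts, which keep spike signals binary and outperform SEW-based shortcuts.
The method achieves state-of-the-art or competitive results on both static and neuromorphic datasets, including CIFAR-10/100, CIFAR10-DVS, and DVS128 Gesture.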
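Energy comparisons of this kind rest on counting operations and weighting each by a per-operation energy. The sketch below applies that style of estimate to a single linear layer, assuming the commonly cited 45 nm figures of 4.6 pJ per floating-point MAC and 0.9 pJ per accumulate, plus a hypothetical token count (196) and firing rate (0.25). It shows why spikes help (multiplications become additions, and silent inputs cost nothing) but does not reproduce the paper's 87.2× figure, which depends on the paper's own operation counts and measured firing rates. The function names are hypothetical.

```python
# Assumed per-operation energies for 32-bit floating point in 45 nm CMOS, in pJ.
E_MAC_PJ = 4.6  # multiply-accumulate
E_AC_PJ = 0.9   # accumulate (addition only)


def ann_layer_energy_pj(n_tokens: int, d_in: int, d_out: int) -> float:
    """Dense layer on real-valued inputs: every input entry costs d_out MACs."""
    return n_tokens * d_in * d_out * E_MAC_PJ


def snn_layer_energy_pj(n_tokens: int, d_in: int, d_out: int,
                        firing_rate: float, timesteps: int = 1) -> float:
    """Same layer on binary spikes: only the entries that fire cost d_out ACs each."""
    return timesteps * firing_rate * n_tokens * d_in * d_out * E_AC_PJ


ann = ann_layer_energy_pj(196, 768, 768)
snn = snn_layer_energy_pj(196, 768, 768, firing_rate=0.25, timesteps=1)
print(round(ann / snn, 2))  # 20.44
```

Under this model the ratio reduces to E_MAC / (E_AC × firing rate × timesteps), independent of layer size, so sparser firing and fewer time steps are what drive the savings.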

This summary was generated by AI and checked by human editors.