[Paper Review] Spike-driven Transformer
By introducing a Spike-driven Self-Attention module and spike-focused residuals, the paper converts Transformer computation into sparse additions, achieving energy-efficient, linear-complexity self-attention and competitive accuracy on ImageNet and neuromorphic datasets.
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication, and thus having up to $87.2\times$ lower computation energy than vanilla self-attention. Especially in SDSA, the matrix multiplication between Query, Key, and Value is designed as the mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. It is shown that the Spike-driven Transformer can achieve 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.
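Research Motivation and Goals
- Combine Spiking Neural Networks (SNNs) with the Transformer architecture to advance energy-efficient deep learning.
- Design a fully spike-driven Transformer whose key operations are carried out through sparse additions and binary spikes.
- Rearrange the residual connections to guarantee binary spike communication throughout the network.
- Demonstrate the energy efficiency and competitive accuracy of the proposed model on static and neuromorphic datasets.
Proposed Method
- Develop Spike-driven Self-Attention (SDSA), which uses only mask and sparse addition operations and avoids multiplications and softmax.
- Replace the Q, K, V matrix multiplications with Hadamard masks and column-wise summations followed by a spiking neuron layer, achieving linear complexity in both the token and channel dimensions.
- Rearrange the residual connections so that binary spike signals are propagated and multi-bit spike outputs are avoided.
- Process image inputs through a spike-enabled pipeline of Spiking Patch Splitting, SDSA, MLP, and a linear classifier.
- Provide a theoretical energy analysis showing that large energy savings are possible for self-attention and for the fully spike-driven components.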
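The mask-and-sum attention described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `spike_neuron` and `sdsa`, the single-step threshold neuron (no membrane dynamics or time steps), the threshold value, and the choice to sum over the token axis so that the result masks channels are all assumptions made for this example.

```python
import numpy as np


def spike_neuron(x, threshold=1.0):
    """Toy spiking neuron (assumption): emit 1 where the input reaches the threshold."""
    return (x >= threshold).astype(np.uint8)


def sdsa(q, k, v, threshold=1.0):
    """Sketch of spike-driven self-attention on binary (N tokens, D channels) arrays.

    Assumed form: SN(column_sum(Q * K)) * V, where * is the Hadamard product.
    """
    masked = q & k                            # Hadamard product of binary spikes = elementwise AND
    col_sum = masked.sum(axis=0)              # column-wise sum: additions only, shape (D,)
    attn = spike_neuron(col_sum, threshold)   # binary mask over channels, shape (D,)
    return v & attn                           # mask V; broadcasts over the token axis


# Hypothetical spike inputs: 2 tokens, 3 channels.
q = np.array([[1, 0, 1], [1, 1, 0]], dtype=np.uint8)
k = np.array([[1, 1, 0], [1, 1, 1]], dtype=np.uint8)
v = np.array([[1, 1, 1], [0, 1, 1]], dtype=np.uint8)

out = sdsa(q, k, v, threshold=2.0)
print(out.tolist())  # [[1, 0, 0], [0, 0, 0]]
```

Because the inputs are binary, the Hadamard product reduces to an elementwise AND and the column sum is a count of coincident spikes, so the sketch contains no multiplications. The work grows with N·D, and no N×N attention map is ever formed.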
Experimental Results
Research Questions
- RQ1: Can Spike-driven Self-Attention (SDSA) replace traditional self-attention without sacrificing accuracy?
- RQ2: What are the energy and computation benefits of a fully spike-driven Transformer compared to vanilla Transformer and existing spiking Transformers?
- RQ3: How do spike-driven residual connections affect network dynamics and task performance?
- RQ4: What is the performance of the Spike-driven Transformer on ImageNet and neuromorphic datasets compared to state-of-the-art SNNs?
- RQ5: Is the SDSA approach scalable in terms of token and channel dimensions?
Key Results
- The Spike-driven Transformer achieves 77.1% top-1 accuracy on ImageNet-1K with 288x288 input, D=768, and L=8, which the paper reports as state-of-the-art in the SNN field.
- SDSA reduces self-attention energy by up to 87.2x compared to vanilla self-attention by replacing multiplications and softmax with mask and addition operations.
- The energy analysis shows that spike-driven self-attention consumes far less energy than ANN self-attention across model sizes (e.g., the 87.2x gap for the 8-768 configuration).
- Residual connections redesigned as membrane potential shortcuts keep spike signals binary and improve performance versus SEW-based shortcuts.
- The approach yields state-of-the-art or competitive results on static and neuromorphic datasets, including CIFAR-10/100, CIFAR10-DVS, and DVS128 Gesture.
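The difference between the two shortcut styles mentioned in the results can be illustrated with a toy example. The sketch below is a simplified reading, not the paper's code: the function names, the single-step threshold neuron, and the sample values are hypothetical, and it only shows why adding spikes after the neuron can produce multi-bit values while adding membrane potentials before the neuron keeps the output binary.

```python
import numpy as np


def spike_neuron(u, threshold=1.0):
    """Toy spiking neuron (assumption): emit 1 where the potential reaches the threshold."""
    return (u >= threshold).astype(np.uint8)


def sew_shortcut(u_branch, s_identity):
    """SEW-style residual (simplified): add spikes after the neuron; values can exceed 1."""
    return spike_neuron(u_branch) + s_identity


def membrane_shortcut(u_branch, u_identity):
    """Membrane-potential residual (simplified): add potentials, then fire once."""
    return spike_neuron(u_branch + u_identity)


# Hypothetical values for three neurons.
u_branch = np.array([1.2, 0.3, 1.5])                 # potentials from the residual branch
s_identity = np.array([1, 0, 1], dtype=np.uint8)     # spikes on the identity path
u_identity = np.array([0.9, 0.2, 1.1])               # potentials on the identity path

print(sew_shortcut(u_branch, s_identity).tolist())       # [2, 0, 2]  (multi-bit)
print(membrane_shortcut(u_branch, u_identity).tolist())  # [1, 0, 1]  (binary)
```

In the first case downstream layers would receive the value 2, which no longer maps to a pure addition per spike; in the second case every transmitted signal stays in {0, 1}.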
This review was generated by AI and reviewed by a human editor.