QUICK REVIEW

[Paper Review] SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Ruijie Zhu, Qihang Zhao|arXiv (Cornell University)|Feb 27, 2023

Advanced Memory and Neural Computing|Cited by 52

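One-Sentence Summary

SpikeGPT is the first language model trained directly as a spiking neural network (SNN); it combines Spiking RWKV with SRFFN to achieve competitive generation and comprehension, with linear time complexity and markedly lower energy use through neuromorphic-style sparse activations.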

ABSTRACT

As large language models continue to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement 'SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model: 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing the quadratic computational complexity O(N^2) to linear complexity O(N) with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.

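Motivation and Goals

Demonstrate direct training of a language model with spiking neurons in a Transformer-inspired architecture.
Reduce the quadratic complexity of self-attention to linear complexity, supporting longer sequences and streaming input.
Achieve competitive natural language generation and understanding while reducing synaptic operations, making the model suitable for neuromorphic hardware.
Present two parameter scales (45M and 216M) and compare them against standard Transformer-based baselines.
Explore training strategies and architectural components that integrate recurrence and spiking dynamics into natural language processing tasks.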
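
To make the quadratic-versus-linear goal concrete, the following is a minimal sketch of an RWKV-style weighted key-value recurrence for a single scalar channel. It is not the SpikeGPT implementation: the function names, the scalar decay `w`, and the current-token bonus `u` are illustrative assumptions, and the usual numerical stabilization is omitted. The streaming form carries a constant-size state across tokens, while the reference form recomputes the sum over all past tokens.

```python
import math

def wkv_streaming(keys, values, w, u):
    """O(N) streaming form: a constant-size state (a, b) is carried across tokens."""
    a, b = 0.0, 0.0                # running weighted sum of values / of weights
    decay = math.exp(-w)           # per-step exponential decay of past tokens
    out = []
    for k, v in zip(keys, values):
        bonus = math.exp(u + k)    # extra weight given to the current token
        out.append((a + bonus * v) / (b + bonus))
        a = decay * a + math.exp(k) * v
        b = decay * b + math.exp(k)
    return out

def wkv_quadratic(keys, values, w, u):
    """O(N^2) reference: recompute the weighted sum over all past tokens."""
    out = []
    for t in range(len(keys)):
        num = math.exp(u + keys[t]) * values[t]
        den = math.exp(u + keys[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + keys[i])
            num += weight * values[i]
            den += weight
        out.append(num / den)
    return out

if __name__ == "__main__":
    ks = [0.1, -0.3, 0.7, 0.2]
    vs = [1.0, 2.0, -1.0, 0.5]
    print(wkv_streaming(ks, vs, w=0.5, u=0.3))
    print(wkv_quadratic(ks, vs, w=0.5, u=0.3))
```

Both functions produce the same outputs up to floating-point rounding; only the streaming one can process tokens one at a time without revisiting earlier inputs.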

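Proposed Method

Replace multi-head self-attention with a Spiking RWKV token mixer that operates recurrently and achieves linear time complexity.
Use a Spiking RFFN (SRFFN) channel mixer with residual connections as the feed-forward component.
Introduce binary embeddings that use a Heaviside function in the forward pass and an arctangent surrogate gradient for backpropagation.
Add a token shift operation to supply context without full attention.
Integrate Leaky Integrate-and-Fire (LIF) neurons to produce binary spike outputs and enable streaming computation.
Pre-train in a decoder-only setup, then fine-tune for natural language generation and understanding tasks with task-specific adaptation of the top layers.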
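
Three of the components listed above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the decay `beta`, the threshold, the reset-by-subtraction rule, the surrogate sharpness `alpha`, and the scalar mixing coefficient `mu` are all hypothetical choices made for the example.

```python
import math

def heaviside(x):
    """Forward pass of the spike function: emit 1 when the input is non-negative."""
    return 1.0 if x >= 0.0 else 0.0

def atan_surrogate_grad(x, alpha=2.0):
    """Backward-pass stand-in for the Heaviside derivative.

    Derivative of (1/pi) * atan(pi * alpha * x / 2) + 1/2; alpha is an assumed
    sharpness parameter.
    """
    return alpha / (2.0 * (1.0 + (math.pi * alpha * x / 2.0) ** 2))

def lif_run(currents, beta=0.9, threshold=1.0):
    """Leaky Integrate-and-Fire neuron over a stream of input currents.

    Assumes reset-by-subtraction; other reset rules are possible.
    """
    u = 0.0                        # membrane potential
    spikes = []
    for i_t in currents:
        u = beta * u + i_t         # leak, then integrate the new input
        s = heaviside(u - threshold)
        spikes.append(int(s))      # binary spike output
        u -= s * threshold         # subtract the threshold after a spike
    return spikes

def token_shift(tokens, mu=0.5):
    """Mix each token with its predecessor (zero before the first token)."""
    shifted, prev = [], 0.0
    for x in tokens:
        shifted.append(mu * x + (1.0 - mu) * prev)
        prev = x
    return shifted

if __name__ == "__main__":
    print(lif_run([0.6, 0.6, 0.6, 0.0]))
    print(atan_surrogate_grad(0.0))
    print(token_shift([1.0, 3.0]))
```

In an autograd framework, the Heaviside function would be used in the forward pass and `atan_surrogate_grad` would replace its derivative during backpropagation, since the true derivative is zero almost everywhere.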

Experimental Results

Research Questions

RQ1: Can a generative language model be effectively trained directly with spiking neurons (SNNs) in a Transformer-like architecture?
RQ2: Does replacing self-attention with a linear, recurrent Spiking RWKV enable competitive language modeling while reducing computational cost?
RQ3: What are the energy and computational benefits of SpikeGPT on neuromorphic-like sparse activations compared to traditional Transformers?
RQ4: How do SpikeGPT variants (45M and 216M parameters) perform on standard NLG and NLU benchmarks relative to baselines?
RQ5: What training strategies (binary embeddings, surrogate gradients, token shift) are effective for SNN-based NLP models?

Key Findings

SpikeGPT is the largest backpropagation-trained SNN language model to date (216M parameters) and demonstrates competitive performance on generation and comprehension tasks.
SpikeGPT reduces quadratic attention complexity to linear by using Spiking RWKV and processes tokens in a streaming, sequential manner.
SpikeGPT achieves over 20× fewer synaptic operations (SynOps) than a vanilla Transformer due to sparse, event-driven activations.
On Enwik8, SpikeGPT 45M with 1024 sequence length attains train/test BPC of 1.113/1.283 and SpikeGPT 45M with 3072 length attains 0.903/1.262, with SynOps markedly lower than Transformers.
With pre-training on OpenWebText2, the 216M SpikeGPT shows competitive perplexity on WikiText-2 and relatively lower performance on WikiText-103 compared to GPT-2 variants, indicating scalability challenges and the need for larger-scale training strategies.
SpikeGPT delivers competitive NLG and NLU results while maintaining energy efficiency suitable for neuromorphic hardware.
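
The link between sparse activations and fewer synaptic operations can be illustrated with a rough count for one fully connected layer. This is a simplified sketch of one common way to estimate SynOps, not the paper's measurement procedure: the helper names are hypothetical, and the layer size and the 5% firing rate are made-up illustrative inputs, not figures reported for SpikeGPT.

```python
def firing_rate(spikes):
    """Fraction of time steps (or neurons) that emitted a spike."""
    return sum(spikes) / len(spikes)

def dense_ops(in_features, out_features):
    """Multiply-accumulates per token for a dense layer: every input drives every output."""
    return in_features * out_features

def spiking_synops(in_features, out_features, rate):
    """Synaptic operations per token when only spiking inputs trigger work."""
    return rate * in_features * out_features

def reduction_factor(in_features, out_features, rate):
    """How many times fewer operations the spiking layer performs."""
    return dense_ops(in_features, out_features) / spiking_synops(
        in_features, out_features, rate
    )

if __name__ == "__main__":
    # Illustrative numbers only: a 512x512 layer whose inputs spike 5% of the time.
    rate = firing_rate([0] * 19 + [1])
    print(rate, reduction_factor(512, 512, rate))
```

Under this simple model the reduction factor is the inverse of the firing rate, so sparser spiking translates directly into fewer operations.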


This review was generated by AI and reviewed by human editors.