QUICK REVIEW

[Paper Review] SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

Ruijie Zhu, Qihang Zhao|arXiv (Cornell University)|2023. 02. 27.

Advanced Memory and Neural Computing|Citations: 52

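One-line summary

SpikeGPT is a directly trained SNN language model that combines Spiking RWKV with SRFFN. Its linear time complexity and neuromorphic-style sparse activations cut energy use substantially while keeping it competitive on both generation and understanding.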

ABSTRACT

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement 'SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model, with 45M and 216M parameters. To the best of our knowledge, SpikeGPT is the largest backpropagation-trained SNN model to date, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block, replacing multi-head self-attention so that computational complexity falls from quadratic O(N^2) to linear O(N) with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 20x fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.

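Motivation and objectives

Demonstrate that a language model can be trained directly with spiking neurons in a Transformer-inspired architecture.
Reduce the quadratic complexity of self-attention to linear, enabling longer sequences and streaming input.
Achieve competitive natural language generation and understanding while lowering synaptic operations on neuromorphic hardware.
Present two parameter scales (45M and 216M) and compare them against standard Transformer-based baselines.
Explore training strategies and architectural components that integrate recurrence and spiking dynamics into NLP tasks.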
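To make the quadratic-to-linear objective concrete, here is a minimal sketch of an RWKV-style weighted key-value mix computed two ways: by re-summing over all earlier tokens at every step (quadratic in sequence length) and by carrying a small running state (linear, and usable on streamed input). This is a simplified single-channel form without spiking or numerical-stability tricks. The names `wkv_quadratic` and `wkv_recurrent`, the decay `w`, and the current-token bonus `u` are illustrative assumptions, not SpikeGPT's exact formulation, which this review does not spell out.

```python
import math

def wkv_quadratic(k, v, w, u):
    """Recompute the weighted average over all past tokens at each step: O(N^2)."""
    out = []
    for t in range(len(k)):
        # The current token gets a separate bonus u (an assumption borrowed from RWKV).
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Older tokens are down-weighted exponentially with their distance.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out

def wkv_recurrent(k, v, w, u):
    """Same quantity with a two-number running state: O(N), one token at a time."""
    a = 0.0  # running weighted sum of past values
    b = 0.0  # running sum of past weights
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current token into the state before moving on.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out

# Hypothetical toy inputs, chosen only to compare the two routes.
keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 0.5]
slow = wkv_quadratic(keys, values, w=0.4, u=0.2)
fast = wkv_recurrent(keys, values, w=0.4, u=0.2)
print(all(abs(s - f) < 1e-9 for s, f in zip(slow, fast)))
```

Because the recurrent version only needs `a` and `b` from the previous step, memory and per-token cost stay constant as the sequence grows, which is what makes streaming input practical.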

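Proposed method

Replace multi-head self-attention with a Spiking RWKV token mixer that operates recurrently in linear time.
Use a Spiking RFFN (SRFFN) channel mixer with residual connections as the feed-forward component.
Introduce a binary embedding with a Heaviside forward pass and an arctangent surrogate gradient for backpropagation.
Introduce a token-shift operation to supplement context without full attention.
Integrate Leaky Integrate-and-Fire (LIF) neurons to produce binary spike outputs and enable streaming computation.
Pre-train as a decoder-only model, then fine-tune for NLG and NLU tasks with task-specific top-layer adaptations.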
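Three of the components listed above can be sketched in a few lines of plain Python: the Heaviside spike function, an arctangent surrogate for its gradient, a discrete LIF neuron, and a token shift. This is an illustrative sketch under stated assumptions rather than the authors' code. The surrogate is the common form g(x) = (1/pi) * atan((pi/2) * alpha * x) + 1/2; the values `alpha=2.0`, `beta=0.5`, `threshold=1.0`, the subtractive reset, and the scalar mixing weight `mu` are all hypothetical choices (in RWKV-style models the token-shift weight is typically a learned per-channel vector).

```python
import math

def heaviside(x):
    """Forward pass: emit a binary spike when the input is non-negative."""
    return 1.0 if x >= 0.0 else 0.0

def arctan_surrogate_grad(x, alpha=2.0):
    """Backward pass stand-in: derivative of (1/pi)*atan((pi/2)*alpha*x) + 1/2.

    The Heaviside step has zero gradient almost everywhere, so this smooth
    derivative is used in its place during backpropagation.
    """
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)

def lif_run(inputs, beta=0.5, threshold=1.0):
    """Discrete Leaky Integrate-and-Fire neuron over a stream of input currents."""
    u = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        u = beta * u + x                 # leak, then integrate the new input
        s = heaviside(u - threshold)     # fire if the threshold is reached
        spikes.append(s)
        u -= s * threshold               # subtractive reset (an assumption)
    return spikes

def token_shift(xs, mu=0.5):
    """Blend each token with its predecessor; the first token sees zeros."""
    prev = 0.0
    out = []
    for x in xs:
        out.append(mu * x + (1.0 - mu) * prev)
        prev = x
    return out

print(lif_run([0.6, 0.6, 0.6, 0.0]))   # [0.0, 0.0, 1.0, 0.0]
print(token_shift([1.0, 3.0, 5.0]))    # [0.5, 2.0, 4.0]
print(arctan_surrogate_grad(0.0))      # 1.0
```

The neuron only needs its previous membrane potential, and the token shift only needs the previous token, so both fit the one-token-at-a-time processing the method relies on.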

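Experimental results

Research questions

RQ1: Can a generative language model be trained directly and effectively with spiking neurons (an SNN) in a Transformer-like architecture?
RQ2: Does replacing self-attention with a linear, recurrent Spiking RWKV reduce computational cost while still supporting competitive language modeling?
RQ3: What energy and compute advantages does SpikeGPT offer over conventional Transformers through sparse, event-driven activations?
RQ4: How do the 45M and 216M parameter variants of SpikeGPT perform against baselines on standard NLG and NLU benchmarks?
RQ5: Which training strategies (binary embedding, surrogate gradients, token shift) are effective for SNN-based NLP models?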

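Key findings

SpikeGPT is the largest backpropagation-trained SNN language model to date (216M parameters) and performs competitively on generation and understanding tasks.
SpikeGPT uses Spiking RWKV to reduce the quadratic complexity of attention to linear and processes tokens in a streaming, sequential fashion.
Thanks to sparse, event-driven activations, SpikeGPT reduces synaptic operations by 20x or more relative to a standard Transformer.
On Enwik8, the 45M SpikeGPT reaches train/test BPC of 1.113/1.283 at sequence length 1024 and 0.903/1.262 at length 3072, with SynOps markedly lower than the Transformer's.
The 216M SpikeGPT pre-trained on OpenWebText2 shows competitive perplexity on WikiText-2 but falls behind GPT-2 variants on WikiText-103, which points to scalability issues and the need for training strategies suited to larger scales.
SpikeGPT delivers competitive NLG and NLU results while remaining energy efficient.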
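The two metrics behind these findings, synaptic operations (SynOps) and bits per character (BPC), reduce to simple arithmetic. The sketch below assumes a single fully connected layer and a hypothetical 5% input firing rate, picked only to show how sparsity turns into an operation-count ratio; the layer sizes, the firing rate, and the function names are illustrative and not taken from the paper.

```python
import math

def dense_macs(d_in, d_out):
    """A dense layer multiplies every input by every weight, once per token."""
    return d_in * d_out

def spiking_synops(spikes, d_out):
    """A spiking layer only accumulates weights for inputs that actually fired."""
    return sum(spikes) * d_out

def bits_per_character(char_probs):
    """Mean negative log2-probability the model assigned to each true character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

d_in, d_out = 1000, 512
# Hypothetical spike pattern: 1 in every 20 inputs fires (5% firing rate).
spikes = [1 if i % 20 == 0 else 0 for i in range(d_in)]

ratio = dense_macs(d_in, d_out) / spiking_synops(spikes, d_out)
print(ratio)  # 20.0

# Two characters predicted with probability 1/2 and 1/4 cost 1 and 2 bits.
print(bits_per_character([0.5, 0.25]))  # 1.5
```

Under this simplified count the saving is the inverse of the firing rate, so the reported reduction depends on how sparse the activations are in practice. Lower BPC is better, since it means fewer bits are needed on average to encode each character.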


This review was generated by AI and checked by a human editor.