QUICK REVIEW

[Paper Review] The SpiNNaker 2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing

Sebastian Höppner, Yexin Yan | arXiv (Cornell University) | Mar 15, 2021

Advanced Memory and Neural Computing | 41 citations

TL;DR

The paper introduces the SpiNNaker2 processing element architecture implemented in 22nm FDSOI, featuring adaptive body biasing, DVFS, a MAC accelerator, and a dual NoC, and demonstrates SNN, DNN, and hybrid SNN/DNN benchmarks.

ABSTRACT

This paper introduces the processing element (PE) architecture of the second-generation SpiNNaker chip, implemented in 22nm FDSOI. At the circuit level, the chip features adaptive body biasing for near-threshold operation and dynamic voltage and frequency scaling driven by spiking activity. At the system level, processing is centered around an ARM M4 core, similar to the processor-centric architecture of the first-generation SpiNNaker. To speed up subtasks, we have added accelerators for the numerical operations of both spiking (SNN) and rate-based (deep) neural networks (DNN). PEs communicate via a dedicated, custom-designed network-on-chip. We present three benchmarks showing operation of the whole processing element on SNN, DNN, and hybrid SNN/DNN networks.
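To make "DVFS driven by spiking activity" concrete, here is a minimal sketch of how a PE might pick between its two performance levels each simulation time step. It uses the PL1 (0.50 V, 200 MHz) and PL2 (0.60 V, 400 MHz) operating points quoted in the key findings; the 1 ms time step, the `estimate_cycles` cost model, and its constants are hypothetical assumptions for illustration, not the chip's actual power-management controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfLevel:
    name: str
    voltage_v: float
    freq_mhz: float


# Operating points quoted in this review's key findings.
PL1 = PerfLevel("PL1", 0.50, 200.0)
PL2 = PerfLevel("PL2", 0.60, 400.0)


def estimate_cycles(num_spikes: int, cycles_per_spike: int = 200,
                    base_cycles: int = 20_000) -> int:
    """Hypothetical cost model: fixed per-step overhead plus a cost per spike.

    Both constants are illustrative assumptions, not measured values.
    """
    return base_cycles + num_spikes * cycles_per_spike


def select_level(work_cycles: int, timestep_us: float = 1000.0) -> PerfLevel:
    """Return the slower level if it can finish this time step's work in time.

    timestep_us: time-step length in microseconds (1 ms assumed by default).
    """
    # MHz x microseconds = clock cycles available within one time step.
    if work_cycles <= PL1.freq_mhz * timestep_us:
        return PL1  # sparse activity: stay at the low-voltage level
    # Otherwise switch to the fast level (best effort if even PL2 falls short).
    return PL2
```

With the assumed 1 ms step, PL1 offers 200,000 cycles, so a quiet step with 100 spikes stays at PL1, while a burst of 1,000 spikes exceeds that budget and moves the PE to PL2.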

Motivation & Objective

Present the processing element architecture of the second-generation SpiNNaker chip (22nm FDSOI).
Show how adaptive body biasing and DVFS match power consumption to spiking activity.
Introduce accelerators (MAC array, exp/log, RNG) to speed up neural computation.
Describe the quad-PE (QPE) and network-on-chip (NoC) design for scalable on-chip communication.
Demonstrate benchmark results for SNN, DNN, and hybrid networks.

Proposed method

Describe the 22nm FDSOI SpiNNaker2 PE that integrates an ARM Cortex-M4F, fixed-point exponential/log accelerators, a MAC array, RNGs, and DVFS-enabled power rails.
Explain the quad-PE (QPE) organization and its globally asynchronous, locally synchronous (GALS) clocking, which enables independent DVFS, together with the two-mesh NoC (DNoC and CNoC).
Detail the MAC accelerator, a 16x4 array of 8-bit multiply-accumulate units for convolution (CONV) and matrix multiplication (MM), and its dataflow, designed to maximize reuse of data held in local SRAM and to accept operands fed over the NoC.
Outline adaptive body biasing (ABB) and dual-rail SRAM for ultra-low-voltage (ULV) operation, which together deliver energy-efficient performance at 0.5–0.6 V.
Present the test chip implementation in 22FDX, the design choices made for energy optimization, and the DVFS performance levels PL1 and PL2.
Describe the benchmark methodology for the SNN, hybrid Neural Engineering Framework (NEF), and DNN-layer workloads, which run on the MAC accelerator and the ARM core.
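The MAC accelerator description above can be illustrated with a small sketch of an 8-bit matrix multiply computed one 16x4 output tile at a time. The output-stationary tiling (each of the 64 MAC units keeps one output's partial sum while the shared dimension is streamed through) is an assumption chosen to show operand reuse; the paper's actual dataflow and accumulator width may differ, and `tiled_matmul_int8` is a hypothetical name.

```python
TILE_ROWS, TILE_COLS = 16, 4  # 16x4 MAC array size from the review


def _check_int8(mat, name):
    # The accelerator's operands are 8-bit; reject anything outside int8.
    for row in mat:
        for x in row:
            if not -128 <= x <= 127:
                raise ValueError(f"{name} value {x} is outside the int8 range")


def tiled_matmul_int8(a, b):
    """Compute C = A @ B one 16x4 output tile at a time (output-stationary).

    Python ints stand in for the hardware accumulators, whose width is
    not stated in this review.
    """
    _check_int8(a, "A")
    _check_int8(b, "B")
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B differ")
    c = [[0] * n for _ in range(m)]
    for r0 in range(0, m, TILE_ROWS):
        rows = range(r0, min(r0 + TILE_ROWS, m))
        for c0 in range(0, n, TILE_COLS):
            cols = range(c0, min(c0 + TILE_COLS, n))
            # One accumulator per MAC unit covering this tile.
            acc = [[0] * len(cols) for _ in rows]
            for kk in range(k):
                # Per step, up to 16 values of A and 4 values of B are read
                # once and reused across the whole tile (up to 64 MACs).
                a_col = [a[i][kk] for i in rows]
                b_row = [b[kk][j] for j in cols]
                for ti, a_val in enumerate(a_col):
                    for tj, b_val in enumerate(b_row):
                        acc[ti][tj] += a_val * b_val
            # Write the finished tile back to the output matrix.
            for ti, i in enumerate(rows):
                for tj, j in enumerate(cols):
                    c[i][j] = acc[ti][tj]
    return c
```

For example, `tiled_matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. The point of the tiling is the ratio of reads to work: each step reads at most 20 operands but performs up to 64 multiply-accumulates.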

Experimental results

Research questions

RQ1: How can SpiNNaker2's PE architecture support both spiking (SNN) and rate-based (DNN) neural networks efficiently?
RQ2: What energy and performance benefits can be achieved through ABB and DVFS in a 22nm FDSOI SpiNNaker2 PE?
RQ3: How does the MAC accelerator enable speedups for DNN and hybrid SNN/DNN workloads compared to a processor-only implementation?
RQ4: How well does the network-on-chip (NoC) scale when a 2D array of QPEs in SpiNNaker2 handles large neuromorphic workloads?

Key findings

PE energy efficiency at PL1 (0.50 V, 200 MHz) is 16.68 μW/MHz; at PL2 (0.60 V, 400 MHz) it is 20.16 μW/MHz.
MAC accelerator achieves 1.47 TOPS/W (PL1) and 1.51 TOPS/W (PL2) for 8-bit matrix multiplies; up to 1.75 TOPS/W at 0.50 V, 320 MHz.
In the synfire-chain SNN benchmark, DVFS reduces total power by 60.4% and leakage power by 63.4% (from a 66.4 mW baseline to 24.3 mW).
In the NEF-based hybrid benchmark, the MAC accelerator enables efficient encoding and decoding, with an energy per synaptic event of roughly 10–20 pJ depending on the model.
For DNN layers, the MAC accelerator achieves speedups of 116–610x over the ARMNN baseline in some layers, and convolutional layers gain 148–652x in energy efficiency.
Because spiking activity is sparse, DVFS lets each PE adapt to its load and spend most of its time at the low-power PL1 level.
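The figures above can be cross-checked with simple unit arithmetic. The hypothetical helpers below use only numbers quoted in these findings: μW/MHz times MHz gives PE power (1 μW/MHz is also 1 pJ per clock cycle), TOPS/W inverts to picojoules per operation as counted in the TOPS figure, and the synfire-chain pair converts to a percentage reduction.

```python
def power_mw(uw_per_mhz: float, freq_mhz: float) -> float:
    # (uW/MHz) x MHz = uW; divide by 1000 to get mW.
    return uw_per_mhz * freq_mhz / 1000.0


def pj_per_op(tops_per_w: float) -> float:
    # 1 TOPS/W = 1e12 ops per joule, so energy per op = 1 / tops_per_w pJ.
    return 1.0 / tops_per_w


def reduction_pct(before: float, after: float) -> float:
    # Relative reduction from 'before' to 'after', in percent.
    return 100.0 * (before - after) / before


for name, eff, f in [("PL1", 16.68, 200), ("PL2", 20.16, 400)]:
    print(f"{name}: {power_mw(eff, f):.3f} mW, {eff} pJ/cycle")
for tops in (1.47, 1.51, 1.75):
    print(f"{tops} TOPS/W -> {pj_per_op(tops):.3f} pJ/op")
print(f"66.4 mW -> 24.3 mW: {reduction_pct(66.4, 24.3):.1f}% lower")
```

This gives about 3.34 mW per PE at PL1 and 8.06 mW at PL2, about 0.68 pJ per operation at 1.47 TOPS/W, and a 63.4% reduction for the 66.4 mW to 24.3 mW pair, matching the leakage figure quoted alongside it.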


This review was created by AI and reviewed by human editors.