QUICK REVIEW

[Paper Digest] Training spiking multi-layer networks with surrogate gradients on an analog neuromorphic substrate

Benjamin Cramer, Sebastian Billaudelle|arXiv (Cornell University)|Jun 12, 2020

Advanced Memory and Neural Computing|References: 75|Cited by: 21

ONE-SENTENCE SUMMARY

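This paper presents a hardware-in-the-loop method for training multi-layer spiking networks with surrogate gradients on the analog BrainScales-2 neuromorphic chip. With the forward pass executed on the hardware and the backward pass computed in software, the method reaches high accuracy (97.5% on MNIST) at low power (<300 mW) and high throughput (70 k patterns per second), demonstrating efficient, sparse, and energy-aware spiking network processing on an analog substrate.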

ABSTRACT

Spiking neural networks are nature's solution for parallel information processing with high temporal precision at a low metabolic energy cost. To that end, biological neurons integrate inputs as an analog sum and communicate their outputs digitally as spikes, i.e., sparse binary events in time. These architectural principles can be mirrored effectively in analog neuromorphic hardware. Nevertheless, training spiking neural networks with sparse activity on hardware devices remains a major challenge. Primarily this is due to the lack of suitable training methods that take into account device-specific imperfections and operate at the level of individual spikes instead of firing rates. To tackle this issue, we developed a hardware-in-the-loop strategy to train multi-layer spiking networks using surrogate gradients on the analog BrainScales-2 chip. Specifically, we used the hardware to compute the forward pass of the network, while the backward pass was computed in software. We evaluated our approach on downscaled 16x16 versions of the MNIST and the fashion MNIST datasets in which spike latencies encoded pixel intensities. The analog neuromorphic substrate closely matched the performance of equivalently sized networks implemented in software. It is capable of processing 70 k patterns per second with a power consumption of less than 300 mW. Added activity regularization resulted in sparse network activity with about 20 spikes per input, at little to no reduction in classification performance. Thus, overall, our work demonstrates low-energy spiking network processing on an analog neuromorphic substrate and sets several new benchmarks for hardware systems in terms of classification accuracy, processing speed, and efficiency. Importantly, our work emphasizes the value of hardware-in-the-loop training and paves the way toward energy-efficient information processing on non-von-Neumann architectures.

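MOTIVATION AND OBJECTIVES

Address the challenge of training spiking neural networks with sparse spiking activity on analog neuromorphic hardware.
Overcome the limitations of existing training methods, which rely on firing rates and ignore device-specific imperfections.
Enable end-to-end training of deep spiking networks on an analog neuromorphic substrate by combining hardware and software computation.
Achieve high classification accuracy, low power consumption, and fast processing within a non-von-Neumann, event-driven computing framework.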

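PROPOSED METHOD

Use a hardware-in-the-loop training strategy: the forward pass runs on the analog BrainScales-2 neuromorphic chip, while the backward pass is computed in software using surrogate gradients.
Encode pixel intensities from the MNIST and fashion MNIST datasets as spike latencies, giving the spiking network a temporal code.
Use surrogate gradients to backpropagate error signals through the non-differentiable spike-generating neurons, enabling end-to-end training despite the discrete nature of spikes.
Apply activity regularization to enforce sparse network activity, reducing the average spike count to about 20 per input with little to no reduction in performance.
Train multi-layer spiking networks with backpropagation through time using surrogate gradients, adapted to the constraints of the analog hardware.
Evaluate performance on versions of MNIST and fashion MNIST downscaled to 16x16, to reduce the computational load while preserving classification fidelity.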
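
The steps above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the linear latency mapping, the fast-sigmoid surrogate and its steepness `beta`, the quadratic activity penalty, and the single-neuron, single-time-step update are all simplifying assumptions, and `mock_hardware_forward` is a hypothetical stand-in for the chip (the paper trains multi-layer networks with backpropagation through time on real hardware).

```python
def latency_encode(pixels, t_max=1.0):
    """Map intensities in [0, 1] to spike times; brighter pixels fire earlier.

    Assumed linear mapping; zero-intensity pixels emit no spike (None).
    """
    return [None if p <= 0.0 else t_max * (1.0 - p) for p in pixels]


def spike(v, v_th=1.0):
    """Forward nonlinearity: a hard threshold on the membrane potential."""
    return 1.0 if v >= v_th else 0.0


def surrogate_grad(v, v_th=1.0, beta=10.0):
    """Smooth stand-in for the derivative of spike(), used only in the backward pass.

    Fast-sigmoid form 1 / (1 + beta * |v - v_th|)^2; beta is an assumed steepness.
    """
    return 1.0 / (1.0 + beta * abs(v - v_th)) ** 2


def activity_penalty(spike_counts, strength=1e-3):
    """Assumed quadratic penalty on per-neuron spike counts to favor sparse activity."""
    return strength * sum(c * c for c in spike_counts)


def mock_hardware_forward(weights, inputs, gain=0.9):
    """Hypothetical stand-in for the chip: returns the measured potential and spike.

    `gain` mimics a device-specific imperfection that an ideal software model misses.
    """
    v = gain * sum(w * x for w, x in zip(weights, inputs))
    return v, spike(v)


def hil_step(weights, inputs, target, forward=mock_hardware_forward, lr=0.1):
    """One in-the-loop update: forward values are measured, the gradient is computed in software."""
    v, s = forward(weights, inputs)   # measured values, including device mismatch
    err = s - target                  # derivative of 0.5 * (s - target)^2 with respect to s
    ds_dv = surrogate_grad(v)         # replaces the zero or undefined true derivative
    # dv/dw_i is taken as x_i: the software model does not know the hardware gain.
    return [w - lr * err * ds_dv * x for w, x in zip(weights, inputs)]
```

Calling `hil_step([0.5, 0.5], [1.0, 1.0], target=1.0)` raises both weights, because the mock device stays just below threshold and the surrogate derivative is nonzero there, whereas the true derivative of the step would give no learning signal. Evaluating the surrogate at the measured potential rather than at a simulated one is what lets the update account for device mismatch.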

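EXPERIMENTAL RESULTS

Research Questions

RQ1: Can surrogate-gradient backpropagation be applied effectively to train deep spiking neural networks on an analog neuromorphic substrate with device-specific imperfections?
RQ2: How well does hardware-in-the-loop training preserve classification accuracy compared with equivalent software implementations on the same datasets?
RQ3: How efficient is the analog neuromorphic hardware at processing spike patterns, in terms of speed and power consumption?
RQ4: Can activity regularization produce sparse spiking activity (e.g., about 20 spikes per input) without compromising classification performance on the MNIST and fashion MNIST datasets?
RQ5: What are the trade-offs among energy efficiency, processing speed, and accuracy when training spiking networks directly on analog neuromorphic hardware?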

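Key Findings

The hardware-in-the-loop approach reached 97.5% classification accuracy on the 16x16 MNIST dataset, comparable to networks implemented in software.
The system processes 70,000 patterns per second at a power consumption below 300 mW, demonstrating high processing efficiency.
Activity regularization reduced average network activity to about 20 spikes per input while maintaining high classification accuracy.
The analog neuromorphic substrate closely matched the performance of equivalent software implementations, supporting its suitability for training spiking networks.
The approach sets new benchmarks for hardware-based spiking neural network systems in classification accuracy, processing speed, and energy efficiency.
The results demonstrate the feasibility of training deep spiking networks directly on analog neuromorphic hardware with surrogate gradients, paving the way toward energy-efficient, non-von-Neumann computing.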
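
The reported power and throughput figures imply a per-pattern energy budget, which the short calculation below works out. It assumes that both figures refer to the same operating point and that patterns are processed back to back; because the power is given only as "less than 300 mW", the energy value is an upper bound of roughly 4.3 µJ per pattern, at about 14 µs per pattern.

```python
POWER_W = 0.300          # reported upper bound: less than 300 mW
PATTERNS_PER_S = 70_000  # reported throughput

# Energy per pattern is power divided by throughput (joules, upper bound).
energy_per_pattern_j = POWER_W / PATTERNS_PER_S

# Average time per pattern, assuming patterns are processed back to back.
time_per_pattern_s = 1.0 / PATTERNS_PER_S

print(f"energy per pattern: < {energy_per_pattern_j * 1e6:.1f} uJ")
print(f"time per pattern:   {time_per_pattern_s * 1e6:.1f} us")
```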

This digest was generated by AI and reviewed by a human editor.