QUICK REVIEW

[Paper Digest] DIET-SNN: Direct Input Encoding With Leakage and Threshold Optimization in Deep Spiking Neural Networks

Nitin Rathi, Kaushik Roy | arXiv (Cornell University) | Aug 9, 2020

Advanced Memory and Neural Computing | References: 45 | Cited by: 91

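ONE-SENTENCE SUMMARY

DIET-SNN trains deep spiking networks using direct input encoding together with a learnable membrane leak and firing threshold. It approaches ANN accuracy while cutting compute energy by 6–18× at a latency of 5 timesteps.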

ABSTRACT

Bio-inspired spiking neural networks (SNNs), operating with asynchronous binary signals (or spikes) distributed over time, can potentially lead to greater computational efficiency on event-driven hardware. The state-of-the-art SNNs suffer from high inference latency, resulting from inefficient input encoding, and sub-optimal settings of the neuron parameters (firing threshold, and membrane leak). We propose DIET-SNN, a low-latency deep spiking network that is trained with gradient descent to optimize the membrane leak and the firing threshold along with other network parameters (weights). The membrane leak and threshold for each layer of the SNN are optimized with end-to-end backpropagation to achieve competitive accuracy at reduced latency. The analog pixel values of an image are directly applied to the input layer of DIET-SNN without the need to convert to spike-train. The first convolutional layer is trained to convert inputs into spikes where leaky-integrate-and-fire (LIF) neurons integrate the weighted inputs and generate an output spike when the membrane potential crosses the trained firing threshold. The trained membrane leak controls the flow of input information and attenuates irrelevant inputs to increase the activation sparsity in the convolutional and dense layers of the network. The reduced latency combined with high activation sparsity provides large improvements in computational efficiency. We evaluate DIET-SNN on image classification tasks from CIFAR and ImageNet datasets on VGG and ResNet architectures. We achieve top-1 accuracy of 69% with 5 timesteps (inference latency) on the ImageNet dataset with 12x less compute energy than an equivalent standard ANN. Additionally, DIET-SNN performs 20-500x faster inference compared to other state-of-the-art SNN models.

MOTIVATION AND OBJECTIVES

Motivate energy-efficient, low-latency neuromorphic inference using SNNs.
Develop a gradient-based method to jointly optimize weights, membrane leak, and firing thresholds across layers.
Eliminate input encoding overhead by using direct input encoding and enable the first layer to generate spikes.
Demonstrate competitive accuracy on CIFAR and ImageNet with significantly reduced timesteps and energy.

PROPOSED METHOD

Use direct input encoding where pixel values are fed directly to the input layer.
Employ Leaky Integrate-and-Fire (LIF) neurons with layer-shared leak and threshold parameters.
Train the network end-to-end with backpropagation to optimize weights, leaks, and thresholds (surrogate gradient for spikes).
Initialize from ANN-SNN conversion, selecting each layer's threshold at the 99.7th percentile during conversion, then fine-tune with spike-based learning.
Derive gradients for output, hidden layers, and parameters using BPTT and surrogate gradients (Equations 1–15).
Evaluate on VGG and ResNet architectures with CIFAR and ImageNet datasets, comparing latency and energy to prior SNNs and ANNs.
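
The neuron model in the steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the soft reset (subtracting the threshold after a spike), the triangular surrogate shape, and the `gamma` value are all assumptions, and the leak and threshold are fixed scalars here rather than trained parameters. It shows direct input encoding (the same analog pixel values are applied at every timestep) feeding a layer of LIF neurons that share one leak and one threshold.

```python
def surrogate_grad(u, threshold, gamma=0.3):
    """Piecewise-linear stand-in for d(spike)/d(z), with z = u/threshold - 1.

    The triangular shape and the gamma value are assumptions for illustration.
    """
    z = u / threshold - 1.0
    return gamma * max(0.0, 1.0 - abs(z))


def lif_layer(pixels, weights, leak, threshold, timesteps):
    """Simulate one fully connected LIF layer with direct input encoding.

    pixels:    analog input values, applied unchanged at every timestep
    weights:   weights[i][j] connects input j to neuron i
    leak:      layer-shared membrane leak (a trainable scalar in DIET-SNN)
    threshold: layer-shared firing threshold (also trainable in DIET-SNN)
    Returns one binary spike vector per timestep.
    """
    membrane = [0.0] * len(weights)
    spikes_over_time = []
    for _ in range(timesteps):
        spikes = []
        for i, row in enumerate(weights):
            current = sum(w * x for w, x in zip(row, pixels))
            # Leak decays the stored potential before new input is integrated.
            membrane[i] = leak * membrane[i] + current
            if membrane[i] > threshold:
                spikes.append(1)
                membrane[i] -= threshold  # soft reset (assumed)
            else:
                spikes.append(0)
        spikes_over_time.append(spikes)
    return spikes_over_time


# Hypothetical inputs and weights, chosen only to make the dynamics visible.
out = lif_layer([0.5, 1.0], [[0.8, 0.4], [0.1, 0.1]],
                leak=0.9, threshold=1.0, timesteps=5)
first_neuron = [step[0] for step in out]
second_neuron = [step[1] for step in out]
```

With these illustrative values, the weakly driven second neuron never fires within 5 timesteps, while lowering `threshold` to 0.5 makes the first neuron fire at the first timestep instead of the second. During training, the hard spike decision has no useful derivative, so a function like `surrogate_grad` would replace it in the backward pass.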

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can jointly learning membrane leak and firing thresholds per layer reduce inference latency while maintaining accuracy?
RQ2: Does direct input encoding coupled with a trainable first-layer spike generator improve activation sparsity and energy efficiency?
RQ3: How does DIET-SNN compare to state-of-the-art SNNs and ANNs in terms of accuracy, timesteps, and compute energy on CIFAR and ImageNet?
RQ4: What are the per-layer energy and spike-rate implications of leak/threshold optimization in deeper networks?

Key Findings

DIET-SNN achieves comparable top-1 accuracy to ANN baselines on CIFAR and ImageNet using only 5 timesteps.
Joint optimization of weights, leaks, and thresholds yields substantial latency/energy benefits over prior SNNs (6–18× energy reduction vs ANN; 20–500× faster inference than other SNNs).
Direct input encoding plus a trained spike-generator first layer removes input encoding overhead and enables high activation sparsity in deeper layers.
Leak reduces unnecessary firing, and threshold optimization accelerates spike generation, resulting in much lower spike rates (e.g., an average spike rate of around 1.6 for VGG16 on CIFAR at 5 timesteps).
The approach achieves 69% top-1 on ImageNet with 5 timesteps and significantly lower energy than an equivalent ANN.
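
To make the link between spike rate, timesteps, and compute energy concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the per-operation energy constants, the layer operation counts, the spike rates, and the accounting itself (multiply-accumulates for the ANN and for the SNN's analog-input first layer at every timestep, cheaper accumulates gated by spikes elsewhere). None of the numbers come from the paper's tables.

```python
# Assumed per-operation energies in picojoules (illustrative values only).
E_MAC_PJ = 4.6  # one multiply-accumulate
E_AC_PJ = 0.9   # one accumulate


def ann_energy(layer_ops):
    """Every ANN operation is a multiply-accumulate, executed once."""
    return sum(layer_ops) * E_MAC_PJ


def snn_energy(layer_ops, spike_rates, timesteps):
    """Energy of an SNN with direct input encoding under the stated assumptions.

    layer_ops:   operation count per layer for a single forward pass
    spike_rates: for each layer after the first, the average number of spikes
                 per input neuron summed over all timesteps
    """
    # The first layer sees analog pixels, so it pays for MACs at every timestep.
    first = layer_ops[0] * timesteps * E_MAC_PJ
    # Later layers only accumulate a weight when an input spike arrives.
    rest = sum(ops * rate for ops, rate in zip(layer_ops[1:], spike_rates))
    return first + rest * E_AC_PJ


# Hypothetical three-layer network.
ops = [1_000_000, 50_000_000, 50_000_000]
ann = ann_energy(ops)
snn = snn_energy(ops, spike_rates=[0.5, 0.3], timesteps=5)
ratio = ann / snn
```

Under this accounting, fewer timesteps shrink the first-layer term and lower spike rates shrink the remaining terms, which is why latency and activation sparsity together drive the energy comparison.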

This digest was generated by AI and reviewed by human editors.