[Paper Review] DIET-SNN: Direct Input Encoding With Leakage and Threshold Optimization in Deep Spiking Neural Networks
DIET-SNN trains a deep spiking neural network with direct input encoding and learnable membrane leak and firing thresholds, reaching accuracy comparable to an equivalent ANN at 5 timesteps of inference latency with 6-18x lower compute energy.
Bio-inspired spiking neural networks (SNNs), operating with asynchronous binary signals (or spikes) distributed over time, can potentially lead to greater computational efficiency on event-driven hardware. The state-of-the-art SNNs suffer from high inference latency, resulting from inefficient input encoding, and sub-optimal settings of the neuron parameters (firing threshold, and membrane leak). We propose DIET-SNN, a low-latency deep spiking network that is trained with gradient descent to optimize the membrane leak and the firing threshold along with other network parameters (weights). The membrane leak and threshold for each layer of the SNN are optimized with end-to-end backpropagation to achieve competitive accuracy at reduced latency. The analog pixel values of an image are directly applied to the input layer of DIET-SNN without the need to convert to spike-train. The first convolutional layer is trained to convert inputs into spikes where leaky-integrate-and-fire (LIF) neurons integrate the weighted inputs and generate an output spike when the membrane potential crosses the trained firing threshold. The trained membrane leak controls the flow of input information and attenuates irrelevant inputs to increase the activation sparsity in the convolutional and dense layers of the network. The reduced latency combined with high activation sparsity provides large improvements in computational efficiency. We evaluate DIET-SNN on image classification tasks from CIFAR and ImageNet datasets on VGG and ResNet architectures. We achieve top-1 accuracy of 69% with 5 timesteps (inference latency) on the ImageNet dataset with 12x less compute energy than an equivalent standard ANN. Additionally, DIET-SNN performs 20-500x faster inference compared to other state-of-the-art SNN models.
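To make the neuron behaviour described in the abstract concrete, here is a minimal sketch of a single LIF neuron under direct input encoding. It assumes a multiplicative leak, a strict "greater than" threshold comparison, and a soft reset (subtracting the threshold after a spike); these details, along with the function and parameter names, are illustrative assumptions and not taken from this review.

```python
def lif_direct_encoding(weighted_input, leak, threshold, timesteps=5):
    """Simulate one LIF neuron that sees the same analog input at every timestep.

    weighted_input: weighted sum of analog pixel values; it is held constant
                    across timesteps, which is how direct encoding is modeled here.
    leak:           multiplicative membrane leak in (0, 1] (assumed form).
    threshold:      firing threshold, must be positive.
    Returns a list with one output spike (0 or 1) per timestep.
    """
    membrane = 0.0
    spikes = []
    for _ in range(timesteps):
        # Leak the previous potential, then integrate the new input.
        membrane = leak * membrane + weighted_input
        if membrane > threshold:
            spikes.append(1)
            membrane -= threshold  # soft reset (assumption)
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(lif_direct_encoding(0.5, leak=1.0, threshold=1.0))  # -> [0, 0, 1, 0, 1]
    print(lif_direct_encoding(0.5, leak=0.5, threshold=1.0))  # -> [0, 0, 0, 0, 0]
```

With no leak the neuron fires twice in 5 timesteps; with a leak of 0.5 the same weak input never reaches the threshold, which illustrates how a leak can suppress weak inputs and raise activation sparsity.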
Motivation & Objective
- Motivate energy-efficient, low-latency neuromorphic inference using SNNs.
- Develop a gradient-based method to jointly optimize weights, membrane leak, and firing thresholds across layers.
- Eliminate input encoding overhead by using direct input encoding and enable the first layer to generate spikes.
- Demonstrate competitive accuracy on CIFAR and ImageNet with significantly reduced timesteps and energy.
Proposed method
- Use direct input encoding where pixel values are fed directly to the input layer.
- Employ Leaky Integrate-and-Fire (LIF) neurons with layer-shared leak and threshold parameters.
- Train the network end-to-end with backpropagation to optimize weights, leaks, and thresholds, using a surrogate gradient for the non-differentiable spike function.
- Initialize from ANN-to-SNN conversion, selecting each layer's threshold at the 99.7th percentile during conversion, then fine-tune with spike-based learning.
- Derive gradients for output, hidden layers, and parameters using BPTT and surrogate gradients (Equations 1–15).
- Evaluate on VGG and ResNet architectures with CIFAR and ImageNet datasets, comparing latency and energy to prior SNNs and ANNs.
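Two ingredients of the method listed above can be sketched in a few lines: a surrogate gradient for the spike function and percentile-based threshold selection. This is only a rough illustration. The piecewise-linear surrogate shape, the normalization `z = membrane / threshold - 1`, the `gamma` value, and the nearest-rank percentile rule are all assumptions made for this sketch; the review does not specify them, and the helper names are hypothetical.

```python
import math


def spike(membrane, threshold):
    """Forward pass: a hard step, which has zero gradient almost everywhere."""
    return 1.0 if membrane > threshold else 0.0


def surrogate_grad(membrane, threshold, gamma=0.3):
    """Backward pass stand-in for d(spike)/dz, with z = membrane/threshold - 1.

    Uses a triangular (piecewise-linear) surrogate that peaks at gamma when the
    membrane potential sits exactly at the threshold. gamma is an assumed value.
    """
    z = membrane / threshold - 1.0
    return gamma * max(0.0, 1.0 - abs(z))


def percentile_threshold(pre_activations, percentile=99.7):
    """Pick a layer threshold as a percentile of observed pre-activations.

    Uses the nearest-rank definition of a percentile (an assumption).
    """
    ordered = sorted(pre_activations)
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    print(surrogate_grad(1.0, threshold=1.0))  # largest gradient, at threshold
    print(surrogate_grad(3.0, threshold=1.0))  # far from threshold, zero
    print(percentile_threshold([3, 1, 2, 5, 4], percentile=100))  # -> 5
```

In a real training loop the surrogate would replace the step function's derivative inside backpropagation through time, so that gradients can reach the weights, leaks, and thresholds of every layer.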
Experimental results
Research questions
- RQ1: Can jointly learning membrane leak and firing thresholds per layer reduce inference latency while maintaining accuracy?
- RQ2: Does direct input encoding coupled with a trainable first-layer spike generator improve activation sparsity and energy efficiency?
- RQ3: How does DIET-SNN compare to state-of-the-art SNNs and ANNs in terms of accuracy, timesteps, and compute energy on CIFAR and ImageNet?
- RQ4: What are the per-layer energy and spike-rate implications of leak/threshold optimization in deeper networks?
Key findings
- DIET-SNN achieves comparable top-1 accuracy to ANN baselines on CIFAR and ImageNet using only 5 timesteps.
- Joint optimization of weights, leaks, and thresholds yields substantial latency/energy benefits over prior SNNs (6–18× energy reduction vs ANN; 20–500× faster inference than other SNNs).
- Direct input encoding plus a trained spike-generator first layer removes input encoding overhead and enables high activation sparsity in deeper layers.
- The learned leak suppresses unnecessary firing, and threshold optimization accelerates spike generation, resulting in much lower spike rates (e.g., an average spike rate of around 1.6 for VGG16 on CIFAR at 5 timesteps).
- The approach reaches 69% top-1 accuracy on ImageNet with 5 timesteps while using 12× less compute energy than an equivalent ANN.
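To see how spike rate and timesteps could translate into the compute-energy ratios quoted above, the following sketch uses a simple first-order model. It assumes that an ANN pays one multiply-accumulate (MAC) per operation, that spiking layers pay one accumulate (AC) per operation scaled by the average number of input spikes per neuron over all timesteps, and that the first layer, which receives analog inputs, pays a MAC at every timestep. The model, the function name, and every number in the example are illustrative assumptions, not figures from the paper.

```python
def energy_ratio(first_layer_ops, hidden_layer_ops, hidden_spike_rates,
                 timesteps, e_mac, e_ac):
    """Return ANN energy divided by SNN energy under a first-order model.

    first_layer_ops:    operation count of the layer that sees analog inputs.
    hidden_layer_ops:   operation counts of the remaining (spiking) layers.
    hidden_spike_rates: average input spikes per neuron, summed over all
                        timesteps, one value per hidden layer.
    e_mac, e_ac:        energy per MAC and per accumulate (same unit).
    """
    ann_energy = (first_layer_ops + sum(hidden_layer_ops)) * e_mac
    # Analog inputs need multiplications at every timestep (assumption).
    snn_first = first_layer_ops * timesteps * e_mac
    # Spiking layers only accumulate, and only when a spike arrives.
    snn_hidden = sum(ops * rate * e_ac
                     for ops, rate in zip(hidden_layer_ops, hidden_spike_rates))
    return ann_energy / (snn_first + snn_hidden)


if __name__ == "__main__":
    # Made-up operation counts and per-operation energies, for illustration only.
    ratio = energy_ratio(first_layer_ops=10, hidden_layer_ops=[100, 200],
                         hidden_spike_rates=[1.0, 0.5], timesteps=5,
                         e_mac=4.0, e_ac=1.0)
    print(ratio)
```

Under this model, lower spike rates and fewer timesteps both shrink the denominator, which is consistent with the review's point that low latency and high sparsity together drive the energy savings.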
This review was created by AI and reviewed by human editors.