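[Paper Summary] Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation
This paper proposes training deep neural networks with knowledge distillation combined with noise injection, to make them robust to analog hardware noise. Across several architectures and on ImageNet, it reaches up to roughly twice the noise tolerance of previous methods. This is a step toward practical deployment of more energy-efficient analog accelerators.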
The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics. However, these proposed analog accelerators suffer from the intrinsic noise generated by their physical components, which makes it challenging to achieve high accuracy on deep neural networks. Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning. In this paper, we advance the understanding of noisy neural networks. We outline how a noisy neural network has reduced learning capacity as a result of loss of mutual information between its input and output. To combat this, we propose using knowledge distillation combined with noise injection during training to achieve more noise robust networks, which is demonstrated experimentally across different networks and datasets, including ImageNet. Our method achieves models with as much as two times greater noise tolerance compared with the previous best attempts, which is a significant step towards making analog hardware practical for deep learning.
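Here is a toy calculation that makes the mutual-information argument concrete. It is not the paper's derivation. It assumes a single scalar Gaussian channel $ Y = X + N $, with signal $ X \sim \mathcal{N}(0, \sigma_x^2) $ and independent noise $ N \sim \mathcal{N}(0, \sigma_n^2) $. For this channel, $ I(X;Y) = \tfrac{1}{2}\log_2(1 + \sigma_x^2/\sigma_n^2) $ bits, so the information that can pass from input to output shrinks as the noise variance grows. The function name and the noise levels are hypothetical, chosen only to show the trend.

```python
import math


def gaussian_channel_mi(signal_var: float, noise_var: float) -> float:
    """Mutual information I(X;Y) in bits for Y = X + N, with
    X ~ N(0, signal_var) and independent N ~ N(0, noise_var)."""
    if noise_var <= 0:
        # Noiseless continuous channel: mutual information is unbounded.
        return math.inf
    return 0.5 * math.log2(1.0 + signal_var / noise_var)


if __name__ == "__main__":
    # Hypothetical noise standard deviations, for illustration only.
    for noise_std in (0.01, 0.1, 0.5, 1.0):
        mi = gaussian_channel_mi(1.0, noise_std ** 2)
        print(f"noise std {noise_std:>4}: I(X;Y) = {mi:.3f} bits")
```

A real network stacks many noisy layers, and each one loses some information in this way. The scalar channel only illustrates the direction of the effect.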
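Research Motivation and Goals
- Address the drop in inference accuracy caused by the intrinsic hardware noise of analog neural network accelerators.
- Understand how noise reduces model capacity by lowering the mutual information between input and output.
- Develop a training method that improves robustness without changing the model architecture or the inference hardware.
- Achieve state-of-the-art noise tolerance across multiple models and datasets, including ImageNet.
- Relax hardware precision requirements through software-level robustness, enabling practical deployment of analog accelerators.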
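Proposed Method
- Train a student network with knowledge distillation from a pretrained teacher model to improve its robustness.
- Inject Gaussian noise into the network weights during the training forward pass to simulate analog hardware imperfections.
- Use a temperature parameter $ T $ in the soft-label cross-entropy loss to stabilize training and reduce sensitivity to weight perturbations.
- Train with distillation at temperature $ T=6 $ and noise injection level $ \eta $, so that the model becomes robust to continuous random weight noise.
- Optimize the student with standard backpropagation through the noise-injected weights while matching the teacher's output distribution.
- Evaluate robustness at noise levels $ \eta \in \{0, 0.02, 0.04, 0.06\} $ across multiple inference runs.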
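The training recipe in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The noise model is assumed: additive zero-mean Gaussian noise whose standard deviation is $ \eta $ times the layer's weight range. The $ T^2 $ loss scaling is borrowed from the common Hinton et al. distillation convention. The toy linear student and all function names (`inject_weight_noise`, `distillation_loss`, `linear_forward`) are hypothetical. The sketch only shows the forward computation. In a real framework, gradients would flow through the noisy weights back to the clean ones.

```python
import math
import random
from typing import List, Sequence


def inject_weight_noise(weights: Sequence[float], eta: float,
                        rng: random.Random) -> List[float]:
    """Return a noisy copy of one layer's weights for a single forward pass.

    Assumed noise model (illustrative): additive zero-mean Gaussian noise
    whose standard deviation is eta times the layer's weight range.
    """
    sigma = eta * (max(weights) - min(weights))
    return [w + rng.gauss(0.0, sigma) for w in weights]


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (soft labels when applied to teacher logits)."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 6.0) -> float:
    """Soft-label cross-entropy from teacher to student at temperature T.

    The T**2 factor follows the common Hinton et al. convention (an
    assumption here) so the gradient scale stays comparable across T.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    cross_entropy = -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))
    return temperature ** 2 * cross_entropy


def linear_forward(weights: Sequence[float], x: Sequence[float],
                   n_out: int) -> List[float]:
    """Hypothetical toy student: a bias-free linear layer, row-major weights."""
    n_in = len(x)
    return [sum(weights[r * n_in + c] * x[c] for c in range(n_in))
            for r in range(n_out)]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.5, -0.2, 0.1, 0.4, -0.3, 0.8]  # hypothetical 3x2 layer
    x = [1.0, 2.0]                               # hypothetical input
    teacher_logits = [2.0, 1.0, 0.1]             # hypothetical teacher output
    # Fresh noise is sampled on every forward pass during training.
    noisy = inject_weight_noise(weights, eta=0.04, rng=rng)
    student_logits = linear_forward(noisy, x, n_out=3)
    print(distillation_loss(student_logits, teacher_logits, temperature=6.0))
```

Evaluating at a given $ \eta $ would repeat the noisy forward pass with fresh samples over many runs and report the mean and standard deviation of accuracy. Resampling noise on every pass keeps the model from overfitting to one fixed perturbation.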
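Experimental Results
Research Questions
- RQ1: How does noise in analog hardware affect the learning capacity and inference accuracy of deep neural networks?
- RQ2: To what extent can knowledge distillation improve a network's robustness to continuous weight noise?
- RQ3: Can noise injection during training effectively simulate real analog hardware noise and prepare the network for it?
- RQ4: How does combining knowledge distillation with noise injection compare with baseline training in noise tolerance?
- RQ5: What is the maximum noise tolerance the method achieves on standard benchmarks such as ImageNet?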
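Key Findings
- The proposed method improves noise tolerance by up to about 2× over the previous best methods.
- On ResNet-50 with ImageNet, distillation with noise injection retains 67.525% Top-1 accuracy at $ \eta = 0.06 $, versus only 46.284% for unregularized training.
- At $ \eta = 0.04 $, the method reaches 71.442% Top-1 accuracy, compared with 64.382% for the baseline.
- The accuracy gain from distillation with noise injection grows as the noise level rises, suggesting that the regularizing effect is stronger under heavy noise.
- The improvement is consistent across multiple independent training and inference runs, with low standard deviation (e.g., ±0.162% at $ \eta = 0.06 $).
- The method achieves higher noise tolerance without architectural changes, suggesting that hardware specifications for analog accelerators can be relaxed.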
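The trend of growing gains can be checked directly from the two ResNet-50 / ImageNet comparisons quoted in the findings. This short sketch only recomputes differences between reported numbers. It assumes that the "baseline" at $ \eta = 0.04 $ is the same unregularized model referred to at $ \eta = 0.06 $. The dictionary and function names are hypothetical.

```python
# Top-1 accuracy (%) of ResNet-50 on ImageNet, as quoted in the findings.
REPORTED_TOP1 = {
    0.04: {"baseline": 64.382, "distill_noise": 71.442},
    0.06: {"baseline": 46.284, "distill_noise": 67.525},
}


def accuracy_gain(eta: float) -> float:
    """Percentage-point gain of distillation + noise injection over the baseline."""
    row = REPORTED_TOP1[eta]
    # Round to the three decimals used in the reported figures.
    return round(row["distill_noise"] - row["baseline"], 3)


if __name__ == "__main__":
    for eta in sorted(REPORTED_TOP1):
        print(f"eta = {eta}: +{accuracy_gain(eta)} points")
```

The gap widens from 7.06 to 21.241 percentage points between $ \eta = 0.04 $ and $ \eta = 0.06 $, which matches the claim that the benefit grows with the noise level.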
This summary was generated by AI and reviewed by human editors.