QUICK REVIEW

[Paper Review] Terminal Brain Damage: Exposing the Graceless Degradation in Deep Neural Networks Under Hardware Fault Attacks

Sanghyun Hong, Pietro Frigo | arXiv (Cornell University) | Jun 3, 2019

Adversarial Robustness in Machine Learning | References: 59 | Cited by: 103

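One-Sentence Summary

This paper analyzes how single bit-flips in DNN parameters, induced by hardware fault attacks such as Rowhammer, cause severe and non-graceful accuracy degradation across 19 models and three image classification datasets, and it discusses possible mitigations.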

ABSTRACT

Deep neural networks (DNNs) have been shown to tolerate "brain damage": cumulative changes to the network's parameters (e.g., pruning, numerical perturbations) typically result in a graceful degradation of classification accuracy. However, the limits of this natural resilience are not well understood in the presence of small adversarial changes to the DNN parameters' underlying memory representation, such as bit-flips that may be induced by hardware fault attacks. We study the effects of bitwise corruptions on 19 DNN models---six architectures on three image classification tasks---and we show that most models have at least one parameter that, after a specific bit-flip in their bitwise representation, causes an accuracy loss of over 90%. We employ simple heuristics to efficiently identify the parameters likely to be vulnerable. We estimate that 40-50% of the parameters in a model might lead to an accuracy drop greater than 10% when individually subjected to such single-bit perturbations. To demonstrate how an adversary could take advantage of this vulnerability, we study the impact of an exemplary hardware fault attack, Rowhammer, on DNNs. Specifically, we show that a Rowhammer enabled attacker co-located in the same physical machine can inflict significant accuracy drops (up to 99%) even with single bit-flip corruptions and no knowledge of the model. Our results expose the limits of DNNs' resilience against parameter perturbations induced by real-world fault attacks. We conclude by discussing possible mitigations and future research directions towards fault attack-resilient DNNs.

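Motivation and Objectives

Assess how vulnerable DNN parameters are to single bit-flips under hardware fault attacks.
Characterize how bit position, flip direction, parameter sign, and architecture affect that vulnerability.
Evaluate a practical attack scenario (Rowhammer) in a realistic MLaaS setting.
Identify possible mitigations that improve the fault resilience of DNNs.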

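Proposed Method

Systematically flip every bit of every parameter in a model and measure the resulting accuracy drop on the validation set.
Analyze 19 DNN models on MNIST, CIFAR10, and ImageNet to identify vulnerable parameters, i.e. those for which a single bit-flip causes a relative accuracy drop (RAD) above 0.1.
Use speed-up heuristics (a sampled validation set, specific bits only, sampled parameters) to make the analysis tractable for large models.
Simulate a Rowhammer attack in a co-located MLaaS scenario to assess the practical impact.
Evaluate mitigations such as bounding activation magnitudes (ReLU6) and quantizing or binarizing the weights.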
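The sweep described above can be sketched on a toy scale. The following is a minimal illustration, not the authors' code: it assumes parameters are stored as IEEE 754 float32 values, takes RAD to be (clean accuracy - corrupted accuracy) / clean accuracy, and replaces a real DNN with a hypothetical two-weight linear classifier. The names `flip_bit`, `accuracy`, `rad`, and `sweep`, as well as the sample data, are invented for this example. The optional `bits` argument mimics the "specific bits" speed-up heuristic.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of a float32 value (bit 0 = least significant, bit 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def accuracy(weights, samples):
    """Accuracy of a toy linear classifier that predicts 1 when w . x > 0."""
    correct = 0
    for features, label in samples:
        score = sum(w * x for w, x in zip(weights, features))
        correct += int((score > 0) == bool(label))
    return correct / len(samples)


def rad(clean_acc, corrupted_acc):
    """Relative accuracy drop with respect to the clean accuracy."""
    return (clean_acc - corrupted_acc) / clean_acc


def sweep(weights, samples, threshold=0.1, bits=range(32)):
    """Flip each chosen bit of each weight, one at a time, and record RAD > threshold."""
    clean = accuracy(weights, samples)
    vulnerable = []
    for index, weight in enumerate(weights):
        for bit in bits:
            corrupted = list(weights)
            corrupted[index] = flip_bit(weight, bit)
            drop = rad(clean, accuracy(corrupted, samples))
            if drop > threshold:
                vulnerable.append((index, bit, drop))
    return vulnerable


# Hypothetical toy data: (features, label), separable by the weights below.
weights = [0.5, -0.25]
samples = [
    ((1.0, 1.0), 1), ((1.0, 3.0), 0), ((2.0, 1.0), 1),
    ((0.5, 2.0), 0), ((3.0, 2.0), 1), ((1.0, 4.0), 0),
]

clean_accuracy = accuracy(weights, samples)
vulnerable = sweep(weights, samples)               # exhaustive: every bit of every weight
exponent_msb_only = sweep(weights, samples, bits=[30])  # heuristic: one chosen bit

print("clean accuracy:", clean_accuracy)
for index, bit, drop in vulnerable:
    print(f"weight {index}, bit {bit}: RAD = {drop:.2f}")
```

An exhaustive sweep costs one validation pass per bit per parameter, which is why restricting the bits, the parameters, or the validation samples matters for large models.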

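Experimental Results

Research Questions

RQ1: How vulnerable are DNN parameters to single bit-flips across architectures and datasets?
RQ2: Which bit positions, flip directions, and parameter signs contribute most to indiscriminate damage (RAD > 0.1)?
RQ3: Can a practical hardware fault attack such as Rowhammer cause large accuracy drops in a co-located MLaaS setting?
RQ4: Do common training techniques (dropout, batch normalization) mitigate single-bit vulnerability?
RQ5: Which mitigations reduce vulnerability effectively without sacrificing too much accuracy?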

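Key Findings

Across models, an estimated 40-50% of the parameters cause RAD > 0.1 when hit by a single bit-flip.
Some parameters, under a specific bit-flip, cause an accuracy loss of over 90%, so the degradation is not graceful.
The vulnerability stems mainly from large spikes in a parameter's numerical value; flips in the exponent bits have the greatest effect, especially the 31st bit (the most significant exponent bit).
Under ReLU, positive parameters are more vulnerable, whereas activation functions that allow negative outputs also expose negative parameters.
Increasing layer width increases the number of vulnerable parameters, while dropout and batch normalization offer only limited protection.
In a realistic scenario with no knowledge of the model, a Rowhammer-enabled attacker can cause accuracy drops of up to 99%.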
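To make the exponent-bit, parameter-sign, and ReLU6 observations concrete, here is a small sketch on a single hypothetical neuron. It assumes float32 storage and counts bits from zero, so bit 30 here corresponds to what the summary calls the 31st bit (assuming that count starts at one). The helper names and the values 0.5 and -0.5 are illustrative choices, not taken from the paper.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of a float32 value (bit 0 = least significant, bit 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def relu(x):
    return max(0.0, x)


def relu6(x):
    # ReLU with the output capped at 6, which bounds the activation magnitude.
    return min(6.0, max(0.0, x))


def neuron(weight, x, activation):
    """A one-input neuron without bias: activation(weight * x)."""
    return activation(weight * x)


# Which single-bit flip of 0.5 produces the largest magnitude?
magnitudes = {bit: abs(flip_bit(0.5, bit)) for bit in range(32)}
worst_bit = max(magnitudes, key=magnitudes.get)
print("bit with the largest spike:", worst_bit, "->", magnitudes[worst_bit])

x = 1.0
pos_corrupt = flip_bit(0.5, 30)    # positive weight, exponent MSB flipped 0 -> 1
neg_corrupt = flip_bit(-0.5, 30)   # negative weight, same flip

relu_pos = neuron(pos_corrupt, x, relu)     # the spike passes through ReLU
relu_neg = neuron(neg_corrupt, x, relu)     # the negative spike is zeroed by ReLU
relu6_pos = neuron(pos_corrupt, x, relu6)   # the spike is capped by ReLU6

print("ReLU,  corrupted positive weight:", relu_pos)
print("ReLU,  corrupted negative weight:", relu_neg)
print("ReLU6, corrupted positive weight:", relu6_pos)
```

In this toy setting the spike from a corrupted positive weight propagates unchanged through ReLU, the spike from a corrupted negative weight is suppressed, and ReLU6 caps the damage at 6. With an activation that passes negative values, the negative spike would propagate as well.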


This review was generated by AI and reviewed by human editors.