QUICK REVIEW

[Paper Review] Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits

Jiawang Bai, Baoyuan Wu|arXiv (Cornell University)|Feb 21, 2021

Adversarial Robustness in Machine Learning|References: 53|Cited by: 29

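One-Sentence Summary

This paper mounts a targeted bit-flip attack on a deployed DNN: by flipping only a small number of weight bits, it forces a specific sample into a target class while keeping overall accuracy largely intact, and it selects the bits to flip with ℓ_p-box ADMM optimization.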

ABSTRACT

To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Specifically, our goal is to misclassify a specific sample into a target class without any sample modification, while not significantly reducing the prediction accuracy of other samples to ensure the stealthiness. To this end, we formulate this problem as a binary integer programming (BIP), since the parameters are stored as binary bits (i.e., 0 and 1) in the memory. By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM) method. Consequently, the flipped critical bits can be easily determined through optimization, rather than using a heuristic strategy. Extensive experiments demonstrate the superiority of our method in attacking DNNs.
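
The abstract's point that parameters live in memory as bits can be made concrete with a small sketch. It assumes an 8-bit two's-complement encoding with the most significant bit first; that encoding, the bit width, and the helper names `bits_to_int` and `flip_bit` are illustrative assumptions, not details taken from this review. A real quantized weight would also be scaled by a step size, which is omitted here.

```python
import numpy as np

def bits_to_int(bits):
    """Decode a two's-complement bit vector, most significant bit first."""
    bits = np.asarray(bits, dtype=np.int64)
    place_values = 2 ** np.arange(bits.size - 1, -1, -1, dtype=np.int64)
    place_values[0] = -place_values[0]  # the sign bit carries weight -2^(Q-1)
    return int(bits @ place_values)

def flip_bit(bits, index):
    """Return a copy of `bits` with the bit at `index` inverted."""
    flipped = np.array(bits, dtype=np.int64)
    flipped[index] ^= 1
    return flipped

weight_bits = [0, 0, 0, 0, 0, 1, 0, 1]          # assumed 8-bit code for the integer 5
print(bits_to_int(weight_bits))                  # 5
print(bits_to_int(flip_bit(weight_bits, 7)))     # 4: flipping the lowest bit
print(bits_to_int(flip_bit(weight_bits, 0)))     # -123: flipping the sign bit
```

Under this encoding a single flip can change a weight slightly or drastically depending on its position, which is why the choice of bits matters more than their count.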

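Motivation and Objectives

Motivate and study a new attack that modifies a deployed model's parameters through a limited number of bit flips so that a specific input is misclassified into a target class.
Formulate the problem as a binary integer program (BIP) with a cardinality constraint on the number of flipped bits.
Develop an efficient continuous-optimization solver, based on ℓ_p-box ADMM, that identifies the critical bits to flip.
Demonstrate the method's effectiveness and stealthiness on quantized DNNs, across datasets, and against defenses.
Provide insight into the method's robustness and the practical considerations of deployment-stage attacks.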
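
The second and third objectives can be written compactly. The following is a sketch in notation chosen here for illustration: the symbols $f$ (attack objective), $\mathbf{b}$ (original bits), $\hat{\mathbf{b}}$ (attacked bits), $n$ (number of attackable bits), and $k$ (flip budget) are assumptions, not taken from this review. The first line is the cardinality-constrained BIP; the second is the standard identity, for the $p = 2$ case, that lets a box and a sphere replace the binary set.

```latex
\begin{aligned}
&\min_{\hat{\mathbf{b}}} \; f(\hat{\mathbf{b}})
\quad \text{s.t.} \quad \hat{\mathbf{b}} \in \{0,1\}^{n}, \qquad d_H(\mathbf{b}, \hat{\mathbf{b}}) \le k, \\
&\{0,1\}^{n} \;=\; [0,1]^{n} \,\cap\, \Big\{ \mathbf{u} : \big\| \mathbf{u} - \tfrac{1}{2}\mathbf{1} \big\|_2^2 = \tfrac{n}{4} \Big\}.
\end{aligned}
```

The identity holds because every coordinate inside the box satisfies $(u_i - \tfrac{1}{2})^2 \le \tfrac{1}{4}$, so the sum reaches $\tfrac{n}{4}$ only when each $u_i$ is exactly 0 or 1.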

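Proposed Method

Represent the last-layer weights as binary bits and define a loss that raises the attacked sample's target-class logit while lowering its source-class logit.
Add a stealthiness objective over auxiliary benign samples to limit collateral impact on other inputs.
Formulate the problem as TA-LBF: minimize the sum of these two losses subject to a bound on the number of flipped bits, expressed as a Hamming distance (which equals the squared Euclidean distance for binary vectors).
Use the ℓ_p-box ADMM method to recast the binary integer program as a continuous problem, introducing auxiliary variables (u1, u2, u3) to handle the box and sphere constraints.
Solve by alternating updates: u1, u2, and u3 are updated in parallel, b-hat by an (inexact) gradient step, and the dual variables by gradient ascent.
Give the update rules, including projections onto the box and sphere constraints and the gradient-descent step on b-hat (the paper's appendix gives the derivative details).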
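
The projection steps named above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the sphere is centred at 1/2 with squared radius n/4, and that u3 is a nonnegative slack turning the flip budget into an equality; the function names and the parameters `k`, `z3`, and `rho3` are hypothetical. In a full ADMM loop the inputs to the two projections would be b-hat shifted by the scaled dual variables, which is omitted here.

```python
import numpy as np

def project_box(v):
    """Project onto the box [0, 1]^n (the role assumed here for u1)."""
    return np.clip(np.asarray(v, dtype=float), 0.0, 1.0)

def project_sphere(v):
    """Project onto the sphere {u : ||u - 1/2||_2^2 = n/4} (assumed role of u2)."""
    v = np.asarray(v, dtype=float)
    n = v.size
    centered = v - 0.5
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        # The centre is equidistant from the whole sphere; pick one point.
        centered = np.ones(n)
        norm = np.sqrt(n)
    return 0.5 + (np.sqrt(n) / 2.0) * centered / norm

def slack_update(b_hat, b, k, z3, rho3):
    """Nonnegative slack u3 for the assumed equality ||b_hat - b||_2^2 - k + u3 = 0."""
    flipped_mass = float(np.sum((np.asarray(b_hat, dtype=float) - np.asarray(b, dtype=float)) ** 2))
    return max(0.0, k - flipped_mass - z3 / rho3)
```

A binary vector is a fixed point of both projections, for example `project_sphere([0, 1, 1, 0])` returns the same vector, which is what makes the box-and-sphere pair a stand-in for the binary constraint.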

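Experimental Results

Research Questions

RQ1: Can a single sample be misclassified into a target class by flipping only a few weight bits of a deployed DNN?
RQ2: Does optimizing the bit flips with a continuous ADMM-based solver outperform heuristic bit-selection strategies on quantized models?
RQ3: How does TA-LBF perform under defenses such as piece-wise clustering and with larger-capacity models?
RQ4: Is the attack stealthy, in the sense that it limits the impact on non-attacked samples and needs very few flips?
RQ5: Does the method extend to CIFAR-10 and ImageNet across different architectures and bit widths?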

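Key Findings

TA-LBF reaches a 100% attack success rate (ASR) with very few bit flips across the tested bit widths and architectures.
Accuracy on non-target inputs remains high after the attack, indicating stealthiness.
TA-LBF outperforms heuristic weight-attack baselines in both ASR and the number of flipped bits (lower N_flip).
The method remains effective under defenses such as piece-wise clustering and in larger-capacity network settings, where TA-LBF still shows high ASR and relatively low N_flip.
Experiments cover CIFAR-10 and ImageNet with ResNet and VGG architectures, showing that TA-LBF applies broadly.
The method extends to quantized models and stays robust against defense mechanisms while preserving the targeted misclassification.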
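
The findings rest on three quantities: ASR, N_flip, and post-attack accuracy. This review does not define them precisely, so the sketch below assumes the natural readings: N_flip as the Hamming distance between the bit arrays before and after the attack, ASR as the percentage of attacked samples predicted as their target class, and accuracy as the percentage of other samples still classified correctly. The function names and the toy inputs in the test values are hypothetical.

```python
import numpy as np

def n_flip(bits_before, bits_after):
    """Number of flipped bits, i.e. the Hamming distance between two bit arrays."""
    before = np.asarray(bits_before)
    after = np.asarray(bits_after)
    if before.shape != after.shape:
        raise ValueError("bit arrays must have the same shape")
    return int(np.count_nonzero(before != after))

def attack_success_rate(predicted, targets):
    """Percentage of attacked samples whose prediction equals the target class."""
    return 100.0 * float(np.mean(np.asarray(predicted) == np.asarray(targets)))

def accuracy(predicted, labels):
    """Percentage of samples whose prediction equals the true label."""
    return 100.0 * float(np.mean(np.asarray(predicted) == np.asarray(labels)))
```

Reporting all three together matters: a high ASR alone says nothing about stealthiness unless the flip count stays low and accuracy on the remaining samples stays high.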


This review was generated by AI and reviewed by a human editor.