QUICK REVIEW

[Paper Review] STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Yansong Gao, Chang Xu | arXiv (Cornell University) | Feb 18, 2019

Adversarial Robustness in Machine Learning | References: 38 | Cited by: 92

ONE-SENTENCE SUMMARY

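STRIP detects trojaned inputs at run time by perturbing each input and measuring the entropy of the model's predictions on the perturbed copies; low entropy indicates a trojaned input. This enables model-agnostic backdoor detection, with strong empirical results on MNIST, CIFAR-10 and GTSRB.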

ABSTRACT

A recent trojan attack on deep neural network (DNN) models is one insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model by leveraging the difficulty in interpretability of the learned model to misclassify any inputs signed with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojan inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of predicted classes for perturbed inputs from a given deployed model---malicious or benign. A low entropy in predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input---a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated STRIP robustness against a number of trojan attack variants and adaptive attacks.

MOTIVATION AND GOALS

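Detect, at run time, whether an input to a deployed DNN carries an input-agnostic trojan trigger that activates a backdoor in the model.
Develop a run-time, architecture-agnostic detector that needs only black-box access to the model.
Demonstrate robustness across trigger sizes and a range of trojan attack variants.
Measure performance by the false acceptance rate (FAR: trojaned inputs accepted as benign) and the false rejection rate (FRR: benign inputs rejected as trojaned), and analyze the run-time overhead.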
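
For a detector that flags an input when its entropy falls below a boundary, the two metrics can be computed as in the minimal sketch below. The function name `far_frr` is hypothetical and the entropy values in the example are made up for illustration; they are not results from the paper.

```python
from typing import Sequence, Tuple


def far_frr(benign_entropies: Sequence[float],
            trojan_entropies: Sequence[float],
            boundary: float) -> Tuple[float, float]:
    """Return (FAR, FRR) for an entropy-threshold detector.

    An input is flagged as trojaned when its entropy is below `boundary`.
    FAR: fraction of trojaned inputs wrongly accepted (entropy >= boundary).
    FRR: fraction of benign inputs wrongly rejected (entropy < boundary).
    """
    if not benign_entropies or not trojan_entropies:
        raise ValueError("both entropy lists must be non-empty")
    far = sum(h >= boundary for h in trojan_entropies) / len(trojan_entropies)
    frr = sum(h < boundary for h in benign_entropies) / len(benign_entropies)
    return far, frr


# Illustrative entropies only: one benign input (0.05) falls below the
# boundary and one trojaned input (0.4) lies above it.
far, frr = far_frr([0.9, 1.1, 0.05, 1.3], [0.01, 0.02, 0.4], boundary=0.1)
print(far, frr)
```

Raising the boundary catches more trojaned inputs (lower FAR) at the cost of rejecting more benign ones (higher FRR).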

PROPOSED METHOD

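Perturb each input x by superimposing it with randomly selected clean images, producing N perturbed copies x^p_n (n = 1, ..., N).
Feed all perturbed copies, together with the original input, into the deployed DNN and collect the predicted class distributions.
Compute the Shannon entropy H_n = -Σ_i y_i log2 y_i of the prediction for each perturbed copy (y_i is the predicted probability of class i), and aggregate these into the normalized entropy H = (1/N) Σ_n H_n as a measure of how random the predictions for that input are.
Classify the input as trojaned when H falls below a predefined detection boundary: low entropy means the model's output barely changes under perturbation, i.e. it no longer depends on the input content, which is what happens when a trigger dominates the prediction.
Adopt a threat model in which the attacker fully controls training and the model architecture, while the defender holds only a held-out validation set that contains no trojaned samples.
Evaluate detection with FRR and FAR, and analyze how N and the detection boundary affect these rates.
Evaluate the run-time overhead by varying N and comparing against the baseline inference time.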
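
The per-input pipeline above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are flattened lists of pixel values in [0, 1], superimposition is assumed to be a clipped pixel-wise sum, and the helpers (`superimpose`, `shannon_entropy`, `strip_entropy`, `is_trojaned`) and the backdoored `toy_model` are hypothetical. Only the N perturbed copies contribute to H, so the prediction on x itself is not used here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Image = List[float]                     # flattened image, pixel values in [0, 1]
Model = Callable[[Image], List[float]]  # returns a class-probability vector


def superimpose(x: Image, overlay: Image) -> Image:
    """Assumed blend: pixel-wise sum of the two images, clipped to [0, 1]."""
    return [min(1.0, a + b) for a, b in zip(x, overlay)]


def shannon_entropy(probs: Sequence[float]) -> float:
    """H_n = -sum_i y_i * log2(y_i); zero-probability classes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def strip_entropy(model: Model, x: Image, clean_pool: Sequence[Image],
                  n: int = 10, rng: Optional[random.Random] = None) -> float:
    """Normalized entropy H = (1/N) * sum of H_n over N perturbed copies of x."""
    rng = rng or random.Random(0)
    overlays = rng.sample(list(clean_pool), n)  # N distinct clean images
    entropies = [shannon_entropy(model(superimpose(x, o))) for o in overlays]
    return sum(entropies) / n


def is_trojaned(model: Model, x: Image, clean_pool: Sequence[Image],
                boundary: float, n: int = 10) -> bool:
    """Flag x as trojaned when its normalized entropy is below the boundary."""
    return strip_entropy(model, x, clean_pool, n) < boundary


def toy_model(x: Image) -> List[float]:
    """Hypothetical backdoored model: a saturated last pixel (the "trigger")
    forces class 2; otherwise a softmax over the first three pixels."""
    if x[3] >= 0.99:
        return [0.0, 0.0, 1.0]
    logits = [5.0 * v for v in x[:3]]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


pool_rng = random.Random(42)
clean_pool = [[pool_rng.uniform(0.0, 0.5) for _ in range(4)] for _ in range(20)]
benign = [0.2, 0.5, 0.1, 0.0]
trojaned = [0.2, 0.5, 0.1, 1.0]
print(strip_entropy(toy_model, benign, clean_pool))
print(is_trojaned(toy_model, trojaned, clean_pool, boundary=0.1))  # True
print(is_trojaned(toy_model, benign, clean_pool, boundary=0.1))    # False
```

Because the trigger pixel survives every superimposition, the trojaned input yields the same prediction for all N copies and its entropy collapses to 0, while the benign input's predictions vary with each overlay.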

EXPERIMENTAL RESULTS

Research Questions

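RQ1: At run time and in a black-box setting, can STRIP reliably distinguish trojaned inputs from benign ones?
RQ2: Is the STRIP detector architecture-agnostic and compatible with existing deployments?
RQ3: How robust is STRIP to different trigger types and sizes, and to adaptive attackers?
RQ4: What is the detection trade-off (FRR vs. FAR), and what is the run-time impact?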

Key Findings

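Across different trigger types and datasets, STRIP achieves an overall FAR below 1% at a preset FRR of 1%.
On CIFAR10 and GTSRB, STRIP empirically achieved 0% FAR and 0% FRR in some of the evaluated settings.
Across the evaluations, the method remains effective against large, input-agnostic triggers, including the Hello Kitty-style trigger used in the examples.
With N = 10 perturbations, the detection time is about 6.125 ms, compared with 4.63 ms for baseline inference (roughly 1.32x), and it can be reduced further through parallelization.
The entropy-based detection boundary is selected from the entropy distribution of benign inputs so as to meet the desired FRR/FAR trade-off.
STRIP is robust to a range of trojan variants and to an identified adaptive attack (entropy manipulation).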
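
One simple way to turn the benign entropy distribution into a detection boundary is to fit a normal distribution to the entropies of the defender's clean inputs and take the quantile equal to the target FRR. The normality assumption, the function name `detection_boundary` and the sample entropies below are illustrative assumptions, not details confirmed by this review; an empirical percentile of the benign entropies would be a distribution-free alternative.

```python
from statistics import NormalDist
from typing import Sequence


def detection_boundary(benign_entropies: Sequence[float], target_frr: float) -> float:
    """Return the entropy below which about `target_frr` of benign inputs fall.

    Assumes benign entropies are roughly normally distributed; at run time,
    an input whose STRIP entropy is below this value is flagged as trojaned.
    """
    if not 0.0 < target_frr < 1.0:
        raise ValueError("target_frr must be strictly between 0 and 1")
    dist = NormalDist.from_samples(benign_entropies)  # sample mean and stdev
    return dist.inv_cdf(target_frr)


# Illustrative benign entropies (made up, not taken from the paper).
benign = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
boundary = detection_boundary(benign, target_frr=0.01)
print(round(boundary, 3))
```

Lowering `target_frr` moves the boundary down, which rejects fewer benign inputs but can let more trojaned inputs through (higher FAR).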

This review was generated by AI and reviewed by human editors.