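[Paper Summary] Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
The paper proposes AmI, an adversarial sample detector for face recognition that uses a bi-directional correspondence between attributes and neurons to build an attribute-steered model. It achieves about 94% detection accuracy across 7 attack types with a false positive rate of about 9.9%, outperforming feature squeezing.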
Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior knowledge of specific attacks or may not work well on complex models due to their underlying assumptions. We argue that adversarial sample attacks are deeply entangled with interpretability of DNN models: while classification results on benign inputs can be reasoned based on the human perceptible features/attributes, results on adversarial samples can hardly be explained. Therefore, we propose a novel adversarial sample detection technique for face recognition models, based on interpretability. It features a novel bi-directional correspondence inference between attributes and internal neurons to identify neurons critical for individual attributes. The activation values of critical neurons are enhanced to amplify the reasoning part of the computation and the values of other neurons are weakened to suppress the uninterpretable part. The classification results after such transformation are compared with those of the original model to detect adversaries. Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs. In contrast, a state-of-the-art feature squeezing technique can only achieve 55% accuracy with 23.3% false positives.
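Research motivation and goals
- Ground adversarial sample detection in interpretability rather than in prior knowledge of specific attacks.
- Extract attribute witnesses: internal neurons that correlate strongly with human-perceptible face attributes.
- Build an attribute-steered model that strengthens attribute-related neurons and weakens the others, so that inconsistencies with the original model become visible.
- Compare detection performance against state-of-the-art feature squeezing across multiple attack types.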
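Proposed method
- Define attribute witnesses as neurons that correlate strongly with human-perceptible facial attributes, identified through bi-directional reasoning between attributes and neuron activations.
- Use attribute substitution and attribute preservation to identify witness sets across layers.
- Build the attribute-steered model with a neuron weighting transformation that, layer by layer, strengthens witness neurons and weakens non-witness neurons.
- Apply an attribute-preserving transformation to the activations to further suppress uninterpretable features.
- Run the original model and the attribute-steered model side by side on each test input; a disagreement between their predictions indicates an adversarial input.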
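The steps above can be sketched on a single toy layer. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`find_witnesses`, `steer`, `is_adversarial`, `toy_head`), the thresholds `change_eps` and `keep_eps`, and the fixed scale factors are all hypothetical, and the constant multipliers stand in for whatever weighting functions the paper actually uses.

```python
from typing import Callable, List, Sequence, Set


def find_witnesses(base: Sequence[float],
                   substituted: Sequence[float],
                   preserved: Sequence[float],
                   change_eps: float = 0.5,
                   keep_eps: float = 0.1) -> Set[int]:
    """Pick witness neurons for one attribute at one layer (illustrative rule).

    base:        activations on the original face
    substituted: activations after the attribute is replaced
    preserved:   activations when the attribute is kept and the rest changes
    """
    witnesses = set()
    for i, (b, s, p) in enumerate(zip(base, substituted, preserved)):
        changes_with_attribute = abs(b - s) > change_eps  # substitution direction
        stable_when_kept = abs(b - p) <= keep_eps         # preservation direction
        if changes_with_attribute and stable_when_kept:   # both directions must agree
            witnesses.add(i)
    return witnesses


def steer(acts: Sequence[float], witnesses: Set[int],
          strengthen: float = 1.5, weaken: float = 0.5) -> List[float]:
    """Scale witness neurons up and all other neurons down (assumed constant factors)."""
    return [a * (strengthen if i in witnesses else weaken)
            for i, a in enumerate(acts)]


def is_adversarial(acts: Sequence[float],
                   head: Callable[[Sequence[float]], int],
                   witnesses: Set[int]) -> bool:
    """Flag the input when the original and steered predictions disagree."""
    return head(acts) != head(steer(acts, witnesses))


def toy_head(acts: Sequence[float]) -> int:
    # Hypothetical classifier: class 0 reads neurons 0-1, class 1 reads neurons 2-3.
    scores = [acts[0] + acts[1], acts[2] + acts[3]]
    return scores.index(max(scores))


witnesses = find_witnesses(base=[1.0, 2.0, 0.5, 0.3],
                           substituted=[0.1, 0.2, 0.5, 1.5],
                           preserved=[1.05, 1.95, 0.9, 1.4])
benign_flag = is_adversarial([2.0, 2.0, 0.5, 0.5], toy_head, witnesses)
adversarial_flag = is_adversarial([1.0, 1.0, 1.5, 1.5], toy_head, witnesses)
print(witnesses, benign_flag, adversarial_flag)
```

In this toy setup the first input is driven by the witness neurons, so both models agree and it is not flagged. The second input gets its label from non-witness neurons, so weakening them flips the prediction and the input is flagged.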
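Experimental results
Research questions
- RQ1: Can the bi-directional correspondence between facial attributes and internal neurons robustly identify attribute witnesses across layers of a face recognition DNN?
- RQ2: Does transforming the model to emphasize attribute witnesses improve adversarial sample detection without excessively increasing false positives?
- RQ3: How does AmI perform across attack types compared with a state-of-the-art defense such as feature squeezing?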
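Key findings
- AmI achieves about 94% detection accuracy across 7 attack types, with a 9.91% false positive rate on benign inputs.
- Under the same setting, feature squeezing reaches 55% accuracy with a 23.32% false positive rate, so AmI performs markedly better in this scenario.
- Attribute witness extraction remains robust when some attributes are excluded: detection accuracy drops by less than 5%.
- Bi-directional reasoning for witness extraction yields fewer false positives than one-directional approaches that use substitution only or preservation only.
- The method is demonstrated on VGG-Face with three datasets (VF, LFW, CelebA), and the implementation is publicly available on GitHub.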
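To make the two headline figures concrete, the sketch below shows one plausible way to compute them from per-input detector flags. It assumes that "detection accuracy" means the fraction of adversarial inputs that are flagged and that the false positive rate is the fraction of benign inputs that are flagged; the summary does not spell out the paper's exact definitions. The function name `detection_metrics` and the sample flags are hypothetical.

```python
from typing import Sequence, Tuple


def detection_metrics(adversarial_flags: Sequence[bool],
                      benign_flags: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection accuracy, false positive rate) under the assumed definitions."""
    # Share of adversarial inputs the detector flagged.
    detection_accuracy = sum(adversarial_flags) / len(adversarial_flags)
    # Share of benign inputs the detector flagged by mistake.
    false_positive_rate = sum(benign_flags) / len(benign_flags)
    return detection_accuracy, false_positive_rate


# Made-up flags for illustration only; these are not the paper's data.
adv = [True] * 47 + [False] * 3
ben = [True] * 10 + [False] * 90
acc, fpr = detection_metrics(adv, ben)
print(f"detection accuracy = {acc:.2%}, false positive rate = {fpr:.2%}")
```

Reporting both numbers together matters because a detector that flags every input would reach 100% detection accuracy while also having a 100% false positive rate.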
This summary was generated by AI and reviewed by a human editor.