QUICK REVIEW

[Paper Review] Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks

Yunfei Liu, Xingjun Ma | arXiv.org | Jul 5, 2020

Adversarial Robustness in Machine Learning | References: 69 | Cited by: 38

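One-Sentence Summary

This paper proposes Refool, a stealthy backdoor attack that uses natural reflections as the trigger to plant a backdoor in DNNs. It achieves a high attack success rate with very little data poisoning and remains robust against defenses.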

ABSTRACT

Recent studies have shown that DNNs can be compromised by backdoor attacks crafted at training time. A backdoor attack installs a backdoor into the victim model by injecting a backdoor pattern into a small proportion of the training data. At test time, the victim model behaves normally on clean test data, yet consistently predicts a specific (likely incorrect) target class whenever the backdoor pattern is present in a test example. While existing backdoor attacks are effective, they are not stealthy. The modifications made on training data or labels are often suspicious and can be easily detected by simple data filtering or human inspection. In this paper, we present a new type of backdoor attack inspired by an important natural phenomenon: reflection. Using mathematical modeling of physical reflection models, we propose reflection backdoor (Refool) to plant reflections as backdoor into a victim model. We demonstrate on 3 computer vision tasks and 5 datasets that, Refool can attack state-of-the-art DNNs with high success rate, and is resistant to state-of-the-art backdoor defenses.

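Motivation and Objectives

Motivate the study of stealthy backdoor attacks that standard data filtering is unlikely to detect.
Propose a backdoor trigger based on the natural phenomenon of reflection to improve stealthiness and realism.
Show that, with reflection triggers, a small poisoning rate is enough to achieve a high attack success rate across multiple datasets and models.
Show that reflection-based backdoors are more resistant to existing defenses than prior backdoor methods.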

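Proposed Method

Models the backdoor trigger as a physical reflection process: x_adv = x + x_R ⊗ k, where x is the clean image, x_R is the reflection image, k is the reflection kernel, and ⊗ denotes convolution.
Defines three reflection types (in-plane, out-of-focus blur, and ghost reflections) and the corresponding kernel forms.
Develops an iterative adversarial reflection image selection algorithm that picks effective reflections from an in-the-wild candidate set R_cand to form R_adv.
Injects the selected reflection patterns into training data of the target class under a clean-label setting, and trains the poisoned model f_adv.
At inference time, applies a reflection from R_adv to a test input to induce the target class y_adv.
Evaluates attack effectiveness across datasets and models, and compares against BadNets, CL, and SIG.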
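
The reflection model x_adv = x + x_R ⊗ k can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The helper names, the kernel parameters (`size`, `sigma`, `shift`, `alpha`, `intensity`), the pairing of each reflection type with a specific kernel, the edge padding, and the final clipping to [0, 1] are all assumptions made for the example. The summary only states that each reflection type has its own kernel form.

```python
import numpy as np

def in_plane_kernel(intensity: float = 0.4) -> np.ndarray:
    """Assumed form: a 1x1 kernel that only scales the reflection layer."""
    return np.array([[intensity]])

def gaussian_kernel(size: int = 7, sigma: float = 2.0) -> np.ndarray:
    """Assumed form for out-of-focus blur: 2-D Gaussian normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def ghost_kernel(size: int = 9, shift: int = 3, alpha: float = 0.7) -> np.ndarray:
    """Assumed form for ghost reflections: a main impulse plus a weaker shifted copy."""
    k = np.zeros((size, size))
    c = size // 2
    k[c, c] = alpha
    k[c + shift, c + shift] = 1.0 - alpha
    return k

def convolve2d_same(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-size 2-D convolution for HxW or HxWxC float images (odd kernel sizes)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    pad = [(ph, ph), (pw, pw)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    flipped = k[::-1, ::-1]  # flip the kernel for true convolution
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + h, j:j + w]
    return out

def add_reflection(x: np.ndarray, x_r: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Compute x_adv = x + x_R (convolved with) k, clipped to the [0, 1] image range."""
    return np.clip(x + convolve2d_same(x_r, k), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((32, 32, 3))    # stand-in for a clean image in [0, 1]
    x_r = rng.random((32, 32, 3))  # stand-in for a reflection image in [0, 1]
    for name, k in [("in-plane", in_plane_kernel()),
                    ("out-of-focus", gaussian_kernel()),
                    ("ghost", ghost_kernel())]:
        print(name, add_reflection(x, x_r, k).shape)
```

The same `add_reflection` call would serve both poisoning (applied to target-class training images) and inference (applied to test inputs); only the choice of reflection image and kernel changes.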

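Experimental Results

Research Questions

RQ1: Can natural reflection patterns serve as stealthy backdoor triggers that defenses struggle to detect?
RQ2: What is the minimum data poisoning rate needed to reach a high attack success rate with reflection-based triggers?
RQ3: Can reflection backdoors transfer across datasets and model architectures without dataset-specific trigger design?
RQ4: How do reflection backdoors hold up against state-of-the-art defenses (fine-tuning, pruning, Neural Cleanse)?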

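Key Findings

Refool achieves an attack success rate above 75% across 5 datasets and multiple models, with an injection rate below 3.27%.
Accuracy on clean test data drops by less than 3% on average, indicating strong stealthiness.
Reflection-based triggers are more resistant to fine-tuning and neural pruning defenses than the CL and SIG baselines.
The adversarial reflection image selection converges to effective triggers within roughly 9 iterations.
The different reflection types (types I-III), and mixtures of them, can increase attack strength while keeping input perturbation moderate.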
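
The metrics quoted in these findings can be computed along the following lines. The definitions are the ones commonly used in the backdoor literature and are assumptions here, since this summary does not spell them out: attack success rate is taken as the fraction of triggered, non-target-class test inputs predicted as the target class, and injection rate as poisoned samples over all training samples. The function names are hypothetical.

```python
import numpy as np

def injection_rate(num_poisoned: int, num_train: int) -> float:
    """Assumed definition: share of the whole training set that carries the trigger."""
    return num_poisoned / num_train

def attack_success_rate(preds_on_triggered, true_labels, target_class: int) -> float:
    """Fraction of triggered inputs from non-target classes predicted as the target."""
    preds = np.asarray(preds_on_triggered)
    labels = np.asarray(true_labels)
    mask = labels != target_class  # target-class inputs would count as trivially "successful"
    if not mask.any():
        raise ValueError("no non-target-class samples to evaluate")
    return float(np.mean(preds[mask] == target_class))

def clean_accuracy(preds_on_clean, true_labels) -> float:
    """Standard accuracy of the (possibly poisoned) model on untouched test data."""
    return float(np.mean(np.asarray(preds_on_clean) == np.asarray(true_labels)))

if __name__ == "__main__":
    # Toy labels and predictions, made up purely to exercise the functions.
    labels = [0, 2, 2, 1]
    preds_triggered = [1, 1, 0, 1]
    print(attack_success_rate(preds_triggered, labels, target_class=1))
    print(injection_rate(327, 10_000))
```

Comparing `clean_accuracy` for a clean model and a poisoned model on the same test set gives the accuracy drop that the findings use as a stealthiness indicator.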


This review was generated by AI and checked by human editors.