QUICK REVIEW

[Paper Review] Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

Cong Liao, Haoti Zhong | arXiv (Cornell University) | Aug 30, 2018

Adversarial Robustness in Machine Learning | References: 41 | Cited by: 106

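One-Sentence Summary

This paper proposes two stealthy backdoor-injection methods against CNN image classifiers that achieve targeted misclassification with minimal accuracy loss and a low poisoning rate.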

ABSTRACT

Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real world applications including those where security is of great concern. Such popularity, however, may attract attackers to exploit the vulnerabilities of the deployed deep learning models and launch attacks against security-sensitive applications. In this paper, we focus on a specific type of data poisoning attack, which we refer to as a backdoor injection attack. The main goal of the adversary performing such attack is to generate and inject a backdoor into a deep learning model that can be triggered to recognize certain embedded patterns with a target label of the attacker's choice. Additionally, a backdoor injection attack should occur in a stealthy manner, without undermining the efficacy of the victim model. Specifically, we propose two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model. We consider two attack settings, with backdoor injection carried out either before model training or during model updating. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective and achieve a high attack success rate (above 90%) at a small cost of model accuracy loss (below 1%) with a small injection rate (around 1%), even under the weakest assumption wherein the adversary has no knowledge either of the original training data or the classifier model.
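The poisoning step the abstract describes (stamp a faint pattern on some clean images, relabel them to the attacker's target, and mix them into the training data at a small injection rate) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the checkerboard trigger, its amplitude of 4/255, the function names `checkerboard_mask` and `build_poisoned_set`, and the reading of injection rate as |D_A| / |D| are all assumptions made for the example.

```python
import numpy as np

def checkerboard_mask(h, w, amplitude=4.0 / 255):
    """Hypothetical static trigger: a faint +/-amplitude checkerboard."""
    yy, xx = np.indices((h, w))
    return amplitude * np.where((yy + xx) % 2 == 0, 1.0, -1.0)

def build_poisoned_set(X, y, mask, target, injection_rate, rng):
    """Return the union of D and D_A, where D_A holds triggered copies of
    randomly chosen clean images, all relabelled to `target`.

    Assumption: injection_rate = |D_A| / |D|.
    X: float images in [0, 1] with shape (n, h, w); mask: shape (h, w).
    """
    n_inject = int(round(injection_rate * len(X)))
    idx = rng.choice(len(X), size=n_inject, replace=False)
    X_a = np.clip(X[idx] + mask, 0.0, 1.0)      # stamp trigger, keep valid pixel range
    y_a = np.full(n_inject, target, dtype=y.dtype)
    return np.concatenate([X, X_a]), np.concatenate([y, y_a])

# Example: a 1% injection rate on 1,000 random 28x28 "images".
rng = np.random.default_rng(0)
X = rng.random((1000, 28, 28))
y = rng.integers(0, 10, size=1000)
X_p, y_p = build_poisoned_set(X, y, checkerboard_mask(28, 28), target=7,
                              injection_rate=0.01, rng=rng)
print(X_p.shape, (y_p[1000:] == 7).all())    # (1010, 28, 28) True
```

The clean data is left untouched and only appended to, which matches the framing of the attack as adding an injection set rather than editing existing samples.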

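Motivation and Objectives

Motivate the study of backdoor injection attacks on CNN-based image classification and assess the security risks they pose to security-sensitive applications.
Propose two backdoor-generation strategies that are visually imperceptible yet effective.
Evaluate the feasibility of the attack under different assumptions about the attacker's knowledge and capabilities.
Show that a high attack success rate can be achieved with a low poisoning rate while preserving overall model performance.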

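Proposed Method

Two backdoor-generation strategies are introduced: a static perturbation mask carrying a pattern, and a targeted adaptive perturbation mask.
Backdoor injection is formalized as data poisoning: an injection dataset D_A is added to the training set.
The model is trained with mini-batch gradient descent on the poisoned data, jointly maximizing classification accuracy and backdoor success rate.
The attack can be carried out in two settings: backdoor injection before training (BIB) and backdoor injection during model updating (BID).
The adaptive perturbation uses a DeepFool-inspired iterative method that pushes samples toward the decision boundary of the target class under an l_infinity constraint.
The paper gives a mathematical formulation of the poisoning objective and the conditions under which the backdoor is effective.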
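The DeepFool-inspired adaptive mask can be illustrated on a toy linear multiclass model, where the smallest l_infinity step that reaches a class boundary has a closed form. This is a sketch under stated assumptions, not the paper's algorithm: the linear model f(x) = Wx + b, the overshoot of 0.02, sharing one perturbation v across all samples (in the style of universal perturbations), and the names `targeted_linf_step` and `adaptive_mask` are ours. For a CNN, one would replace W[target] - W[current] with the gradient of the score difference and iterate until the target label is reached.

```python
import numpy as np

def targeted_linf_step(W, b, x, target, overshoot=0.02):
    """One DeepFool-style step pushing x toward the `target` class of the
    linear model f(x) = W @ x + b (W has shape [n_classes, dim])."""
    scores = W @ x + b
    current = int(np.argmax(scores))
    if current == target:
        return np.zeros_like(x)              # already classified as target
    w_diff = W[target] - W[current]          # gradient of f_target - f_current
    gap = scores[current] - scores[target]   # >= 0 while current != target
    # For a linear model, the smallest l_inf step that closes the gap is
    # gap / ||w_diff||_1 in the direction sign(w_diff).
    step = gap / (np.abs(w_diff).sum() + 1e-12)
    return (1 + overshoot) * step * np.sign(w_diff)

def adaptive_mask(W, b, X, target, eps, n_passes=5):
    """Accumulate one perturbation v shared by all samples in X,
    projecting onto the l_inf ball of radius eps after every update."""
    v = np.zeros(X.shape[1])
    for _ in range(n_passes):
        for x in X:
            v += targeted_linf_step(W, b, x + v, target)
            v = np.clip(v, -eps, eps)        # l_inf projection
    return v

W, b = np.eye(3), np.zeros(3)
X = np.array([[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]])
v = adaptive_mask(W, b, X, target=2, eps=1.0)
print(np.argmax((X + v) @ W.T + b, axis=1))  # [2 2]
```

The clipping after every update is what keeps the mask imperceptible: a smaller eps bounds every pixel change, at the cost of possibly failing to flip some samples.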

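Experimental Results

Research Questions

RQ1: How effective are visually stealthy backdoor perturbations at triggering targeted misclassification in CNNs?
RQ2: What are the limits of the backdoor attack under different levels of attacker knowledge (FK, PKD, PKM, MK) and capability?
RQ3: Can a backdoor be injected with minimal impact on overall test accuracy while maintaining a high attack success rate?
RQ4: How do the two backdoor-generation strategies compare in stealthiness and effectiveness?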

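Key Findings

In several scenarios, the attack success rate exceeds 90% at a poisoning rate of about 1%.
Under the tested conditions, the loss in classification accuracy stays below 1%.
The backdoor perturbations can be visually imperceptible and hard for automated detectors to discover.
The two backdoor-generation methods (static with a pattern, and targeted adaptive) offer flexible options for creating stealthy backdoors.
The attack is shown to be effective under a range of attacker models, including a weak adversary with no knowledge of either the original data or the model.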
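The headline figures (attack success rate above 90%, accuracy loss below 1%) come from a few simple quantities. The sketch below shows one common way to compute them from model predictions; the function names are hypothetical, and excluding test images whose true label already equals the target from the attack success rate is a widely used convention assumed here, not something stated in this review.

```python
import numpy as np

def accuracy(pred, y):
    """Fraction of clean test images classified correctly."""
    return float(np.mean(pred == y))

def attack_success_rate(pred_triggered, y_true, target):
    """Fraction of triggered test images classified as `target`, ignoring
    images whose true label is already `target` (assumed convention)."""
    keep = y_true != target
    return float(np.mean(pred_triggered[keep] == target))

def accuracy_loss(pred_clean_model, pred_backdoored_model, y):
    """Drop in clean test accuracy caused by the backdoor."""
    return accuracy(pred_clean_model, y) - accuracy(pred_backdoored_model, y)

y = np.array([0, 1, 2, 3])
print(accuracy_loss(np.array([0, 1, 2, 3]), np.array([0, 1, 2, 0]), y))  # 0.25
print(attack_success_rate(np.array([3, 3, 1, 3]), y, target=3))         # 0.6666666666666666
```

Reporting accuracy loss against a clean baseline model, rather than raw accuracy alone, is what separates the cost of the backdoor from the model's inherent error.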


This review was generated by AI and checked by human editors.