QUICK REVIEW

[Paper Review] Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Xinyun Chen, Chang Liu | arXiv (Cornell University) | Dec 15, 2017

Adversarial Robustness in Machine Learning | References: 55 | Cited by: 1,031

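One-Sentence Summary

This paper shows that, under a black-box threat model, backdoor poisoning attacks can plant hidden backdoors in deep learning systems, reaching high attack success rates with only a small number of poisoning samples, and can even produce physically implementable backdoors.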

ABSTRACT

Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attacks, called backdoor attacks, where the attacker's goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even by human beings to achieve stealthiness. We conduct evaluation to demonstrate that a backdoor adversary can inject only around 50 poisoning samples, while achieving an attack success rate of above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and proposing defense strategies against them.

Research Motivation and Objectives

Motivate the security risk of backdoor attacks in security-critical DL systems such as face recognition.
Propose backdoor poisoning strategies that require minimal poisoning samples under a weak, realistic threat model.
Introduce two broad classes of backdoor strategies—input-instance-key and pattern-key—and instantiate practical variants.
Demonstrate the feasibility and stealthiness of backdoor poisoning, including its physical-world applicability and the robustness of the attack.
Highlight the need for defenses against covert data-poisoning backdoors in real-world deployments.

Proposed Method

Define backdoor poisoning as a two-part adversary process: generate poisoning samples, and create backdoor instances from a backdoor key k via a function Σ.
Introduce two strategy classes: input-instance-key (backdoor key is a single input instance) and pattern-key (backdoor key is a pattern).
For input-instance-key, use Σ(k) to generate backdoor-like variants of a single key example and inject poisoning samples with the target label.
For pattern-key, develop three instantiations—Blended Injection, Accessory Injection, and Blended Accessory Injection—that embed a pattern into inputs to produce backdoor instances.
Formalize a threat model in which the attacker has no knowledge of the model architecture or training data, injects only a small number of poisoning samples, and aims for high backdoor success while preserving performance on pristine inputs.
Demonstrate that a few poisoning samples can induce high attack success rates in state-of-the-art face recognition systems.
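
The two strategy classes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are modeled as flat lists of pixel values in [0, 255], the function Σ for the input-instance-key strategy is modeled as adding small uniform noise to the key (the bound max_noise is an assumed value), and Blended Injection is modeled as a convex combination controlled by a blend ratio alpha. The names sigma_rand, blend_inject, and make_poisoning_samples are hypothetical.

```python
import random


def clip(value, lo=0.0, hi=255.0):
    """Clamp a pixel value into the valid range [lo, hi]."""
    return max(lo, min(hi, value))


def sigma_rand(key_image, n_samples, max_noise=5.0, rng=None):
    """Input-instance-key sketch: noisy variants of a single key image.

    ASSUMPTION: uniform noise in [-max_noise, max_noise] per pixel; the
    bound is illustrative and not taken from the summary above.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    return [
        [clip(p + rng.uniform(-max_noise, max_noise)) for p in key_image]
        for _ in range(n_samples)
    ]


def blend_inject(pattern, image, alpha):
    """Pattern-key sketch: blend a key pattern into an image.

    ASSUMPTION: a convex blend, alpha * pattern + (1 - alpha) * image.
    """
    if len(pattern) != len(image):
        raise ValueError("pattern and image must have the same size")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [clip(alpha * k + (1.0 - alpha) * x) for k, x in zip(pattern, image)]


def make_poisoning_samples(instances, target_label):
    """Pair every generated instance with the attacker's target label."""
    return [(instance, target_label) for instance in instances]


# Hypothetical usage: five noisy copies of one key, all labeled as class 7.
key = [100.0, 150.0, 200.0, 250.0]
poison = make_poisoning_samples(sigma_rand(key, n_samples=5), target_label=7)

# Hypothetical usage: a faint pattern blended into a benign image.
blended = blend_inject([255.0] * 4, [0.0, 100.0, 200.0, 50.0], alpha=0.2)
```

Under this model, a small alpha keeps the pattern faint and hard to notice, while a larger alpha makes it more visible; that is the stealthiness versus effectiveness trade-off the pattern-key strategies have to balance.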

Experimental Results

Research Questions

RQ1: Can backdoor poisoning create effective backdoors under a black-box threat model with no access to training data?
RQ2: What is the minimum poisoning sample count required for effective input-instance-key and pattern-key backdoors?
RQ3: How do pattern-key strategies balance stealthiness (pattern noticeability) with attack effectiveness?
RQ4: Are physically implementable backdoors feasible with data poisoning strategies?
RQ5: How does the attack affect pristine model performance while enabling backdoor success?

Key Findings

With an input-instance-key strategy, around 5 poisoning samples are enough to create backdoor instances, even in a large training set (~600,000 samples).
Pattern-key backdoors require around 50 poisoning samples to achieve attack success rates above 90%.
Backdoor instances can be made hard to notice (stealthy patterns) yet still yield high attack success.
The proposed pattern-key strategies enable physically implementable backdoors (e.g., with accessories like glasses or specific patterns).
The attacks operate in a black-box setting and can preserve high pristine test accuracy, making detection difficult.
The study covers two broad strategy classes (input-instance-key and pattern-key) and three concrete pattern-key instantiations, showing practical feasibility.
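
To put the sample counts above in perspective, the following sketch computes the poisoned share of a training set and the two quantities the findings refer to: attack success rate and pristine accuracy. It is a minimal sketch under assumptions: the poisoning samples are taken to be added on top of the ~600,000 clean samples, attack success rate is taken to be the fraction of backdoor instances classified as the target label, and all function names are hypothetical.

```python
def poisoning_fraction(n_poison, n_clean):
    """Share of the final training set that the attacker controls.

    ASSUMPTION: poisoning samples are added on top of the clean samples.
    """
    return n_poison / (n_clean + n_poison)


def attack_success_rate(predictions, target_label):
    """Fraction of backdoor instances classified as the target label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def pristine_accuracy(predictions, labels):
    """Fraction of clean test inputs classified with their true label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# 5 poisoning samples next to ~600,000 clean ones (figures from the findings).
share = poisoning_fraction(5, 600_000)
print(f"poisoned share: {share:.6%}")
```

With 5 poisoning samples and about 600,000 clean ones, the poisoned share is roughly 0.0008% of the training data. An attack of this kind would be judged by a high attack success rate on backdoor instances together with a pristine accuracy close to that of an unpoisoned model.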

This review was generated by AI and reviewed by human editors.