QUICK REVIEW

[Paper Review] Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Xinyun Chen, Chang Liu | arXiv (Cornell University) | Dec 15, 2017

Adversarial Robustness in Machine Learning | 55 references | 1,031 citations

TL;DR

This paper shows that data poisoning can implant hidden backdoors in deep learning systems under a black-box threat model, reaching attack success rates above 90% with around 50 poisoned samples and supporting physically realizable backdoor keys.

ABSTRACT

Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attacks, called backdoor attacks, where the attacker's goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even by human beings to achieve stealthiness. We conduct evaluation to demonstrate that a backdoor adversary can inject only around 50 poisoning samples, while achieving an attack success rate of above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and proposing defense strategies against them.

Motivation & Objective

Motivate the security risk of backdoor attacks in security-critical DL systems such as face recognition.
Propose backdoor poisoning strategies that require minimal poisoning samples under a weak, realistic threat model.
Introduce two broad classes of backdoor strategies—input-instance-key and pattern-key—and instantiate practical variants.
Demonstrate the feasibility, stealthiness, and robustness of backdoor poisoning attacks, including their applicability in the physical world.
Highlight the need for defenses against covert data-poisoning backdoors in real-world deployments.

Proposed method

Define backdoor poisoning as a two-part adversary process: generate poisoning samples, and create backdoor instances from a backdoor key k using a generation function Σ.
Introduce two strategy classes: input-instance-key (backdoor key is a single input instance) and pattern-key (backdoor key is a pattern).
For input-instance-key, use Σ(k) to generate backdoor-like variants of a single key example and inject poisoning samples with the target label.
For pattern-key, develop three instantiations—Blended Injection, Accessory Injection, and Blended Accessory Injection—that embed a pattern into inputs to produce backdoor instances.
Formalize a threat model in which the attacker has no knowledge of the model architecture or training data, injects only a small number of poisoning samples, and aims for high backdoor success while preserving performance on pristine inputs.
Demonstrate that a few poisoning samples can induce high attack success rates in state-of-the-art face recognition systems.
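
The two strategy classes above can be illustrated with a minimal sketch. This review does not give the exact generation functions, so the code assumes a simple alpha blend for the pattern-key case and bounded uniform noise for the input-instance-key case. Images are represented as flat lists of pixel values in [0, 255]; the function names, the `alpha` and `max_delta` parameters, and the image representation are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random


def blend_key(pattern, image, alpha):
    """Pattern-key sketch: mix a key pattern into a benign image.

    Assumed form: alpha * pattern + (1 - alpha) * image, per pixel.
    A small alpha makes the pattern harder to notice.
    """
    if len(pattern) != len(image):
        raise ValueError("pattern and image must have the same number of pixels")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * p + (1.0 - alpha) * x for p, x in zip(pattern, image)]


def noisy_key_variants(key_image, count, max_delta, rng):
    """Input-instance-key sketch: perturb one key input with bounded noise.

    Each pixel gets uniform noise in [-max_delta, max_delta] and is then
    clipped back to the valid range [0, 255].
    """
    variants = []
    for _ in range(count):
        variant = [
            min(255.0, max(0.0, x + rng.uniform(-max_delta, max_delta)))
            for x in key_image
        ]
        variants.append(variant)
    return variants


def make_poisoning_samples(instances, target_label):
    """Attach the attacker-chosen target label to every generated instance."""
    return [(instance, target_label) for instance in instances]


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so the demo is repeatable
    benign = [100.0, 150.0, 200.0, 50.0]
    pattern = [0.0, 255.0, 0.0, 255.0]

    blended = blend_key(pattern, benign, alpha=0.2)
    variants = noisy_key_variants(benign, count=5, max_delta=5.0, rng=rng)
    poison = make_poisoning_samples([blended] + variants, target_label=7)
    print(len(poison), "poisoning samples with target label 7")
```

In this sketch the attacker touches only the data: the poisoning samples would be appended to the victim's training set, and nothing about the model or the training procedure is accessed, which matches the black-box setting described above.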

Experimental results

Research questions

RQ1: Can backdoor poisoning create effective backdoors under a black-box threat model with no access to training data?
RQ2: What is the minimum poisoning sample count required for effective input-instance-key and pattern-key backdoors?
RQ3: How do pattern-key strategies balance stealthiness (pattern noticeability) with attack effectiveness?
RQ4: Are physically implementable backdoors feasible with data poisoning strategies?
RQ5: How does the attack affect pristine model performance while enabling backdoor success?

Key findings

With the input-instance-key strategy, injecting around 5 poisoning samples into a large training set (~600,000 samples) is enough to create a working backdoor.
Pattern-key backdoors require around 50 poisoning samples to achieve attack success rates above 90%.
Backdoor instances can be made hard to notice (stealthy patterns) yet still yield high attack success.
The proposed pattern-key strategies enable physically implementable backdoors (e.g., with accessories like glasses or specific patterns).
The attacks operate in a black-box setting and can preserve high pristine test accuracy, making detection difficult.
The study covers two broad strategy classes (input-instance-key and pattern-key) and three concrete pattern-key instantiations, showing practical feasibility.
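
The quantities behind these findings can be made concrete with a short sketch. It assumes attack success rate means the fraction of backdoor instances classified as the target label, pristine accuracy means ordinary accuracy on clean test inputs, and the poisoning fraction is measured against the training set after injection. The function names are hypothetical, and the numbers in the demo are the approximate figures quoted above, not new results.

```python
def attack_success_rate(predictions, target_label):
    """Fraction of backdoor instances that the model assigns to the target label."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def pristine_accuracy(predictions, true_labels):
    """Ordinary accuracy on clean (non-backdoor) test inputs."""
    if len(predictions) != len(true_labels) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, y in zip(predictions, true_labels) if p == y)
    return correct / len(predictions)


def poisoning_fraction(num_poison, num_clean):
    """Share of the final training set that is poisoned (assumed definition)."""
    return num_poison / (num_clean + num_poison)


if __name__ == "__main__":
    # About 5 poisoned samples in a training set of roughly 600,000 samples.
    fraction = poisoning_fraction(5, 600_000)
    print(f"poisoned share of training data: {fraction:.6%}")
```

Reporting both metrics together matters: a backdoor is only hard to detect if the attack success rate is high while pristine accuracy stays close to that of a model trained without poisoning.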

This review was created by AI and reviewed by human editors.