QUICK REVIEW

[Paper Review] Dynamic Backdoor Attacks Against Machine Learning Models

Ahmed Salem, Rui Wen|arXiv (Cornell University)|Mar 7, 2020
Adversarial Robustness in Machine Learning|Cited by 46
One-Line Summary

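This paper introduces dynamic backdoor attacks against DNNs, comprising Random Backdoor, Backdoor Generating Network (BaN), and conditional BaN (c-BaN). It reports near-perfect backdoor performance on MNIST, CelebA, and CIFAR-10 and shows that the attacks evade state-of-the-art defenses.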

ABSTRACT

Machine learning (ML) has made tremendous progress during the past decade and is being adopted in various critical real-world applications. However, recent research has shown that ML models are vulnerable to multiple security and privacy attacks. In particular, backdoor attacks against ML models have recently raised a lot of awareness. A successful backdoor attack can cause severe consequences, such as allowing an adversary to bypass critical authentication systems. Current backdooring techniques rely on adding static triggers (with fixed patterns and locations) on ML model inputs which are prone to detection by the current backdoor detection mechanisms. In this paper, we propose the first class of dynamic backdooring techniques against deep neural networks (DNN), namely Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques can have random patterns and locations, which reduce the efficacy of the current backdoor detection mechanisms. In particular, BaN and c-BaN based on a novel generative network are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique that given a target label, it can generate a target-specific trigger. Both BaN and c-BaN are essentially a general framework which renders the adversary the flexibility for further customizing backdoor attacks. We extensively evaluate our techniques on three benchmark datasets: MNIST, CelebA, and CIFAR-10. Our techniques achieve almost perfect attack performance on backdoored data with a negligible utility loss. We further show that our techniques can bypass current state-of-the-art defense mechanisms against backdoor attacks, including ABS, Februus, MNTD, Neural Cleanse, and STRIP.

Motivation and Objectives

  • Motivate and formalize the threat of dynamic backdoors in ML models.
  • Propose three dynamic backdoor techniques (Random Backdoor, BaN, c-BaN).
  • Demonstrate that trigger randomness and target-label conditioning make the attacks harder for defenses to catch.
  • Evaluate the attacks on standard image datasets and analyze how well they bypass existing defenses.

Proposed Method

  • Define a formalism for dynamic backdoors, whose triggers vary in both pattern and location.
  • Introduce Random Backdoor that samples triggers uniformly and places them randomly.
  • Develop BaN, a generative network that learns triggers jointly with the backdoored model.
  • Extend BaN to c-BaN by conditioning trigger generation on the target label.
  • Train and evaluate on MNIST, CelebA, and CIFAR-10 with both single and multiple target labels.
  • Assess whether the attacks evade existing backdoor defenses: ABS, Februus, MNTD, Neural Cleanse, and STRIP.
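
To make the Random Backdoor idea concrete, here is a minimal sketch of the two steps the list describes: sampling a trigger uniformly and placing it at a randomly chosen location. It is an illustration under stated assumptions, not the authors' implementation. The function names, the 6x6 trigger size, the four corner locations, and the target label 7 are all hypothetical, and images are plain nested lists so the example has no dependencies.

```python
import random

def sample_trigger(size, rng):
    """Draw a size x size trigger with pixel values uniform in [0, 1)."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def add_trigger(image, trigger, top, left):
    """Return a copy of `image` with `trigger` pasted at (top, left)."""
    patched = [row[:] for row in image]
    for i, trigger_row in enumerate(trigger):
        for j, value in enumerate(trigger_row):
            patched[top + i][left + j] = value
    return patched

def random_backdoor(image, target_label, locations, trigger_size, rng):
    """Backdoor one image: fresh random trigger at a random allowed location."""
    trigger = sample_trigger(trigger_size, rng)
    top, left = rng.choice(locations)
    return add_trigger(image, trigger, top, left), target_label

rng = random.Random(0)  # seeded only to make the sketch reproducible
clean = [[0.0] * 28 for _ in range(28)]  # blank MNIST-sized image
# Hypothetical set of allowed top-left positions (the four corners).
locations = [(0, 0), (0, 22), (22, 0), (22, 22)]
backdoored, label = random_backdoor(clean, 7, locations, 6, rng)
```

Calling `random_backdoor` repeatedly yields a different pattern and position each time, which is the property that separates a dynamic trigger from a static one. In this sketch, the pairs `(backdoored, label)` would be mixed into the training set alongside clean data.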

Experimental Results

Research Questions

  • RQ1: Can dynamic backdoors achieve high attack success with negligible utility loss across multiple datasets?
  • RQ2: Do dynamic backdoors bypass current state-of-the-art backdoor defenses?
  • RQ3: How do Random Backdoor, BaN, and c-BaN compare in single-target versus multi-target scenarios?
  • RQ4: Can triggers be generated algorithmically to adapt to defenses and target labels?
  • RQ5: What are the impacts on model utility when injecting dynamic backdoors?

Key Findings

  • Backdoor success rate is essentially 100% on backdoored test data across datasets and targets.
  • Backdoored models retain accuracy similar to, or only negligibly below, that of clean models (e.g., MNIST 99% vs 99%, CelebA ~70%, and a ~0–2% drop on CIFAR-10).
  • BaN and c-BaN provide algorithmic trigger generation with high flexibility and target-specific triggering in multi-label settings.
  • Dynamic backdoors can bypass contemporary defenses such as Neural Cleanse, ABS, and STRIP.
  • The proposed methods exhibit dynamic triggers with varying patterns and locations, making detection harder than static backdoors.
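
The two quantities behind these findings, backdoor success rate and utility loss, can be computed as in the sketch below. It assumes the usual definitions: success rate is the fraction of backdoored inputs classified as the target label, and utility loss is the clean-data accuracy of a clean model minus that of the backdoored model. The function names and the toy predictions are illustrative and are not results from the paper.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def attack_success_rate(backdoored_predictions, target_labels):
    """Fraction of backdoored inputs classified as the intended target label."""
    return accuracy(backdoored_predictions, target_labels)

def utility_loss(clean_model_accuracy, backdoored_model_accuracy):
    """Drop in clean-data accuracy caused by injecting the backdoor."""
    return clean_model_accuracy - backdoored_model_accuracy

# Toy values for illustration only, not taken from the paper.
clean_labels = [0, 1, 2, 3, 4]
clean_model_preds = [0, 1, 2, 3, 4]       # clean model on clean inputs
backdoored_model_preds = [0, 1, 2, 3, 9]  # backdoored model on clean inputs
triggered_preds = [7, 7, 7, 7]            # backdoored model on triggered inputs
target_labels = [7, 7, 7, 7]

success = attack_success_rate(triggered_preds, target_labels)
loss = utility_loss(
    accuracy(clean_model_preds, clean_labels),
    accuracy(backdoored_model_preds, clean_labels),
)
```

A successful attack in the sense used here has `success` close to 1 while `loss` stays close to 0.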

This review was generated by AI and checked by a human editor.