QUICK REVIEW

[Paper Review] Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Hongyi Wang, Kartik K. Sreenivasan | arXiv (Cornell University) | July 9, 2020

Adversarial Robustness in Machine Learning | References: 90 | Citations: 110

One-Line Summary

The paper shows that edge-case (tail) backdoor attacks can be inserted into federated learning models, are hard to detect, and can persist under several defenses through data or model poisoning, using PGD-based strategies.

ABSTRACT

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in a FL model is unlikely assuming first order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training, or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning at the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).

Motivation and Objectives

Motivate and formalize backdoor threats in federated learning (FL) and the difficulties of defending against them.
Introduce edge-case backdoor attacks that target tail inputs not typically present in training data.
Develop attack strategies (data poisoning, PGD-based, and model replacement) that survive standard defenses.
Theorize about the hardness of backdoor detection and its relation to adversarial robustness.
Demonstrate through experiments that edge-case attacks can be effective across diverse tasks and datasets.

Proposed Method

Define a set of p-edge-case examples as labeled inputs whose probability under the input distribution is at most p for some small p > 0, i.e., inputs that live on the tail of the distribution.
Propose three attack strategies: black-box data poisoning, PGD-based training with projection to stay within defense norms, and PGD with model replacement.
Use Federated Averaging (FedAvg) with a subset of clients and varying attack patterns (fixed-frequency and fixed-pool).
Train attackers with projected gradient descent to keep updates within a norm ball to evade defenses.
Construct edge-case datasets by fitting Gaussian mixtures to penultimate-layer activations and keeping the samples that receive low likelihood under the fitted model.
Apply edge-case backdoors across multiple tasks (image classification, OCR, sentiment, next-word prediction) and defenses (norm clipping, Krum, Multi-Krum, RFA, DP).
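The projection and model-replacement steps listed above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: models are flat weight vectors, the L2 ball is centered at the current global model, benign clients are assumed to return the global model unchanged, and all function names (`project_l2_ball`, `pgd_local_training`, `model_replacement`, `fedavg`) as well as the toy quadratic objective are hypothetical.

```python
import numpy as np

def project_l2_ball(w, center, radius):
    """Project w onto the L2 ball of the given radius around center."""
    delta = w - center
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return w.copy()
    return center + delta * (radius / norm)

def pgd_local_training(w_global, grad_fn, radius, lr=0.1, steps=20):
    """Attacker's local gradient descent, projected back onto the ball after each step."""
    w = w_global.copy()
    for _ in range(steps):
        w = w - lr * grad_fn(w)
        w = project_l2_ball(w, w_global, radius)
    return w

def model_replacement(w_attacker, w_global, n_total, n_attacker):
    """Scale the attacker's update so that it survives FedAvg's weighted averaging."""
    scale = n_total / n_attacker
    return w_global + scale * (w_attacker - w_global)

def fedavg(models, sizes):
    """Average client models, weighted by local dataset size."""
    weights = np.asarray(sizes, dtype=float) / np.sum(sizes)
    return np.sum([a * m for a, m in zip(weights, models)], axis=0)

# Toy round: 10 clients with 10 samples each, one of them adversarial.
w_global = np.zeros(4)
target = np.full(4, 10.0)  # hypothetical optimum of the backdoored objective
w_adv = pgd_local_training(w_global, lambda w: w - target, radius=1.0)
w_sent = model_replacement(w_adv, w_global, n_total=100, n_attacker=10)
w_next = fedavg([w_sent] + [w_global] * 9, [10] * 10)
```

In this toy round `w_adv` stays within distance 1.0 of the global model, and after scaling, the aggregated model `w_next` coincides with `w_adv`. Note that the scaled vector `w_sent` lies well outside the original ball, so an attacker facing a norm-based defense would presumably need to choose a radius small enough that the scaled update still passes.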

Experimental Results

Research Questions

RQ1: Can backdoors be constructed in FL models that target edge-case inputs and remain undetected by common defenses?
RQ2: What theoretical relationships exist between adversarial robustness and backdoor robustness in neural networks?
RQ3: Are edge-case backdoors transferable across tasks and defenses, and how do defense mechanisms impact fairness?
RQ4: What are practical data-generation strategies for creating edge-case datasets that enable persistent backdoors?
RQ5: How effective are edge-case backdoors under data-poisoning and model-poisoning attack paradigms?

Key Findings

Backdoors in FL are achievable and hard to detect; because robustness to backdoors implies robustness to adversarial examples, certifying the former is at least as hard as the latter, which is itself an open problem.
Edge-case backdoors can persist under several defenses, including DP, norm clipping, and robust aggregators like Krum and Multi-Krum.
Attacks become effective when 0.5-1% of edge users are adversarial, and performance can be maintained on benign data while triggering edge-case misclassifications.
Theoretical results show that if a model has adversarial examples, a backdoor exists under mild conditions, and detecting backdoors is NP-hard.
Edge-case backdoors can cause fairness-related failures by disproportionately affecting underrepresented inputs or groups.
Constructing datasets that emphasize edge-case samples can enable successful backdoor injections without obvious deviation on normal data.
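Selecting edge-case samples by how unlikely they are under a density model fitted to penultimate-layer activations can be sketched as follows. This is a simplified illustration, not the paper's pipeline: it assumes the activations have already been extracted into arrays, it fits a single Gaussian with NumPy instead of a multi-component mixture (a real pipeline would more likely use something like scikit-learn's `GaussianMixture` and its `score_samples` method), and the threshold rule (the 1% quantile of training log-densities) and all names are hypothetical.

```python
import numpy as np

def fit_gaussian(features, reg=1e-6):
    """Fit a single Gaussian (mean, covariance) to in-distribution features."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + reg * np.eye(features.shape[1])
    return mean, cov

def log_density(x, mean, cov):
    """Log-density of each row of x under N(mean, cov)."""
    d = x.shape[1]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T      # cov^{-1} (x - mean), row by row
    maha = np.einsum("ij,ij->i", diff, solved)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

def select_edge_cases(candidates, mean, cov, log_p_threshold):
    """Return indices of candidates whose log-density is at most the threshold."""
    scores = log_density(candidates, mean, cov)
    return np.flatnonzero(scores <= log_p_threshold)

# Toy data standing in for penultimate-layer activations.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 2))
candidates = np.array([[0.0, 0.0], [0.2, -0.1], [6.0, 6.0]])

mean, cov = fit_gaussian(train_feats)
# Hypothetical rule: treat anything below the 1% quantile of training log-density as tail.
threshold = np.quantile(log_density(train_feats, mean, cov), 0.01)
edge_idx = select_edge_cases(candidates, mean, cov, threshold)
```

Here only the third candidate, which sits far from the bulk of the training features, is flagged as an edge case; the two candidates near the mean are not.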

This review was generated by AI and reviewed by a human editor.