QUICK REVIEW

[Paper Review] How To Backdoor Federated Learning

Eugene Bagdasaryan, Andreas Veit | arXiv (Cornell University) | 2018-07-02

Privacy-Preserving Technologies in Data | References: 73 | Citations: 699

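One-Line Summary

This paper shows that federated learning is vulnerable to model poisoning via model replacement: the attack sustains semantic backdoors, outperforms data poisoning, and evades detection even under secure aggregation.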

ABSTRACT

Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word. We design and evaluate a new model-poisoning methodology based on model replacement. An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task. We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning. Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training.

Motivation and Objectives

Motivate and formalize the model-poisoning threat in federated learning.
Show that a malicious participant can replace the global model with a backdoored one without compromising main-task accuracy.
Demonstrate semantic backdoors in image classification and word prediction tasks.
Evaluate attack efficacy under secure aggregation and anomaly-detector defenses.

Proposed Method

Define attacker capabilities in federated learning, including data, training procedure, and model submission control.
Introduce model replacement as an attack to substitute the global model with a backdoored model.
Develop and evaluate the constrain-and-scale and train-and-scale techniques to evade anomaly detection.
Propose and analyze a two-task learning perspective to sustain backdoor persistence after attacker rounds.
Experiment with CIFAR-10 image classification and Reddit word prediction to demonstrate semantic backdoors.
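
The model-replacement step listed above can be illustrated with a small numerical sketch. It assumes the federated-averaging rule G_next = G + (eta / n) * sum_i (L_i - G) and an attacker who scales the backdoored model X by gamma = n / eta before submitting it. The function names, the toy 5-dimensional "models", and the values n = 100 and eta = 1.0 are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

def fedavg_update(global_model, local_models, n_total, eta):
    """Aggregate: G_next = G + (eta / n_total) * sum_i (L_i - G)."""
    delta = sum(local - global_model for local in local_models)
    return global_model + (eta / n_total) * delta

def model_replacement_submission(global_model, backdoored_model, n_total, eta):
    """Attacker's scaled submission: gamma * (X - G) + G, gamma = n_total / eta."""
    gamma = n_total / eta
    return gamma * (backdoored_model - global_model) + global_model

rng = np.random.default_rng(0)
n_total, eta = 100, 1.0                 # assumed toy values
G = rng.normal(size=5)                  # current global model (toy vector)
X = G + rng.normal(scale=0.1, size=5)   # attacker's backdoored model (toy vector)

# Assumption: training is near convergence, so benign updates are close to G.
benign = [G + rng.normal(scale=1e-4, size=5) for _ in range(9)]
malicious = model_replacement_submission(G, X, n_total, eta)

new_G = fedavg_update(G, benign + [malicious], n_total, eta)
gap = float(np.max(np.abs(new_G - X)))  # distance between new global model and X
print("max |G_next - X| =", gap)
```

Under these assumptions the scaled update cancels the 1/n averaging factor, so the aggregated model lands very close to X. The residual gap comes only from the benign participants' small updates.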

Experimental Results

Research Questions

RQ1: Can a single or a few malicious participants introduce a backdoor into the federated model without reducing main-task accuracy?
RQ2: How effective is model replacement compared to traditional training-data poisoning in federated learning?
RQ3: Can attackers evade anomaly detectors under secure aggregation, and how can they extend backdoor persistence across rounds?
RQ4: What forms of backdoors (semantic vs. pixel-pattern) can be embedded in federated models, and how do they perform in practice?

Key Findings

A single-shot attack can yield 100% backdoor accuracy on the attacker-chosen task.
An attacker who controls fewer than 1% of participants can prevent the joint model from unlearning the backdoor without hurting main-task accuracy.
Model replacement greatly outperforms data poisoning in word-prediction tasks (e.g., with 80,000 participants, 8 malicious participants suffice for 50% backdoor accuracy).
Secure aggregation and anomaly detectors do not prevent the attack; attackers can evade defenses with constrain-and-scale or train-and-scale methods.
The attack works on semantic backdoors in image classification and word prediction, and is effective under non-i.i.d. data conditions.
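
The constrain-and-scale evasion mentioned above folds the anomaly detector into the attacker's training objective. A minimal sketch, assuming the combined loss takes the form alpha * L_class + (1 - alpha) * L_ano and that the anomaly term is the L2 distance between the attacker's weights and the global weights. The function name, the choice of L2 distance, and the numbers are illustrative assumptions, not the authors' code.

```python
import numpy as np

def constrain_and_scale_loss(class_loss, weights, global_weights, alpha):
    """Combined objective: alpha * L_class + (1 - alpha) * L_ano.

    L_ano is assumed here to be the L2 distance to the global model;
    a real detector could use a different metric.
    """
    ano_loss = float(np.linalg.norm(weights - global_weights))
    return alpha * class_loss + (1.0 - alpha) * ano_loss

# Hypothetical values: classification loss 1.0, weights at distance 5 from global.
w = np.array([3.0, 4.0])
g = np.array([0.0, 0.0])
loss = constrain_and_scale_loss(class_loss=1.0, weights=w, global_weights=g, alpha=0.5)
print(loss)  # 0.5 * 1.0 + 0.5 * 5.0 = 3.0
```

A smaller alpha weights the anomaly term more heavily, trading backdoor accuracy for a submission that looks closer to a benign update.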

This review was generated by AI and reviewed by a human editor.