QUICK REVIEW

[Paper Review] Defending against the Label-flipping Attack in Federated Learning

Najeeb Jebreel, Josep Domingo‐Ferrer | arXiv (Cornell University) | Jul 5, 2022
Adversarial Robustness in Machine Learning | Cited by 25
One-line summary

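This paper introduces a defense against the label-flipping attack in federated learning: it extracts the gradients associated with the source and target output neurons, clusters them to identify bad updates, and excludes those updates from aggregation. The approach is robust to variation in data distribution and model size, and it outperforms several existing defenses.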

ABSTRACT

Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data in their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform and hard to detect and it negatively impacts on the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we deeply investigate the LF attack behavior and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for the attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, and then clusters the extracted gradients, analyzes the resulting clusters and filters out potential bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. Also, the proposed defense outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.
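To make the attack concrete, here is a minimal sketch of label flipping on a malicious peer's local labels. The function name `flip_labels` and the toy label array are illustrative assumptions, not taken from the paper, and for simplicity the sketch flips every source-class label rather than only some of them.

```python
import numpy as np

def flip_labels(labels, source_class, target_class):
    """Return a poisoned copy of `labels` where source_class becomes target_class."""
    labels = np.asarray(labels)
    poisoned = labels.copy()
    # Only source-class examples are relabeled; all other labels stay untouched.
    poisoned[labels == source_class] = target_class
    return poisoned

# Toy local data set of a malicious peer (hypothetical values).
labels = np.array([0, 1, 2, 1, 0, 2])
poisoned = flip_labels(labels, source_class=1, target_class=2)
print(poisoned)
```

Because only the labels change, the poisoned data looks like ordinary training data, which is part of why the attack is easy to perform and hard to detect.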

Motivation and objectives

  • Motivate and address the security risks of label-flipping attacks in federated learning (FL).
  • Identify discriminative gradient patterns that separate malicious from honest updates.
  • Propose a dynamic, distribution-agnostic defense that remains effective for high-dimensional models.
  • Evaluate the defense across diverse datasets, model sizes, and attacker ratios, comparing to state-of-the-art defenses.

Proposed method

  • Extract gradients only from the output layer of local updates.
  • Dynamically identify the two neurons with the highest gradient magnitudes as source and target classes and extract their connected gradients.
  • Cluster the extracted gradients using k-means for iid and mild non-iid data, and HDBSCAN (density-based) for extreme non-iid data.
  • Analyze cluster size/density (or cluster proximity in extreme non-iid) to flag a potentially malicious cluster.
  • Exclude updates in the identified bad cluster from aggregation before updating the global model.
  • Do not require prior knowledge of data distributions or attacker ratios, and adapt to model dimensionality.
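The steps above can be sketched roughly as follows for the iid or mild non-iid case. This is a toy illustration under stated assumptions, not the authors' implementation: the gradients are synthetic, the helper names (`top2_neurons`, `two_means`, `denser_cluster`) are hypothetical, a small hand-rolled 2-means stands in for a library k-means, neuron magnitude is taken as the sum of per-peer L2 norms, and the denser cluster (smaller mean distance to its centroid) is the one flagged as bad. The HDBSCAN path for extreme non-iid data is not shown.

```python
import numpy as np

def top2_neurons(grads):
    """grads: (n_peers, n_classes, n_features) output-layer weight gradients.
    Returns the indices of the two output neurons with the largest summed norm."""
    magnitudes = np.linalg.norm(grads, axis=2).sum(axis=0)
    return np.sort(np.argsort(magnitudes)[-2:])

def two_means(x, n_iter=20):
    """Tiny 2-means; starts from the first point and the point farthest from it."""
    first = x[0]
    farthest = x[np.argmax(np.linalg.norm(x - first, axis=1))]
    centroids = np.stack([first, farthest])
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):  # keep the old centroid if a cluster empties
                centroids[k] = x[assign == k].mean(axis=0)
    return assign, centroids

def denser_cluster(x, assign, centroids):
    """Assumed rule: flag the cluster with the smaller mean distance to its centroid."""
    spread = [np.linalg.norm(x[assign == k] - centroids[k], axis=1).mean()
              for k in range(2)]
    return int(np.argmin(spread))

# Synthetic updates: 7 honest peers followed by 3 attackers (hypothetical data).
rng = np.random.default_rng(0)
n_classes, n_features, source, target = 4, 5, 1, 3
honest = rng.normal(0.0, 0.2, size=(7, n_classes, n_features))
honest[:, source] += 1.0
honest[:, target] -= 1.0
attackers = rng.normal(0.0, 0.02, size=(3, n_classes, n_features))
attackers[:, source] -= 1.0   # opposite sign mimics the contradicting objective
attackers[:, target] += 1.0
grads = np.concatenate([honest, attackers])

neurons = top2_neurons(grads)                             # candidate source/target
features = grads[:, neurons, :].reshape(len(grads), -1)   # relevant gradients only
assign, centroids = two_means(features)
bad = denser_cluster(features, assign, centroids)
flagged = np.flatnonzero(assign == bad)
kept = np.flatnonzero(assign != bad)
print("neurons:", neurons, "flagged:", flagged, "kept:", kept)
```

Only the updates indexed by `kept` would then be passed to the aggregation step. In practice the clustering would more likely come from an established library, and the bad-cluster rule should follow the paper's own size and density analysis.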

Experimental results

Research questions

  • RQ1: How do label-flipping attackers influence gradients in the output layer, and can these signals be used to distinguish bad from good updates?
  • RQ2: Can a gradient-based clustering approach robustly detect LF attacks across iid, mild non-iid, and extreme non-iid data distributions?
  • RQ3: Does focusing on source/target output neurons outperform methods that use full update gradients or other partial analyses under varying model sizes and data distributions?
  • RQ4: What is the impact of the proposed method on key FL performance metrics (accuracy, test error, attack success rate) across different datasets and attacker ratios?
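As a rough illustration of the metrics in the last question, the sketch below computes overall accuracy, source class accuracy, and attack success rate from predictions. It assumes that attack success rate means the fraction of source-class test examples predicted as the target class; the function name `lf_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def lf_metrics(y_true, y_pred, source_class, target_class):
    """Accuracy-style metrics for a label-flipping evaluation (assumed definitions)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_source = y_true == source_class
    return {
        "overall_accuracy": float(np.mean(y_pred == y_true)),
        # Share of source-class examples that are still classified correctly.
        "source_accuracy": float(np.mean(y_pred[is_source] == source_class)),
        # Share of source-class examples pushed into the target class.
        "attack_success_rate": float(np.mean(y_pred[is_source] == target_class)),
    }

# Toy predictions of a global model (hypothetical values).
y_true = [0, 1, 1, 1, 1, 2, 2, 0]
y_pred = [0, 1, 2, 2, 1, 2, 2, 0]
metrics = lf_metrics(y_true, y_pred, source_class=1, target_class=2)
print(metrics)
```

Overall accuracy can stay fairly high while source class accuracy drops, which is why the source-class metrics are reported separately.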

Key findings

  • The gradients of the source and target output neurons reveal discriminative patterns that separate attackers from honest peers.
  • Focusing on relevant gradients yields better separation of good and bad updates than using whole updates, especially for high-dimensional models.
  • Honest and malicious updates form two distinct gradient clusters, with the attackers' gradients tending to form the denser one.
  • The proposed defense remains effective across three datasets and various model sizes, outperforming several state-of-the-art defenses in multiple metrics.
  • The method does not rely on assumptions about data distribution or attacker proportion, and adapts to extreme non-iid settings.


This review was generated by AI and checked by a human editor.