QUICK REVIEW

[Paper Review] Defending against the Label-flipping Attack in Federated Learning

Najeeb Jebreel, Josep Domingo‐Ferrer|arXiv (Cornell University)|Jul 5, 2022

Adversarial Robustness in Machine Learning | Cited by 25

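One-sentence summary

This paper introduces a defense against the label-flipping attack in federated learning that extracts the gradients associated with the source and target output neurons, clusters them to identify malicious updates, and excludes those updates before aggregation; the method is robust to changes in data distribution and model size and outperforms several existing defenses.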

ABSTRACT

Federated learning (FL) provides autonomy and privacy by design to participating peers, who cooperatively build a machine learning (ML) model while keeping their private data in their devices. However, that same autonomy opens the door for malicious peers to poison the model by conducting either untargeted or targeted poisoning attacks. The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples from one class (i.e., the source class) to another (i.e., the target class). Unfortunately, this attack is easy to perform and hard to detect and it negatively impacts on the performance of the global model. Existing defenses against LF are limited by assumptions on the distribution of the peers' data and/or do not perform well with high-dimensional models. In this paper, we deeply investigate the LF attack behavior and find that the contradicting objectives of attackers and honest peers on the source class examples are reflected in the parameter gradients corresponding to the neurons of the source and target classes in the output layer, making those gradients good discriminative features for the attack detection. Accordingly, we propose a novel defense that first dynamically extracts those gradients from the peers' local updates, and then clusters the extracted gradients, analyzes the resulting clusters and filters out potential bad updates before model aggregation. Extensive empirical analysis on three data sets shows the proposed defense's effectiveness against the LF attack regardless of the data distribution or model dimensionality. Also, the proposed defense outperforms several state-of-the-art defenses by offering lower test error, higher overall accuracy, higher source class accuracy, lower attack success rate, and higher stability of the source class accuracy.
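
To make the attack concrete, here is a minimal sketch of label flipping on one peer's local labels. The function name `flip_labels` and the toy labels are illustrative assumptions, not taken from the paper, and the sketch flips every source-class example, whereas the abstract only says that "some examples" are flipped.

```python
def flip_labels(labels, source, target):
    """Return a copy of `labels` with every `source` label replaced by `target`.

    Hypothetical helper: flips all source-class examples (an assumption).
    """
    return [target if y == source else y for y in labels]


# Toy local dataset labels for a 3-class problem (made-up values).
labels = [0, 1, 2, 1, 0, 2]

# A malicious peer relabels class 1 (source) as class 2 (target)
# before running its local training.
poisoned = flip_labels(labels, source=1, target=2)
print(poisoned)
```

The features are left untouched and only the labels change, which is why the attack is cheap to mount.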

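Research motivation and objectives

Motivate and address the security risk that label-flipping attacks pose in federated learning (FL).
Identify discriminative gradient patterns that separate malicious updates from honest ones.
Propose a dynamic, distribution-agnostic defense that remains effective for high-dimensional models.
Evaluate the defense across diverse data sets, model sizes, and attacker ratios, and compare it with state-of-the-art defenses.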

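Proposed method

Extract gradients only from the output layer of each local update.
Dynamically identify the two neurons with the largest gradient magnitudes as the source and target classes, and extract the gradients connected to them.
Cluster the extracted gradients with k-means for IID and mildly non-IID data, and with HDBSCAN (density-based clustering) for extreme non-IID data.
Analyze cluster size and density (or, under extreme non-IID data, cluster proximity) to flag the potentially malicious cluster.
Exclude the updates in the flagged cluster from aggregation before updating the global model.
The defense requires no prior knowledge of the data distribution or the attacker ratio, and it adapts to model dimensionality.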
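
The steps above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation: each update is represented only by its output-layer gradient (one row per output neuron), a hand-written 2-means loop stands in for a library k-means, the "flag the denser cluster" rule is a simplification of the size/density analysis, and the HDBSCAN path for extreme non-IID data is not covered. All function names and toy numbers are hypothetical.

```python
import math


def row_norm(row):
    return math.sqrt(sum(v * v for v in row))


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_two_neurons(updates):
    """Indices of the two output neurons with the largest summed gradient norm."""
    n_out = len(updates[0])
    totals = [sum(row_norm(u[i]) for u in updates) for i in range(n_out)]
    best = sorted(range(n_out), key=lambda i: totals[i], reverse=True)[:2]
    return sorted(best)


def extract_features(updates, neurons):
    """Concatenate each peer's gradient rows for the selected neurons."""
    return [[v for i in neurons for v in u[i]] for u in updates]


def two_means(points, iters=20):
    """Plain Lloyd iterations with k=2 and a deterministic initialisation."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist(p, c0))  # farthest point from c0
    centroids = [list(c0), list(c1)]
    assign = []
    for _ in range(iters):
        assign = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return assign, centroids


def flag_denser_cluster(points, assign, centroids):
    """Assumed rule: the cluster with the smaller mean spread is suspicious."""
    spread = []
    for k in (0, 1):
        members = [p for p, a in zip(points, assign) if a == k]
        spread.append(sum(dist(p, centroids[k]) for p in members) / len(members)
                      if members else float("inf"))
    return 0 if spread[0] < spread[1] else 1


# Toy output-layer gradients for 3 classes (rows) and 2 inputs (columns).
# Peers 0-3 play honest peers, peers 4-5 play attackers flipping class 1 -> 2.
updates = [
    [[0.1, 0.0], [1.0, 0.8], [-0.9, -1.1]],
    [[0.0, 0.1], [0.7, 1.2], [-1.2, -0.8]],
    [[0.1, 0.1], [1.3, 0.9], [-0.8, -1.3]],
    [[0.0, 0.0], [0.9, 1.4], [-1.4, -0.7]],
    [[0.1, 0.0], [-1.0, -1.0], [1.0, 1.0]],
    [[0.0, 0.1], [-1.0, -1.1], [1.1, 1.0]],
]

neurons = top_two_neurons(updates)
features = extract_features(updates, neurons)
assign, centroids = two_means(features)
bad = flag_denser_cluster(features, assign, centroids)
kept = [i for i, a in enumerate(assign) if a != bad]
flagged = [i for i, a in enumerate(assign) if a == bad]
print(neurons, kept, flagged)
```

Only the updates listed in `kept` would then be passed to the aggregation step. In the toy data the attackers' gradients on the two selected neurons point the opposite way from the honest peers', which is what makes the two clusters easy to separate.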

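Experimental results

Research questions

RQ1: How do label-flipping attackers affect the gradients of the output layer, and can these signals be used to distinguish malicious updates from benign ones?
RQ2: Can gradient-based clustering robustly detect LF attacks under IID, mildly non-IID, and extreme non-IID data distributions?
RQ3: Does focusing on the source/target output neurons outperform analyzing full update gradients or other partial analyses across different model sizes and data distributions?
RQ4: How does the method affect key FL performance metrics (accuracy, test error, attack success rate) across different data sets and attacker ratios?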
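
The metrics named in RQ4 can be computed as in the sketch below. It assumes that the attack success rate is the fraction of source-class test examples predicted as the target class, which is a common definition but is not spelled out in this summary. The function names and toy predictions are hypothetical.

```python
def overall_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def source_class_accuracy(y_true, y_pred, source):
    """Accuracy restricted to examples whose true label is the source class."""
    preds = [p for t, p in zip(y_true, y_pred) if t == source]
    return sum(p == source for p in preds) / len(preds)


def attack_success_rate(y_true, y_pred, source, target):
    """Assumed definition: share of source-class examples predicted as target."""
    preds = [p for t, p in zip(y_true, y_pred) if t == source]
    return sum(p == target for p in preds) / len(preds)


# Toy test set: four source-class (1) examples, two of them pushed to class 2.
y_true = [1, 1, 1, 1, 0, 2]
y_pred = [1, 2, 2, 1, 0, 2]

overall = overall_accuracy(y_true, y_pred)
src_acc = source_class_accuracy(y_true, y_pred, source=1)
asr = attack_success_rate(y_true, y_pred, source=1, target=2)
print(overall, src_acc, asr)
```

Reporting source-class accuracy next to overall accuracy matters because a targeted attack can damage one class badly while leaving the overall figure almost unchanged.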

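Key findings

The gradients of the source and target output neurons reveal discriminative patterns that separate attackers from honest peers.
Focusing on these relevant gradients separates benign and malicious updates better than using whole updates in high-dimensional models.
Honest and malicious updates form two distinct gradient clusters, with the attackers' gradients tending to be more densely grouped.
The defense remains effective across three data sets and different model sizes, and it surpasses several state-of-the-art defenses on multiple metrics.
The method does not rely on assumptions about the data distribution or the attacker ratio, and it adapts to extreme non-IID settings.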


This review was generated by AI and checked by human editors.