QUICK REVIEW

[Paper Review] The Hidden Vulnerability of Distributed Learning in Byzantium

El Mahdi El Mhamdi, Rachid Guerraoui|arXiv (Cornell University)|Feb 22, 2018

Stochastic Gradient Optimization Techniques|References: 20|Cited by: 473

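One-Sentence Summary

The paper shows that, in distributed SGD, Byzantine-resilient aggregation can still steer training toward models that are ineffective in high dimension. It introduces Bulyan, which narrows the attacker's leeway to O(1/√d), and validates this empirically on MNIST and CIFAR-10.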

ABSTRACT

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of $\Omega\left(f(d)\right)$, where $f(d)$ increases at least like $\sqrt{d}$. Based on this leeway, we build a simple attack, and experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow $O\left(\frac{1}{\sqrt{d}}\right)$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.

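Motivation and Objectives

Motivate the study of Byzantine robustness for distributed SGD in high-dimensional, non-convex settings.
Assess whether the convergence guarantees of Byzantine-resilient aggregation rules are sufficient for neural networks.
Show that attacks exploiting the curse of dimensionality exist against ℓp-norm-based GARs.
Propose a generic enhancement (Bulyan) that tightens the leeway left to Byzantine workers, and prove that it converges.
Validate empirically on MNIST and CIFAR-10 and analyze the computational trade-offs.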

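Proposed Method

Describe the distributed SGD model with one central node and f Byzantine workers (out of n workers), operating under a gradient aggregation rule (GAR).
Characterize a simple attack that exploits the high-dimensional loss landscape to push the aggregated gradient toward suboptimal regions.
Introduce Bulyan, a two-step enhancement that uses an underlying Byzantine-resilient rule A to select a set of gradients, then aggregates each coordinate using the β values closest to the coordinate-wise median.
Prove theoretical bounds: (i) Bulyan reduces the Byzantine leeway on each coordinate to O(1/√d); (ii) Bulyan preserves convergence under the same α, f bounds as A.
Give a complexity analysis showing that Bulyan costs O((n−2f)C + dn) per epoch, where C is the cost of one call to A; in practice this is O(n²d) for the GeoMed/Krum variants.
Compare Bulyan empirically with Brute, Krum, and GeoMed on MNIST and CIFAR-10, studying convergence speed and robustness.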
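The two-step construction described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: Krum plays the role of the underlying rule A; the selection size θ = n − 2f, the trimmed size β = θ − 2f, the requirement n ≥ 4f + 3, and the averaging of the β retained values follow the Bulyan paper as commonly described and are not spelled out in this review. The function names `krum_index` and `bulyan`, and the guard that keeps at least one neighbour in the Krum score, are choices made here for illustration.

```python
import numpy as np

def krum_index(grads, f):
    """Index of the gradient a Krum-style rule would pick from `grads` (shape (n, d)).

    Each gradient is scored by the sum of squared distances to its closest
    neighbours; the lowest score wins.
    """
    n = len(grads)
    diffs = grads[:, None, :] - grads[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)          # (n, n) pairwise squared distances
    # Assumption: keep at least one neighbour so the score stays meaningful
    # when the pool shrinks during Bulyan's selection loop.
    m = max(n - f - 2, 1)
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(sq_dist[i], i)          # distances to the other gradients
        scores[i] = np.sort(others)[:m].sum()
    return int(np.argmin(scores))

def bulyan(grads, f):
    """Bulyan-style aggregation of `grads` (shape (n, d)) with f Byzantine workers."""
    grads = np.asarray(grads, dtype=float)
    n, _ = grads.shape
    if n < 4 * f + 3:
        raise ValueError("this sketch assumes n >= 4f + 3")
    theta = n - 2 * f                              # size of the selection set
    beta = theta - 2 * f                           # values kept per coordinate

    # Step 1: repeatedly apply the underlying rule A (here Krum) and move the
    # chosen gradient from the pool to the selection set.
    remaining = list(range(n))
    selected = []
    for _ in range(theta):
        k = krum_index(grads[remaining], f)
        selected.append(remaining.pop(k))
    chosen = grads[selected]                       # (theta, d)

    # Step 2: per coordinate, average the beta values closest to the median.
    median = np.median(chosen, axis=0)
    order = np.argsort(np.abs(chosen - median), axis=0)[:beta]
    closest = np.take_along_axis(chosen, order, axis=0)
    return closest.mean(axis=0)
```

Calling `bulyan(grads, f=1)` on a stack of seven gradient vectors returns a single aggregated vector of the same dimension. The selection loop is where the (n−2f)C term of the stated cost comes from, and the coordinate-wise step accounts for the dn term.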

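Experimental Results

Research Questions

RQ1: Does Byzantine-resilient gradient aggregation guarantee convergence in high-dimensional, non-convex neural networks?
RQ2: How much leeway do existing GARs leave an adversary in large-scale, non-convex settings?
RQ3: Can we design an enhancement to GARs that shrinks the attacker's influence without sacrificing convergence?
RQ4: Does the proposed Bulyan method restore robust convergence, and how does it affect training speed in practice?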

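Key Findings

In high dimension, existing Byzantine-resilient GARs can converge as guaranteed and still yield ineffective models, even against a single Byzantine worker.
ℓp-norm-based GARs leave a poisoning margin of Ω(f(d)), growing at least like √d, which makes the attack effective.
A generic enhancement, Bulyan(A), strictly limits the attacker's influence on each coordinate to O(σ/√d) while preserving convergence.
Empirical results on MNIST and CIFAR-10 show that Bulyan combined with A (e.g., Krum) reaches accuracy close to that of non-Byzantine averaging and withstands the proposed attack.
Without Byzantine workers, Bulyan incurs a moderate cost in convergence speed; the cost is largest at small batch sizes and can be reduced by using a reasonable batch size.
Within the (α, f)-Byzantine-resilience framework, Bulyan retains the convergence guarantee (almost surely).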
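The √d poisoning margin can be made concrete with a toy experiment. The sketch below is written in the spirit of the attack and is not necessarily the paper's exact construction. It assumes honest gradients are i.i.d. Gaussian around a true gradient of zero with per-coordinate standard deviation σ, so honest vectors sit about σ√(2d) apart in ℓ2 norm. A single Byzantine vector that copies the honest mean and shifts one coordinate by 0.9·σ√d is then no farther from the honest gradients than they are from each other, and a Krum-style ℓ2 rule selects it. All names and constants (`krum_select`, the 0.9 factor, d = 10 000) are illustrative choices.

```python
import numpy as np

def krum_select(grads, f):
    """Index picked by a Krum-style rule: smallest sum of squared distances
    to the n - f - 2 nearest neighbours."""
    n = len(grads)
    sq = np.sum((grads[:, None, :] - grads[None, :, :]) ** 2, axis=-1)
    scores = [np.sort(np.delete(sq[i], i))[: n - f - 2].sum() for i in range(n)]
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
d, n_honest, f, sigma = 10_000, 6, 1, 1.0

# Honest workers: noisy estimates of a true gradient taken to be zero (assumption).
honest = sigma * rng.standard_normal((n_honest, d))

# Byzantine worker: sits at the honest mean, except for one coordinate that is
# shifted by roughly sigma * sqrt(d).
byz = honest.mean(axis=0)
byz[0] += 0.9 * sigma * np.sqrt(d)

grads = np.vstack([honest, byz])          # the Byzantine vector is the last row
chosen = krum_select(grads, f)

# Shift of the poisoned coordinate relative to the honest mean, in units of sigma.
byz_shift = abs(byz[0] - honest[:, 0].mean()) / sigma

print("selected index:", chosen, "| Byzantine index:", n_honest)
print("shift on coordinate 0 (in units of sigma):", byz_shift)
```

The poisoned coordinate is displaced by about 0.9√d standard deviations while the vector as a whole still looks typical in ℓ2 distance. That gap between a whole-vector norm and a single coordinate is the room that coordinate-wise trimming in Bulyan is meant to remove.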

This review was generated by AI and reviewed by a human editor.