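[Paper Summary] Adversarial Robustness through Local Linearization
Introduces the Local Linearity Regularizer (LLR), which encourages the loss to behave linearly near the training data. This enables faster robust training and yields better adversarial accuracy than standard adversarial training on CIFAR-10 and ImageNet.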
Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of the model and number of input dimensions increase. Further, training against less expensive and therefore weaker adversaries produces models that are robust against weak attacks but break down under attacks that are stronger. This is often attributed to the phenomenon of gradient obfuscation; such models have a highly non-linear loss surface in the vicinity of training examples, making it hard for gradient-based attacks to succeed even though adversarial examples still exist. In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness. We show via extensive experiments on CIFAR-10 and ImageNet, that models trained with our regularizer avoid gradient obfuscation and can be trained significantly faster than adversarial training. Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with l-infinity adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack. Additionally, we match state of the art results for CIFAR-10 at 8/255.
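Research Motivation and Objectives
- Motivate and address the high computational cost of adversarial training for robust models.
- Propose a regularizer that enforces local linearity of the loss around the training data in order to prevent gradient obfuscation.
- Show that Local Linearity Regularization (LLR) yields faster training with robustness to strong attacks that is better than or comparable to adversarial training.
- Evaluate LLR empirically on CIFAR-10 and ImageNet under strong white-box adversaries, and compare it against baselines such as ADV, TRADES, and CURE.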
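Proposed Method
- Defines a local linearity measure gamma(epsilon, x) that captures the largest deviation of the loss from its first-order Taylor expansion within an epsilon-ball.
- Derives the Local Linearity Regularizer (LLR), which penalizes both gamma(epsilon, x) and the first-order term |delta_LLR^T grad_x ell(x)|, where the perturbation delta_LLR is constrained to the epsilon-ball.
- Finds delta_LLR with a gradient-based inner optimization that is similar in spirit to adversarial training but typically needs far fewer steps.
- Gives a joint objective L(D) = E[ ell(x) + lambda*gamma(epsilon, x) + mu*|delta_LLR^T grad_x ell(x)| ] for training robust models.
- Argues, and shows empirically, that minimizing gamma(epsilon, x) suffices to bound the adversarial loss and to reduce gradient obfuscation.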
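The method steps above can be made concrete on a toy model. Writing g(delta; x) = |ell(x + delta) - ell(x) - delta^T grad_x ell(x)|, the measure gamma(epsilon, x) is the maximum of g over the epsilon-ball and delta_LLR is a maximizer. By the triangle inequality, |ell(x + delta) - ell(x)| <= |delta^T grad_x ell(x)| + g(delta; x), which is why penalizing the two LLR terms controls the adversarial loss. The sketch below is only an illustration, not the authors' implementation: it assumes a binary logistic loss with analytic input gradients, an l-infinity ball, a random start, and projected sign-gradient ascent for the inner search. All function names (`loss`, `grad_x`, `linearity_gap`, `find_delta_llr`, `llr_objective`) are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _sgn(v):
    return (v > 0) - (v < 0)


def loss(w, x, y):
    """Logistic loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * _dot(w, x)
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def grad_x(w, x, y):
    """Analytic gradient of the logistic loss with respect to the input x."""
    margin = y * _dot(w, x)
    s = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
    return [-y * s * wi for wi in w]


def linearity_gap(w, x, y, delta):
    """g(delta; x) = |l(x + delta) - l(x) - delta^T grad_x l(x)|."""
    x_adv = [xi + di for xi, di in zip(x, delta)]
    first_order = _dot(delta, grad_x(w, x, y))
    return abs(loss(w, x_adv, y) - loss(w, x, y) - first_order)


def find_delta_llr(w, x, y, eps, steps=10, step_size=None, seed=0):
    """Approximate the maximizer of g over the l-inf ball of radius eps.

    Assumption: projected sign-gradient ascent from a random start; the
    summary only says the inner problem is solved with a few gradient steps.
    """
    rng = random.Random(seed)
    step_size = step_size or eps / 4
    base_loss, base_grad = loss(w, x, y), grad_x(w, x, y)
    delta = [rng.uniform(-eps, eps) for _ in x]
    best, best_value = delta, linearity_gap(w, x, y, delta)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        residual = loss(w, x_adv, y) - base_loss - _dot(delta, base_grad)
        sign = 1.0 if residual >= 0 else -1.0
        # d g / d delta = sign(residual) * (grad_x l(x + delta) - grad_x l(x))
        grad_delta = [sign * (ga - gb)
                      for ga, gb in zip(grad_x(w, x_adv, y), base_grad)]
        # Ascent step followed by projection (clipping) onto the l-inf ball.
        delta = [max(-eps, min(eps, di + step_size * _sgn(gd)))
                 for di, gd in zip(delta, grad_delta)]
        value = linearity_gap(w, x, y, delta)
        if value > best_value:
            best, best_value = delta, value
    return best


def llr_objective(w, x, y, eps, lam, mu):
    """Per-example objective l(x) + lam * gamma + mu * |delta_LLR^T grad_x l(x)|."""
    delta = find_delta_llr(w, x, y, eps)
    gamma = linearity_gap(w, x, y, delta)  # estimate of gamma(eps, x)
    first_order = abs(_dot(delta, grad_x(w, x, y)))
    return loss(w, x, y) + lam * gamma + mu * first_order
```

In a real training loop the objective would be averaged over a minibatch and differentiated with respect to the model parameters by an autodiff framework; here `lam` and `mu` are simply free weights chosen by the caller.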
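Experimental Results
Research Questions
- RQ1: Does enforcing local linearity of the loss around training examples reduce gradient obfuscation and improve robustness against strong adversaries?
- RQ2: Compared with standard adversarial training, is LLR faster to train while matching or exceeding its robustness?
- RQ3: On CIFAR-10 and ImageNet, how does LLR perform under strong untargeted and targeted white-box attacks compared with ADV, TRADES, and DENOISE?
- RQ4: As the attacker increases the attack strength, how does the robustness of an LLR-trained model degrade?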
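Key Findings
- Under strong white-box attacks, LLR matches the state of the art in adversarial accuracy on CIFAR-10 at epsilon=8/255 and exceeds it on ImageNet at epsilon=4/255.
- Training with LLR on ImageNet is reported to be up to 5x faster than standard adversarial training.
- As attack strength increases, the adversarial accuracy of LLR-trained models degrades more gracefully than that of models trained with adversarial training.
- On ImageNet, LLR reaches 47% adversarial accuracy under untargeted attacks at epsilon=4/255, outperforming several baselines.
- On CIFAR-10, LLR reaches 52.81% adversarial accuracy at epsilon=8/255, matching or exceeding reported baselines under similar evaluation.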
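The finding about attack strength refers to how adversarial accuracy changes as the attacker takes more optimization steps. As a rough illustration of that measurement, the sketch below evaluates a toy linear classifier against a projected sign-gradient attack with a varying number of steps. The attack type, the linear model, the logistic loss, and every name here (`predict`, `input_grad`, `pgd_attack`, `adversarial_accuracy`) are assumptions made for illustration; the summary does not specify the attack used in the paper, and the numbers this toy produces say nothing about the reported results.

```python
import math


def predict(w, x):
    """Sign of the linear score; ties count as the positive class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def input_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) with respect to x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * wi for wi in w]


def pgd_attack(w, x, y, eps, steps, step_size):
    """Untargeted l-inf attack: ascend the loss, clip the perturbation to eps."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = input_grad(w, x_adv, y)
        delta = [max(-eps, min(eps, di + step_size * ((gi > 0) - (gi < 0))))
                 for di, gi in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]


def adversarial_accuracy(w, data, eps, steps, step_size):
    """Fraction of examples still classified correctly after the attack."""
    correct = sum(predict(w, pgd_attack(w, x, y, eps, steps, step_size)) == y
                  for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    w = [1.0, -1.0]
    data = [([0.5, 0.0], 1), ([0.06, 0.0], 1),
            ([-0.3, 0.2], -1), ([0.0, 0.04], -1)]
    # More steps means a stronger attack; steps=0 gives clean accuracy.
    for steps in (0, 1, 2, 4):
        acc = adversarial_accuracy(w, data, eps=0.1, steps=steps, step_size=0.025)
        print(steps, acc)
```

Plotting this accuracy against the number of attack steps, for models trained in different ways, is one way to compare how quickly each model's robustness erodes.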
This summary was generated by AI and reviewed by human editors.