QUICK REVIEW

[Paper Review] Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

Sven Gowal, Chongli Qin|arXiv (Cornell University)|Oct 7, 2020

Adversarial Robustness in Machine Learning|References: 91|Cited by: 144

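One-Sentence Summary

The paper systematically studies adversarial training to uncover its limits, and shows that combining larger models, Swish/SiLU activations, and model weight averaging yields substantial robustness gains, especially when additional unlabeled data is available.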

ABSTRACT

Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
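For orientation, the "norm-bounded" setting named in the abstract is commonly written as a saddle-point problem. The notation below ($f_\theta$ for the network, $\ell$ for the training loss, $\mathcal{D}$ for the data distribution, $\mathcal{S}$ for the set of allowed perturbations) is assumed for illustration and is not taken from this summary:

$$
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in \mathcal{S}} \ell\big(f_\theta(x+\delta),\, y\big) \Big],
\qquad
\mathcal{S} = \{\delta : \|\delta\|_p \le \epsilon\}
$$

Under this reading, the two threat models in the abstract correspond to $(p, \epsilon) = (\infty, 8/255)$ and $(p, \epsilon) = (2, 128/255)$, and "accuracy under attack" is the fraction of test inputs for which the attack finds no $\delta \in \mathcal{S}$ that changes the prediction to a wrong label.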

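Motivation and Objectives

Assess the effectiveness and limits of adversarial training for robustness to norm-bounded perturbations.
Study how training loss, model size, activation function, unlabeled data, and weight averaging affect robust accuracy.
Identify combinations of factors that substantially improve state-of-the-art robustness on CIFAR-10/100 and MNIST.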

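Proposed Method

Formalize and evaluate variants of adversarial training in terms of their inner-optimization and outer-optimization losses.
Compare the standard AT, TRADES, and MART losses under different inner-maximization strategies.
Experiment with model scaling (depth/width) and activation functions (Swish/SiLU).
Bring in unlabeled data through pseudo-labels on images from 80 Million Tiny Images, and vary the labeled-to-unlabeled ratio.
Apply model weight averaging during training and evaluate its effect on robustness.
Evaluate robustness with strong attacks (AutoAttack and MultiTargeted), and apply early stopping based on robust accuracy on a validation set.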
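The steps above can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a two-dimensional linear classifier with binary cross-entropy in place of a deep network, uses an $\ell_\infty$ PGD attack as the inner maximization, and uses an exponential moving average as one possible form of weight averaging. All function names, the toy data, and the hyperparameters (`eps`, `lr`, `tau`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grads(w, b, x, y):
    """Gradients of binary cross-entropy for a linear model: (dL/dw, dL/db, dL/dx)."""
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dL/dlogit
    return [err * xi for xi in x], err, [err * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, eps, n_steps=10):
    """Inner maximization: signed gradient ascent projected onto the l_inf ball."""
    step = eps / 4
    delta = [random.uniform(-eps, eps) for _ in x]  # random start inside the ball
    for _ in range(n_steps):
        _, _, gx = grads(w, b, [xi + di for xi, di in zip(x, delta)], y)
        delta = [min(eps, max(-eps, di + step * sign(gi)))
                 for di, gi in zip(delta, gx)]
    # An image pipeline would also clip x + delta to the valid pixel range; omitted here.
    return [xi + di for xi, di in zip(x, delta)]


def adversarial_train(data, eps=0.1, lr=0.1, epochs=50, tau=0.99, seed=0):
    """Outer minimization on adversarial examples, with an EMA copy of the weights."""
    random.seed(seed)
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    w_avg, b_avg = list(w), b
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_linf(w, b, x, y, eps)
            gw, gb, _ = grads(w, b, x_adv, y)
            w = [wi - lr * gi for wi, gi in zip(w, gw)]
            b -= lr * gb
            # Weight averaging (assumed here to be an exponential moving average).
            w_avg = [tau * ai + (1 - tau) * wi for ai, wi in zip(w_avg, w)]
            b_avg = tau * b_avg + (1 - tau) * b
    return (w, b), (w_avg, b_avg)


def robust_accuracy(params, data, eps):
    """Fraction of points still classified correctly after the PGD attack."""
    w, b = params
    hits = 0
    for x, y in data:
        x_adv = pgd_linf(w, b, x, y, eps)
        logit = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        hits += int((logit > 0) == (y == 1))
    return hits / len(data)


# Hypothetical, linearly separable toy data: (features, label).
TOY_DATA = [([1.0, 1.0], 1), ([0.8, 1.2], 1), ([-1.0, -1.0], 0), ([-1.2, -0.8], 0)]
```

Calling `adversarial_train(TOY_DATA)` returns both the last iterate and the averaged parameters, and `robust_accuracy` can then be evaluated on either. On real image models the evaluation attack would be a stronger suite such as AutoAttack rather than the same PGD used in training.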

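Experimental Results

Research Questions

RQ1: What are the current limits of adversarial training methods under norm-bounded perturbations?
RQ2: How does the choice of inner/outer optimization loss affect robustness and clean accuracy across data regimes?
RQ3: Do unlabeled data and pseudo-labels improve robust performance, and how should they be integrated?
RQ4: How do model capacity (depth/width) and activation function affect robustness?
RQ5: Does weight averaging provide consistent robustness gains across settings?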

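Key Findings

On CIFAR-10, with or without additional unlabeled data, TRADES with early stopping tends to be more robust than classical adversarial training.
Increasing model capacity (depth/width) generally improves robustness; deeper models sometimes even outperform models with more parameters.
Swish/SiLU activations provide a robustness benefit, whereas other smooth activations do not necessarily help.
Adding unlabeled data through pseudo-labeling improves robustness; in their setup the best labeled-to-unlabeled ratio is about 3:7.
Model weight averaging consistently improves robustness; in the low-data setting its gain is sometimes comparable to that of TRADES.
Their best CIFAR-10 models reach 65.88% robust accuracy against $\ell_\infty$ perturbations of size $8/255$ with additional unlabeled data and 57.20% without. Against $\ell_2$ perturbations of size $128/255$ on CIFAR-10 robust accuracy reaches 80.53%, and against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100 it reaches 36.88%.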
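Two of the findings above are easy to make concrete. The sketch below shows the Swish/SiLU activation, $x \cdot \sigma(x)$, and one way a 3:7 labeled-to-unlabeled ratio could translate into per-batch counts. The helper names and the per-batch interpretation of the ratio are assumptions for illustration; the summary does not say how the authors implement the mix.

```python
import math


def silu(x):
    """Swish/SiLU activation x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def split_batch(batch_size, labeled_parts=3, unlabeled_parts=7):
    """Return (n_labeled, n_pseudo_labeled) for one batch at the given ratio."""
    total = labeled_parts + unlabeled_parts
    n_labeled = round(batch_size * labeled_parts / total)
    return n_labeled, batch_size - n_labeled
```

For example, `split_batch(100)` gives 30 labeled and 70 pseudo-labeled examples. Unlike ReLU, `silu` is smooth at zero and slightly negative for small negative inputs.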

This review was generated by AI and reviewed by human editors.