QUICK REVIEW

[Paper Review] Adversarial Machine Learning at Scale

Alexey Kurakin, Ian Goodfellow|arXiv (Cornell University)|Nov 4, 2016

Adversarial Robustness in Machine Learning|Cited by 375

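One-Sentence Summary

This paper applies scalable adversarial training to Inception v3 on ImageNet, shows improved robustness to single-step adversarial attacks, and discusses transferability, the effect of model capacity, and a label leaking phenomenon.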

ABSTRACT

Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research, we apply adversarial training to ImageNet. Our contributions include: (1) recommendations for how to successfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, so single-step attacks are the best for mounting black-box attacks, and (4) resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process.
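
For orientation, the single-step and multi-step attacks named in the abstract are usually written as below. This is a sketch in commonly used notation rather than a quotation from this page: X is the clean input, J the training loss, \epsilon the perturbation bound, \alpha the per-step size, and y_{LL} the class the model considers least likely. All of these symbols are assumptions introduced here.

```latex
% Single-step, using the true label (FGSM)
X^{adv} = X + \epsilon \,\mathrm{sign}\big(\nabla_X J(X, y_{true})\big)

% Single-step, toward the least-likely class (never reads the true label)
y_{LL} = \arg\min_y \, p(y \mid X), \qquad
X^{adv} = X - \epsilon \,\mathrm{sign}\big(\nabla_X J(X, y_{LL})\big)

% Multi-step (iterative), clipped to the \epsilon-ball around X
X^{adv}_0 = X, \qquad
X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\Big\{ X^{adv}_N
    + \alpha \,\mathrm{sign}\big(\nabla_X J(X^{adv}_N, y_{true})\big) \Big\}
```

The first form is the one that builds a single-step perturbation directly from the true label, which is the construction the abstract associates with label leaking.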

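Motivation and Objectives

Demonstrate adversarial training that scales to large models and datasets (ImageNet), using batch normalization and minibatches that mix clean and adversarial examples.
Evaluate the robustness of the trained models against different adversarial attack methods, in particular single-step versus multi-step attacks.
Study how model capacity and training choices affect robustness to adversarial perturbations.
Characterize the transferability of adversarial examples across models and its implications for black-box attacks.
Expose and analyze the label leaking effect in adversarial training.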

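Proposed Method

Review and compare several methods for generating adversarial examples (single-step and iterative).
Propose an adversarial training algorithm that injects adversarial examples into every minibatch, with a tunable loss weighting parameter lambda.
Use a randomized epsilon for each example to avoid overfitting to a fixed perturbation size.
Use batch normalization and minibatches containing both clean and adversarial examples for stable large-scale training.
Evaluate with Inception v3 on ImageNet, trained with RMSProp and synchronous distributed training across 50 machines.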
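
The loss weighting and per-example epsilon described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the normalization by (m - k) + lambda * k, the truncated-normal sampling, and the defaults max_eps=16 and sigma=8 are assumptions not given on this page, and the function names are hypothetical.

```python
import random


def mixed_batch_loss(clean_losses, adv_losses, lam):
    """Weighted loss over one minibatch that mixes clean and adversarial examples.

    clean_losses: per-example losses of the (m - k) clean examples
    adv_losses:   per-example losses of the k adversarial examples
    lam:          relative weight of adversarial examples (the lambda parameter)

    ASSUMPTION: the sum is normalized by (m - k) + lam * k.
    """
    n_clean, k = len(clean_losses), len(adv_losses)
    weighted_sum = sum(clean_losses) + lam * sum(adv_losses)
    return weighted_sum / (n_clean + lam * k)


def sample_epsilon(rng, max_eps=16.0, sigma=8.0):
    """Draw one perturbation size for one training example.

    ASSUMPTION: |N(0, sigma)| truncated to [0, max_eps] by rejection sampling;
    the default values are illustrative placeholders.
    """
    while True:
        eps = abs(rng.gauss(0.0, sigma))
        if eps <= max_eps:
            return eps


if __name__ == "__main__":
    rng = random.Random(0)
    # One epsilon per adversarial example, so no single size dominates training.
    print([round(sample_epsilon(rng), 2) for _ in range(4)])
    # Two clean and two adversarial examples, adversarial ones down-weighted.
    print(mixed_batch_loss([1.0, 1.0], [2.0, 2.0], lam=0.5))
```

With lam=1 the loss reduces to the ordinary mean over all examples in the minibatch; values below 1 make adversarial examples count less than clean ones.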

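Experimental Results

Research Questions

RQ1: How does adversarial training scale to large models and datasets such as ImageNet?
RQ2: Does adversarial training with a single-step attack provide robustness to other single-step attacks and to some multi-step attacks?
RQ3: How does model capacity affect adversarial robustness, with and without adversarial training?
RQ4: How well do adversarial examples transfer between models, and how does the attack type affect this?
RQ5: Does label leaking occur in adversarial training, and how should attacks be constructed for a sound robustness evaluation?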

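Key Findings

Adversarial training with a single-step method improves robustness to that class of single-step attacks, reaching roughly 74% top-1 accuracy on adversarial examples while clean accuracy drops by about 0.8%.
When combined with adversarial training, increasing model capacity (deeper or wider) improves robustness.
Robustness gained from single-step adversarial training largely fails to carry over to iterative adversarial examples, indicating limited cross-protection against multi-step attacks.
FGSM-style adversarial examples transfer between models more readily, while iterative methods transfer less well, which suggests a potential security benefit against black-box attacks.
A label leaking effect appears when single-step adversarial examples are constructed using the true label, causing accuracy on adversarial examples to exceed accuracy on clean examples; the effect disappears when the true label is not used or when iterative methods are used.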
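
The difference between attacks that read the true label and attacks that do not can be sketched on a toy model. The example below is a minimal illustration, assuming a small linear softmax classifier as a stand-in for a real network; the function names (fgsm, step_ll, iterative_ll) and the weights in the usage section are hypothetical, and clipping to a valid pixel range is omitted for brevity.

```python
import math


def predict_probs(W, x):
    """Class probabilities of a toy linear softmax model (one weight row per class)."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_loss_wrt_input(W, x, label):
    """Gradient of the cross-entropy J(x, label) with respect to x: W^T (p - onehot)."""
    p = predict_probs(W, x)
    return [
        sum((p[c] - (1.0 if c == label else 0.0)) * W[c][j] for c in range(len(W)))
        for j in range(len(x))
    ]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, x, true_label, eps):
    """Single step that increases the loss of the TRUE label (uses the label)."""
    g = grad_loss_wrt_input(W, x, true_label)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def step_ll(W, x, eps):
    """Single step toward the least-likely predicted class; never reads the true label."""
    p = predict_probs(W, x)
    y_ll = min(range(len(p)), key=p.__getitem__)
    g = grad_loss_wrt_input(W, x, y_ll)
    return [xi - eps * sign(gi) for xi, gi in zip(x, g)]


def iterative_ll(W, x, eps, alpha, steps):
    """Multi-step variant: steps of size alpha, clipped to the eps-ball around x."""
    p = predict_probs(W, x)
    y_ll = min(range(len(p)), key=p.__getitem__)  # target fixed from the clean input
    x_adv = list(x)
    for _ in range(steps):
        g = grad_loss_wrt_input(W, x_adv, y_ll)
        x_adv = [
            min(max(xa - alpha * sign(gi), xi - eps), xi + eps)
            for xa, gi, xi in zip(x_adv, g, x)
        ]
    return x_adv


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical 3-class model
    x = [0.5, 0.2]
    print("clean      ", predict_probs(W, x))
    print("fgsm       ", predict_probs(W, fgsm(W, x, true_label=0, eps=0.1)))
    print("step_ll    ", predict_probs(W, step_ll(W, x, eps=0.1)))
    print("iterative  ", predict_probs(W, iterative_ll(W, x, 0.1, 0.05, 4)))
```

Because step_ll and iterative_ll derive their target from the model's own prediction, the perturbation cannot encode the true label, which is why constructions of this kind are a natural choice when evaluating an adversarially trained model.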

This review was generated by AI and reviewed by a human editor.