QUICK REVIEW

[Paper Review] Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN

Hyeungill Lee, Sungyeob Han|arXiv (Cornell University)|May 9, 2017

Adversarial Robustness in Machine Learning|References: 17|Cited by: 87

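One-sentence summary

This paper proposes the Generative Adversarial Trainer (GAT), a GAN-based framework that alternately trains a perturbation generator and a classifier, both to improve robustness to adversarial examples and to act as a strong regularizer for supervised learning.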

ABSTRACT

We propose a novel technique to make neural network robust to adversarial examples using a generative adversarial network. We alternately train both classifier and generator networks. The generator network generates an adversarial perturbation that can easily fool the classifier network by using a gradient of each image. Simultaneously, the classifier network is trained to classify correctly both original and adversarial images generated by the generator. These procedures help the classifier network to become more robust to adversarial perturbations. Furthermore, our adversarial training framework efficiently reduces overfitting and outperforms other regularization methods such as Dropout. We applied our method to supervised learning for CIFAR datasets, and experimental results show that our method significantly lowers the generalization error of the network. To the best of our knowledge, this is the first method which uses GAN to improve supervised learning.

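Motivation and objectives

Improve the robustness of classifiers to adversarial examples beyond what conventional methods achieve.
Propose a GAN-based adversarial training framework in which a generator learns perturbations that fool the classifier.
Demonstrate better robustness and regularization than standard adversarial training and dropout.
Show that GAT reduces generalization error on the CIFAR-10 and CIFAR-100 datasets.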

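Proposed method

Introduce a classifier F(x; θ_f) and a generator G(Δ; θ_g) that outputs a perturbation for the image x, where Δ denotes the gradient of each image that the generator takes as input.
Define the generator loss L_G(Δ, y) = F(x + G(Δ))_y + c_g * ||G(Δ)||_2^2. The first term is the classifier's output for the true class y on the perturbed image and the second penalizes perturbation power, so minimizing L_G yields a small perturbation that still fools the classifier.
Define the classifier's adversarial objective L_F = α J(θ_f, x, y) + (1 − α) J(θ_f, x + G(Δ), y), where J is the classifier's training loss and α = 0.5 in the experiments.
Train alternately: first update G to fool the classifier as effectively as possible while keeping the perturbation small, then update F to correctly classify both the original and the perturbed images.
Optimize with Adam, alternating k = 1 generator step with each classifier step.
Apply the method to CIFAR-10/100 with a small All-CNN-style classifier and a 6-layer generator G for the perturbations (no batch norm or dropout).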
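The two objectives above can be made concrete with a small numerical sketch. The code below is not the authors' implementation: it assumes that F outputs softmax probabilities and that J is the cross-entropy loss, and every number in it (the logits, the perturbation vector, c_g = 1.0) is hypothetical and chosen only for illustration. The function names are likewise made up for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, y):
    """Assumed form of J for one sample: negative log-probability of class y."""
    return -math.log(probs[y])


def generator_loss(probs_adv, y, perturbation, c_g):
    """L_G = F(x + G(Δ))_y + c_g * ||G(Δ)||_2^2 for one sample."""
    power = sum(p * p for p in perturbation)  # squared L2 norm of G(Δ)
    return probs_adv[y] + c_g * power


def classifier_loss(probs_clean, probs_adv, y, alpha=0.5):
    """L_F = α J(x, y) + (1 − α) J(x + G(Δ), y) for one sample."""
    return (alpha * cross_entropy(probs_clean, y)
            + (1 - alpha) * cross_entropy(probs_adv, y))


# Hypothetical classifier outputs for a clean image and its perturbed copy.
probs_clean = softmax([2.0, 0.5, 0.1])
probs_adv = softmax([0.8, 1.2, 0.1])
perturbation = [0.01, -0.02, 0.015]  # stands in for G(Δ)
true_class = 0

l_g = generator_loss(probs_adv, true_class, perturbation, c_g=1.0)
l_f = classifier_loss(probs_clean, probs_adv, true_class)
print("generator loss:", l_g)
print("classifier loss:", l_f)
```

In an alternating scheme of the kind described, a generator step would lower the first quantity with respect to θ_g and a classifier step would lower the second with respect to θ_f. The gradient computation itself is left out here because it depends on the network definitions.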

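Experimental results

Research questions

RQ1: Can a GAN-based perturbation generator produce stronger adversarial examples than the fast gradient (FG) method under a given perturbation budget?
RQ2: Does adversarial training with a generator improve the classifier's robustness and generalization compared with conventional FG-based adversarial training and dropout?
RQ3: Can the GAT framework serve as a regularizer across different network architectures and datasets?
RQ4: How do adaptive, data-dependent perturbations affect training dynamics and regularization effectiveness?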

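Main findings

Dataset	Method	Test accuracy (%)
CIFAR-10	Baseline	77.48 ± 0.46
CIFAR-10	Dropout	78.49 ± 0.64
CIFAR-10	Random perturbation	77.59 ± 0.57
CIFAR-10	Adversarial training (FG, L_infty)	78.12 ± 0.59
CIFAR-10	Adversarial training (FG, L2)	77.99 ± 0.45
CIFAR-10	Adversarial training (GAT)	80.33 ± 0.44
CIFAR-10	Dropout + GAT	81.62 ± 0.34
CIFAR-100	Baseline	44.32 ± 0.63
CIFAR-100	Dropout	46.29 ± 0.61
CIFAR-100	Random perturbation	44.43 ± 0.71
CIFAR-100	Adversarial training (FG, L_infty)	45.16 ± 0.73
CIFAR-100	Adversarial training (FG, L2)	45.67 ± 0.63
CIFAR-100	Adversarial training (GAT)	50.44 ± 0.56
CIFAR-100	Dropout + GAT	50.71 ± 0.49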
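For readers unfamiliar with the two FG baselines in the table, the sketch below shows how fast gradient perturbations are commonly built under an L_infty and an L2 budget. It assumes the standard definitions (ε times the sign of the gradient, and ε times the unit-normalized gradient); this review does not spell out the exact variant used in the paper, and the function names are made up for illustration.

```python
import math


def fg_linf(grad, eps):
    """Fast gradient step under an L_inf budget: eps * sign(grad)."""
    def sign(g):
        return (g > 0) - (g < 0)  # evaluates to -1, 0, or 1
    return [eps * sign(g) for g in grad]


def fg_l2(grad, eps):
    """Fast gradient step under an L2 budget: eps * grad / ||grad||_2."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction, so return no perturbation.
        return [0.0 for _ in grad]
    return [eps * g / norm for g in grad]


# Hypothetical gradient of the loss with respect to a 3-pixel "image".
grad = [0.3, -2.0, 0.0]
print(fg_linf(grad, eps=0.1))
print(fg_l2(grad, eps=0.1))
```

Both baselines are fixed functions of the gradient, whereas GAT's generator is described as a trained network that takes the gradient as input.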

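At low perturbation power, GAT can produce stronger adversarial perturbations than the fast gradient method.
On CIFAR-10/100, adversarial training with GAT achieves higher test accuracy than the baseline, Dropout, random perturbation, and FG-based adversarial training.
GAT provides a substantial regularization gain, raising test accuracy over the baseline from 77.48% to 80.33% on CIFAR-10 and from 44.32% to 50.44% on CIFAR-100.
Combining Dropout with GAT improves performance further (81.62% on CIFAR-10, 50.71% on CIFAR-100).
Under both direct and indirect attacks, GAT is clearly more robust than FG-based methods across a range of ε settings.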
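The gains quoted above follow directly from the mean accuracies in the results table. The short sketch below recomputes them. The dictionary layout and function name are made up for illustration, and only the means are used, so the ± spreads are ignored.

```python
# Mean test accuracies (%) copied from the results table.
results = {
    "CIFAR-10": {"Baseline": 77.48, "Dropout": 78.49,
                 "GAT": 80.33, "Dropout + GAT": 81.62},
    "CIFAR-100": {"Baseline": 44.32, "Dropout": 46.29,
                  "GAT": 50.44, "Dropout + GAT": 50.71},
}


def gain_over_baseline(dataset, method):
    """Absolute accuracy gain over the baseline, in percentage points."""
    row = results[dataset]
    return round(row[method] - row["Baseline"], 2)


for dataset in results:
    for method in ("Dropout", "GAT", "Dropout + GAT"):
        print(dataset, method, gain_over_baseline(dataset, method))
```

GAT alone adds 2.85 points on CIFAR-10 and 6.12 points on CIFAR-100, and adding Dropout brings the totals to 4.14 and 6.39 points.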

This review was generated by AI and reviewed by human editors.