QUICK REVIEW

[Paper Review] Robust Local Features for Improving the Generalization of Adversarial Training

Chuanbiao Song, Kun He|arXiv (Cornell University)|Sep 23, 2019

Adversarial Robustness in Machine Learning | References: 31 | Cited by: 34

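One-Sentence Summary

This paper proposes RLFAT, which learns robust local features by applying Random Block Shuffle (RBS) during adversarial training and transfers that knowledge into normal adversarial training, improving both adversarial robustness and standard generalization across multiple datasets.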

ABSTRACT

Adversarial training has been demonstrated as one of the most effective methods for training robust models to defend against adversarial examples. However, adversarially trained models often lack adversarially robust generalization on unseen testing data. Recent works show that adversarially trained models are more biased towards global structure features. Instead, in this work, we would like to investigate the relationship between the generalization of adversarial training and the robust local features, as the robust local features generalize well for unseen shape variation. To learn the robust local features, we develop a Random Block Shuffle (RBS) transformation to break up the global structure features on normal adversarial examples. We continue to propose a new approach called Robust Local Features for Adversarial Training (RLFAT), which first learns the robust local features by adversarial training on the RBS-transformed adversarial examples, and then transfers the robust local features into the training of normal adversarial examples. To demonstrate the generality of our argument, we implement RLFAT in currently state-of-the-art adversarial training frameworks. Extensive experiments on STL-10, CIFAR-10 and CIFAR-100 show that RLFAT significantly improves both the adversarially robust generalization and the standard generalization of adversarial training. Additionally, we demonstrate that our models capture more local features of the object on the images, aligning better with human perception.

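Motivation and Objectives

Investigate how robust local features affect the generalization gap in adversarial training.
Propose a method that learns robust local features and transfers them into standard adversarial training.
Show that the method is compatible with state-of-the-art adversarial training frameworks (PGDAT and TRADES).
Demonstrate empirical gains in both adversarial robustness and standard accuracy on multiple datasets.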

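Proposed Method

Introduce the Random Block Shuffle (RBS) transformation, which breaks up global structure while preserving local features during adversarial training.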
Define Robust Local Feature Learning (RLFL): adversarial training on RBS-transformed adversarial examples.
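Propose Robust Local Feature Transfer (RLFT), which aligns the high-level features of RBS-transformed adversarial inputs with those of the original adversarial inputs.
Integrate RLFL and RLFT into an end-to-end RLFAT loss, with one variant built on PGDAT and one on TRADES.
Provide an end-to-end training algorithm (Algorithm 1) that combines RBSAT with RLFT.
Evaluate on STL-10, CIFAR-10 and CIFAR-100 against white-box attacks (PGD, CW) and a black-box attack (NAttack).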
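
The RBS transformation and the combined objective listed above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. The strip-wise shuffle along each axis, the block count `k`, the squared-L2 feature distance, the weight `eta`, and the function names `random_block_shuffle` and `rlfat_loss_sketch` are all assumptions made for illustration, and the loss mirrors a PGDAT-style cross-entropy term only.

```python
import numpy as np


def random_block_shuffle(image, k=2, rng=None):
    """Shuffle k strips along the height axis, then k strips along the width axis.

    image: array of shape (H, W, C). k is a hypothetical block count.
    Each cell of the resulting k-by-k grid stays intact, so local patterns
    survive while the global layout is broken up.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(image)
    for axis in (0, 1):
        strips = np.array_split(out, k, axis=axis)
        order = rng.permutation(k)
        out = np.concatenate([strips[i] for i in order], axis=axis)
    return out


def rlfat_loss_sketch(logits_rbs, labels, feat_rbs, feat_adv, eta=0.5):
    """Cross-entropy on RBS-transformed adversarial inputs plus feature alignment.

    logits_rbs: (N, num_classes) logits for RBS-transformed adversarial examples.
    feat_rbs, feat_adv: (N, D) high-level features of the RBS-transformed and
    the original adversarial examples. eta is a hypothetical trade-off weight.
    """
    # RLFL-style term: numerically stable softmax cross-entropy.
    shifted = logits_rbs - logits_rbs.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    learn_term = -log_probs[np.arange(len(labels)), labels].mean()
    # RLFT-style term: mean squared L2 distance between the two feature sets.
    transfer_term = np.sum((feat_rbs - feat_adv) ** 2, axis=1).mean()
    return learn_term + eta * transfer_term
```

In a real training loop the adversarial example would be crafted first (for instance with PGD) and then passed through `random_block_shuffle`; both versions would go through the network to obtain the logits and features used here.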

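Experimental Results

Research Questions

RQ1: Do robust local features learned during adversarial training generalize better to unseen data than features biased toward global structure?
RQ2: Does learning robust local features through RBS and transferring them into normal adversarial training improve both robustness and standard accuracy?
RQ3: Is RLFAT compatible with existing adversarial training frameworks (PGDAT and TRADES) and with datasets of different scales?
RQ4: Do models trained with RLFAT produce saliency maps that align better with human perception?
RQ5: How does robust local feature transfer affect performance under distribution shift (brightness and gamma)?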

Key Findings

Accuracy (%) by dataset and defense under each attack:
Dataset, defense	No attack	PGD	CW	NAttack
STL-10, PGDAT	67.05	30.00	31.97	34.80
STL-10, TRADES	65.24	38.99	38.35	42.07
STL-10, RLFAT_P	71.47	38.42	38.42	44.80
STL-10, RLFAT_T	72.38	43.36	39.31	48.13
CIFAR-10, PGDAT	82.96	46.19	46.41	46.67
CIFAR-10, TRADES	80.35	50.95	49.80	52.47
CIFAR-10, RLFAT_P	84.77	53.97	52.40	54.60
CIFAR-10, RLFAT_T	82.72	58.75	51.94	54.60
CIFAR-100, PGDAT	55.86	23.32	22.87	22.47
CIFAR-100, TRADES	52.13	27.26	24.66	25.13
CIFAR-100, RLFAT_P	56.70	31.99	29.04	32.53
CIFAR-100, RLFAT_T	58.96	31.63	27.54	30.86
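
The gain of each RLFAT variant over its base framework can be read off the table with a short script. The sketch below copies the table values verbatim; the names `RESULTS`, `PAIRS`, `COLUMNS` and `gains` are hypothetical helpers introduced only for this illustration.

```python
# Accuracy (%) copied from the table: (no attack, PGD, CW, NAttack).
RESULTS = {
    "STL-10": {
        "PGDAT": (67.05, 30.00, 31.97, 34.80),
        "TRADES": (65.24, 38.99, 38.35, 42.07),
        "RLFAT_P": (71.47, 38.42, 38.42, 44.80),
        "RLFAT_T": (72.38, 43.36, 39.31, 48.13),
    },
    "CIFAR-10": {
        "PGDAT": (82.96, 46.19, 46.41, 46.67),
        "TRADES": (80.35, 50.95, 49.80, 52.47),
        "RLFAT_P": (84.77, 53.97, 52.40, 54.60),
        "RLFAT_T": (82.72, 58.75, 51.94, 54.60),
    },
    "CIFAR-100": {
        "PGDAT": (55.86, 23.32, 22.87, 22.47),
        "TRADES": (52.13, 27.26, 24.66, 25.13),
        "RLFAT_P": (56.70, 31.99, 29.04, 32.53),
        "RLFAT_T": (58.96, 31.63, 27.54, 30.86),
    },
}

# Each RLFAT variant is compared with the framework it is built on.
PAIRS = (("RLFAT_P", "PGDAT"), ("RLFAT_T", "TRADES"))
COLUMNS = ("No attack", "PGD", "CW", "NAttack")


def gains(results):
    """Return percentage-point gains of each variant over its base, per column."""
    out = {}
    for dataset, rows in results.items():
        for variant, base in PAIRS:
            out[(dataset, variant)] = tuple(
                round(v - b, 2) for v, b in zip(rows[variant], rows[base])
            )
    return out


for (dataset, variant), delta in gains(RESULTS).items():
    print(dataset, variant, dict(zip(COLUMNS, delta)))
```

Comparing each variant only with its own base keeps the comparison like for like, since RLFAT_P and RLFAT_T inherit different training objectives.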

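For each framework, the RLFAT variant outperforms its base method (RLFAT_P vs. PGDAT, RLFAT_T vs. TRADES) on STL-10, CIFAR-10 and CIFAR-100, in both adversarial robustness and standard accuracy.
On test data, RLFAT_T achieves better adversarially robust generalization and standard generalization than TRADES; on CIFAR-10, for example, accuracy rises from 50.95% to 58.75% under PGD and from 80.35% to 82.72% without attack.
RLFAT_P is likewise more robust than PGDAT while keeping higher standard accuracy; on STL-10, for example, accuracy rises from 30.00% to 38.42% under PGD and from 67.05% to 71.47% without attack.
Saliency maps of RLFAT models show increased attention to local features of the object, aligning better with human perception.
A loss sensitivity analysis indicates that RLFAT yields a smoother loss under brightness and gamma distribution shifts.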
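
A loss sensitivity check under brightness and gamma shifts can be sketched as below. This summary does not give the exact shift definitions or ranges used in the paper, so the additive brightness offset, the power-law gamma correction, the shift levels, and the `toy_loss` stand-in for a trained model's loss are all assumptions for illustration.

```python
import numpy as np


def shift_brightness(image, delta):
    """Add a constant offset to pixel values in [0, 1] and clip back into range."""
    return np.clip(image + delta, 0.0, 1.0)


def shift_gamma(image, gamma):
    """Apply power-law (gamma) correction to pixel values in [0, 1]."""
    return np.clip(image, 0.0, 1.0) ** gamma


def loss_curve(loss_fn, image, label, transform, levels):
    """Evaluate a loss at each shift level; a flatter curve means lower sensitivity."""
    return [float(loss_fn(transform(image, level), label)) for level in levels]


def toy_loss(image, label):
    """Hypothetical stand-in for a model's loss: squared error on mean intensity."""
    return (image.mean() - label) ** 2


image = np.full((4, 4, 3), 0.5)
print(loss_curve(toy_loss, image, 0.5, shift_brightness, [-0.2, 0.0, 0.2]))
print(loss_curve(toy_loss, image, 0.5, shift_gamma, [0.5, 1.0, 2.0]))
```

With real models, `toy_loss` would be replaced by the classification loss of each trained network, and the resulting curves compared between a baseline and its RLFAT counterpart.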

This review was generated by AI and reviewed by a human editor.