QUICK REVIEW

[Paper Digest] Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks

Aamir Mustafa, Salman Khan|arXiv (Cornell University)|Apr 1, 2019

Adversarial Robustness in Machine Learning|References: 41|Cited by: 28

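One-Sentence Summary

This paper proposes a proactive defense that uses convex polytope constraints to disentangle intermediate feature representations by class, ensuring maximal separation between class-specific feature manifolds and thereby improving the robustness of deep neural networks. Without adversarial training and without gradient obfuscation, the method is reported to reach state-of-the-art robustness: 46.7% and 36.1% accuracy under PGD attack on CIFAR-10 and CIFAR-100, respectively.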

ABSTRACT

Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images. The robustness of existing defenses suffers greatly under white-box attack settings, where an adversary has full knowledge about the network and can iterate several times to find strong perturbations. We observe that the main reason for the existence of such perturbations is the close proximity of different class samples in the learned feature space. This allows model decisions to be totally changed by adding an imperceptible perturbation in the inputs. To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. Specifically, we force the features for each class to lie inside a convex polytope that is maximally separated from the polytopes of other classes. In this manner, the network is forced to learn distinct and distant decision regions for each class. We observe that this simple constraint on the features greatly enhances the robustness of learned models, even against the strongest white-box attacks, without degrading the classification performance on clean images. We report extensive evaluations in both black-box and white-box attack scenarios and show significant gains in comparison to state-of-the-art defenses. Code and models are available at: https://github.com/aamir-mustafa/pcl-adversarial-defense

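Research Motivation and Objectives

Address the vulnerability of deep neural networks to adversarial attacks in the white-box setting, where the attacker has full access to the model.
Investigate whether enforcing geometric separation in the hidden feature space improves robustness to adversarial perturbations.
Develop a defense that strengthens decision boundaries without relying on adversarial training or gradient obfuscation.
Validate the method across multiple datasets and attack types, including strong iterative attacks.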

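Proposed Method

The method forces the intermediate features of each class to lie inside a convex polytope that is maximally separated from the polytopes of the other classes.
It introduces a multi-level, deeply supervised loss function that shapes the feature representations at several layers of the network.
The loss increases intra-class compactness and inter-class separation in feature space, reducing the overlap between class manifolds.
These geometric constraints make it harder for an adversarial perturbation to push a feature across a decision boundary.
The defense is implemented as a modified training objective; no adversarial examples are needed during training.
The method does not appear to rely on gradient masking: robustness decreases monotonically as the perturbation budget grows, and performance is consistent across attack types.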
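
The compactness-and-separation objective described above can be illustrated with a small sketch. This is not the authors' loss. It assumes one fixed center per class (`centers`), Euclidean distance, and a hypothetical function name `separation_loss`, and it uses only the Python standard library. The value falls when a feature moves toward its own class center and away from the other centers.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def separation_loss(features, labels, centers):
    """Illustrative loss (an assumption, not the paper's formulation).

    For each sample: distance to its own class center (pull term)
    minus the mean distance to all other class centers (push term),
    averaged over the batch. Requires at least two classes.
    """
    k = len(centers)
    total = 0.0
    for f, y in zip(features, labels):
        pull = l2(f, centers[y])
        push = sum(l2(f, centers[j]) for j in range(k) if j != y) / (k - 1)
        total += pull - push
    return total / len(features)


# Two hypothetical class centers in a 2-D feature space.
centers = [[0.0, 0.0], [4.0, 0.0]]
labels = [0, 1]

# Features sitting on their own centers versus features halfway between classes.
tight = separation_loss([[0.0, 0.0], [4.0, 0.0]], labels, centers)
mixed = separation_loss([[2.0, 0.0], [2.0, 0.0]], labels, centers)
print(tight, mixed)
```

Well-separated features give the lower value, which is the behaviour a training objective of this kind would reward. In a real network the centers would be learned jointly with the weights, and the term would be added to the usual classification loss at several layers.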

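Experimental Results

Research Questions

RQ1: Does enforcing geometric separation in the hidden feature space improve robustness against strong white-box adversarial attacks?
RQ2: Do polytope-based constraints on the feature manifolds effectively prevent adversarial perturbations from crossing decision boundaries?
RQ3: How does the method compare with state-of-the-art defenses in black-box and white-box attack scenarios?
RQ4: Does the defense suffer from gradient obfuscation, a common flaw of earlier defenses?
RQ5: Can the method keep high clean accuracy while achieving strong robustness without adversarial training?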

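Key Findings

On CIFAR-10, the proposed defense reaches 46.7% robust accuracy against a PGD attack with ϵ = 0.03, which the paper reports as significantly better than prior state-of-the-art defenses.
On CIFAR-100, it reaches 36.1% robust accuracy under the same PGD attack, indicating that the approach carries over to a second dataset.
The model keeps high clean accuracy (90.8% on CIFAR-10) while achieving this robustness without adversarial training.
The defense is consistently robust across attack types, including iterative methods such as PGD, BIM, MIM, and C&W.
The method shows no sign of gradient masking: robustness decays monotonically as ϵ increases, and performance is stable across attacks.
Empirically, the penultimate-layer feature representations are well separated in feature space, which is consistent with the method's resistance to adversarial perturbations.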
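
The evaluation protocol behind these numbers, an L∞ PGD attack followed by a sweep over ϵ, can be sketched on a toy model. Everything here is an assumption for illustration: a two-feature linear classifier stands in for the deep network, the names `pgd_linf`, `predict`, and `robust_accuracy` are hypothetical, and the four data points are made up. The figures it produces have no relation to the CIFAR results above.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x, y, w, b, eps, alpha=0.01, steps=10):
    """L-infinity PGD against a toy linear classifier with label y in {-1, +1}.

    The attacked loss is -y * (w.x + b), whose gradient with respect to x
    is -y * w. The bias b is kept in the signature for symmetry with
    predict(); it does not affect this gradient.
    """
    x_adv = list(x)
    for _ in range(steps):
        grad = [-y * wi for wi in w]
        # Ascent step along the sign of the gradient.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
        # Keep values in the valid input range [0, 1].
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


def predict(x, w, b):
    """Predict +1 or -1 from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def robust_accuracy(data, w, b, eps):
    """Fraction of samples still classified correctly after the attack."""
    hits = sum(predict(pgd_linf(x, y, w, b, eps), w, b) == y for x, y in data)
    return hits / len(data)


# Hypothetical model and data: two large-margin and two small-margin points.
w, b = [1.0, -1.0], 0.0
data = [([0.9, 0.1], 1), ([0.52, 0.5], 1), ([0.3, 0.4], -1), ([0.1, 0.9], -1)]

# Sweep the perturbation budget, as in a gradient-masking sanity check.
curve = [robust_accuracy(data, w, b, eps) for eps in (0.0, 0.03, 0.1)]
print(curve)
```

Robust accuracy should not rise as ϵ grows. A curve that does rise, or an iterative attack that does worse than a single-step one, is a common warning sign of gradient masking. Points with a larger margin to the decision boundary survive larger budgets, which is the intuition behind separating the class regions.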

This digest was generated by AI and reviewed by a human editor.