QUICK REVIEW

[Paper Review] Blocking Transferability of Adversarial Examples in Black-Box Learning Systems

Hossein Hosseini, Yize Chen|arXiv (Cornell University)|Mar 13, 2017
Adversarial Robustness in Machine Learning | References: 45 | Cited by: 92
One-Sentence Summary

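This paper proposes a NULL-label defense for black-box learning systems: the classifier is trained to reject adversarial inputs by labeling them NULL while keeping its accuracy on clean data, which blocks the transferability of adversarial examples.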
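At inference time, a NULL-label classifier rejects an input whenever NULL is its top prediction. The following is a minimal sketch of that decision rule, assuming softmax outputs with the NULL class stored in the last column; `predict_or_reject` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def predict_or_reject(probs):
    """Map each row of softmax outputs to a class index, or None if NULL wins.

    Assumption: the NULL class is the last output column.
    """
    probs = np.asarray(probs)
    null_index = probs.shape[1] - 1          # NULL appended after the real classes
    preds = probs.argmax(axis=1)             # top-scoring output per input
    return [None if p == null_index else int(p) for p in preds]
```

A `None` entry means the input was rejected as invalid rather than assigned one of the original classes.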

ABSTRACT

Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples: inputs that are maliciously modified can cause the classifier to provide adversary-desired outputs. Moreover, it is known that adversarial examples generated on one classifier are likely to cause another classifier to make the same mistake, even if the classifiers have different architectures or are trained on disjoint datasets. This property, which is known as transferability, opens up the possibility of attacking black-box systems by generating adversarial examples on a substitute classifier and transferring the examples to the target classifier. Therefore, the key to protecting black-box learning systems against adversarial examples is to block their transferability. To this end, we propose a training method under which, as the input is increasingly perturbed, the classifier smoothly outputs lower confidence on the original label and instead predicts that the input is "invalid". In essence, we augment the output class set with a NULL label and train the classifier to reject adversarial examples by classifying them as NULL. In experiments, we apply a wide range of attacks based on adversarial examples to the black-box systems. We show that a classifier trained with the proposed method effectively resists adversarial examples while maintaining its accuracy on clean data.
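The abstract describes confidence shifting smoothly from the original label to NULL as the perturbation grows. A minimal sketch of such a soft training target follows, assuming perturbation is measured as the fraction of changed pixels and that the NULL mass ramps linearly between two hypothetical thresholds, `threshold` and `full`; the paper's actual schedule may differ.

```python
import numpy as np

def null_target(label, num_classes, frac_perturbed, threshold=0.05, full=0.3):
    """Soft target over num_classes + 1 outputs, with NULL as the last entry.

    Hypothetical schedule: NULL probability is 0 up to `threshold`, rises
    linearly to 1 at `full`, and the remaining mass stays on the true label.
    """
    p_null = np.clip((frac_perturbed - threshold) / (full - threshold), 0.0, 1.0)
    target = np.zeros(num_classes + 1)
    target[label] = 1.0 - p_null     # confidence left on the original label
    target[num_classes] = p_null     # confidence moved to NULL ("invalid")
    return target
```

Clean inputs (`frac_perturbed=0`) keep a one-hot target on their label, while heavily perturbed inputs are pushed entirely to NULL.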

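Motivation and Goals

  • Motivate and formalize the threat that the transferability of adversarial examples poses to black-box learning systems.
  • Propose a defense that augments the classifier with a NULL label to reject adversarial inputs.
  • Demonstrate the robustness of the NULL-label approach on MNIST and GTSRB across different attack settings and platforms.
  • Evaluate how robust training and NULL labeling affect accuracy on clean data and on adversarial inputs.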

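Proposed Method

  • Define black-box and blind threat models for the adversary.
  • Train a substitute classifier by alternating between clean data and adversarial examples, querying the oracle for labels.
  • Augment the output with a NULL label and use label smoothing to aid generalization.
  • Compute the NULL probability of adversarial examples on validation data using the Misclassification Attack Greedy (MG) method.
  • Apply the gradient-based smoothed targeted attack (STG) for adversarial training with adversarial features.
  • Evaluate transferability and robustness on DNNs, robust DNNs, and ML-as-a-service platforms (Amazon AWS and Microsoft Azure).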
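The substitute-training step can be illustrated with Jacobian-based dataset augmentation in the style of Papernot et al.; this technique is swapped in as an assumption, since the summary does not spell out the exact procedure. The sketch uses a linear softmax substitute, so the gradient of the logit for class y with respect to the input is simply the column `W[:, y]`, plus a toy oracle. `train_substitute`, `lam`, and `rounds` are hypothetical names.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, k, epochs=200, lr=0.5):
    """Fit a linear softmax substitute by full-batch gradient descent on cross-entropy."""
    n, d = X.shape
    W, b = np.zeros((d, k)), np.zeros(k)
    Y = np.eye(k)[y]                               # one-hot oracle labels
    for _ in range(epochs):
        G = (softmax(X @ W + b) - Y) / n           # d(loss)/d(logits)
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def train_substitute(oracle, X_seed, k, rounds=3, lam=0.1):
    """Alternate: query oracle labels, fit substitute, augment along its gradients."""
    X = X_seed.copy()
    for r in range(rounds):
        y = oracle(X)                              # black-box label queries
        W, b = train_softmax(X, y, k)
        if r < rounds - 1:
            # For a linear model, d(logit_y)/dx is column W[:, y]; step along its sign.
            X = np.vstack([X, X + lam * np.sign(W[:, y].T)])
    return W, b
```

Usage: with `oracle = lambda X: (X[:, 0] > 0).astype(int)` and 20 Gaussian seed points, three rounds query the oracle on 20, 40, and 80 points, doubling the substitute's training set each round without any access to the target's internals.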

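Experimental Results

Research Questions

  • RQ1: Can adversarial examples crafted on a substitute classifier transfer to the target black-box classifier?
  • RQ2: Does augmenting the classifier with a NULL label and training it on adversarial examples reduce transferability while preserving accuracy on clean data?
  • RQ3: How do the threat models (black-box vs. blind) affect attack success rate and transferability?
  • RQ4: What is the effect of the NULL label on real ML services (Amazon AWS, Microsoft Azure) and on the public MNIST and GTSRB datasets?
  • RQ5: How do robust-training variants compare with the NULL-label defense in mitigating the transferability of adversarial examples?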
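RQ1 and RQ2 hinge on measuring transferability. One plausible way to count it for an untargeted attack is sketched below, assuming an example "transfers" only if the target assigns it a real class other than the true one, so that a NULL prediction counts as a successful rejection. `transfer_rate` is a hypothetical helper; the paper's exact definition may differ.

```python
import numpy as np

def transfer_rate(target_preds, true_labels, null_label=None):
    """Fraction of adversarial examples the target misclassifies as a wrong real class.

    If null_label is given, predictions equal to it are treated as rejections,
    not as successful transfers.
    """
    preds = np.asarray(target_preds)
    true = np.asarray(true_labels)
    wrong = preds != true                  # target was fooled or rejected the input
    if null_label is not None:
        wrong &= preds != null_label       # rejections do not count as transfers
    return float(wrong.mean())
```

For a classifier without a NULL output, omit `null_label` and the function reduces to the ordinary misclassification rate on the transferred examples.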

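Key Findings

  • Transferability makes attacks on black-box systems more effective: the larger the perturbation, the higher the success rate.
  • NULL labeling combined with label smoothing maps heavily perturbed samples to the NULL class, rejecting adversarial inputs while preserving accuracy on clean data.
  • On MNIST, the transferability of adversarial examples to the NULL-label classifier is essentially zero.
  • On GTSRB, transferability is near zero for L0 attacks and below 10% for L∞ attacks; the NULL-label model substantially reduces successful adversarial transfer.
  • Robust-training variants have mixed effects on clean-data accuracy across datasets but generally provide some resistance; on the benchmarks tested, the NULL-label method outperforms standard adversarial training.
  • Reported accuracies on MNIST / GTSRB: DNN 99.35% / 97.77%; Robust0 98.81% / 97.05%; Robust∞ 99.39% / 96.80%; Amazon (AWS) 92.00% / 72.81%; Microsoft Azure 97.73% / 85.76%.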
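To read the MNIST / GTSRB accuracy figures relative to the undefended DNN, the percentage-point differences can be computed directly. This sketch assumes all five models were evaluated on the same test sets; `ACC` simply restates the reported numbers and `accuracy_deltas` is a hypothetical helper.

```python
# Reported accuracies (%), as (MNIST, GTSRB) pairs.
ACC = {
    "DNN": (99.35, 97.77),
    "Robust0": (98.81, 97.05),
    "Robust∞": (99.39, 96.80),
    "Amazon (AWS)": (92.00, 72.81),
    "Microsoft": (97.73, 85.76),
}

def accuracy_deltas(acc, baseline="DNN"):
    """Percentage-point change of each model vs. the baseline, per dataset."""
    base = acc[baseline]
    return {name: tuple(round(a - b, 2) for a, b in zip(vals, base))
            for name, vals in acc.items()}

for name, (d_mnist, d_gtsrb) in accuracy_deltas(ACC).items():
    print(f"{name:14s} MNIST {d_mnist:+.2f}  GTSRB {d_gtsrb:+.2f}")
```

For example, Amazon (AWS) comes out 7.35 points below the DNN on MNIST and 24.96 points below on GTSRB, while Robust∞ is 0.04 points above it on MNIST and 0.97 points below on GTSRB.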


This summary was generated by AI and reviewed by human editors.