QUICK REVIEW

[Paper Review] Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

Nicolas Papernot, Patrick McDaniel|arXiv (Cornell University)|May 24, 2016

Adversarial Robustness in Machine Learning | References: 16 | Cited by: 1,415

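One-Sentence Summary

This paper studies how adversarial samples transfer across many classes of machine learning models, and mounts practical black-box attacks on real-world services by training a substitute model through oracle queries and dataset augmentation.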

ABSTRACT

Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output. Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, so long as both models were trained to perform the same task. An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model, with very little information about the victim. Recent work has further developed a technique that uses the victim model as an oracle to label a synthetic training set for the substitute, so the attacker need not even collect a training set to mount the attack. We extend these recent techniques using reservoir sampling to greatly enhance the efficiency of the training procedure for the substitute model. We introduce new transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees. We demonstrate our attacks on two commercial machine learning classification systems from Amazon (96.19% misclassification rate) and Google (88.94%) using only 800 queries of the victim model, thereby showing that existing machine learning approaches are in general vulnerable to systematic black-box attacks regardless of their structure.

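Motivation and Objectives

Motivate and characterize the transferability of adversarial samples across many classes of ML models.
Evaluate intra-technique and cross-technique transferability on MNIST across a diverse set of models.
Develop substitute-model learning techniques that enable black-box attacks given only oracle access.
Demonstrate practical black-box attacks against commercial classifiers under a limited query budget.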

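Proposed Method

Define intra-technique and cross-technique transferability and quantify both experimentally.
Train several models per technique (DNN, LR, SVM, DT, kNN) on MNIST and craft adversarial samples against them.
Measure the transfer rate as the fraction of adversarial samples misclassified by the other model.
Extend substitute learning, which relies on Jacobian-based dataset augmentation, with two refinements: periodic step sizes and reservoir sampling.
Mount black-box attacks on the Amazon and Google classifiers using substitute models trained with a limited number of queries.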
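
The augmentation step with its two refinements can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a softmax-regression substitute (chosen only because its Jacobian has a closed form), a step-size schedule of the form lam * (-1)^floor(rho / tau), and the classic Algorithm R for the reservoir sampling step. The names augment, jacobian_sign, reservoir_indices, lam, tau, and kappa are hypothetical, and the schedule is an assumption, since this summary does not state the exact formula.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def jacobian_sign(W, b, X, labels):
    """Sign of d p[label] / d x for a substitute p = softmax(X @ W + b).

    W has shape (d, k), X has shape (n, d), labels has shape (n,).
    """
    P = softmax(X @ W + b)                        # (n, k) class probabilities
    p_l = P[np.arange(len(X)), labels]            # (n,) probability of the oracle's label
    # d p_l / d x = p_l * (W[:, l] - sum_j p_j * W[:, j])
    grad = p_l[:, None] * (W[:, labels].T - P @ W.T)
    return np.sign(grad)

def reservoir_indices(n_items, kappa, rng):
    """Algorithm R: uniform sample of kappa distinct indices from range(n_items)."""
    reservoir = list(range(min(kappa, n_items)))
    for i in range(kappa, n_items):
        j = rng.integers(0, i + 1)                # uniform integer in [0, i]
        if j < kappa:
            reservoir[j] = i
    return np.array(reservoir, dtype=int)

def augment(S, y, W, b, rho, lam=0.1, tau=3, kappa=None, rng=None):
    """Return new synthetic points derived from S at augmentation round rho.

    S: (n, d) current substitute training inputs; y: (n,) oracle labels for S.
    kappa: if set, only kappa points of S are perturbed (reservoir sampling).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.asarray(y)
    if kappa is not None and kappa < len(S):
        idx = reservoir_indices(len(S), kappa, rng)
    else:
        idx = np.arange(len(S))
    step = lam * (-1) ** (rho // tau)             # assumed periodic step size: sign flips every tau rounds
    return S[idx] + step * jacobian_sign(W, b, S[idx], y[idx])
```

In this sketch the returned points still have to be labeled by the oracle before the substitute is retrained. Without kappa, every round adds as many points as S already holds, so the labeled set doubles each time; capping the additions at kappa keeps the number of new queries per round fixed.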

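Experimental Results

Research Questions

RQ1: Are intra-technique and cross-technique transfers of adversarial samples robust across common machine learning techniques?
RQ2: Can substitute models learned through oracle queries enable effective black-box attacks on unknown target classifiers?
RQ3: Are practical black-box attacks on commercial classifiers feasible under a limited query budget and against targets that are not deep models?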

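Key Findings

Adversarial samples transfer well within the same technique (e.g., LR transfer rates above 94%) and are also effective across techniques for several model pairs.
Cross-technique transferability is strong but markedly heterogeneous: DTs are the most vulnerable (47.20% to 89.29%), while DNNs are comparatively robust (0.82% to 38.27%).
After iterative augmentation, substitute models (DNN, LR, SVM, DT, kNN) agree with the target's labels on 77% to 83% of MNIST test data, depending on the oracle.
Periodic step sizes and reservoir sampling substantially improve substitute label agreement and reduce the number of oracle queries.
A black-box attack using a logistic regression substitute trained with only 800 queries caused the Amazon and Google classifiers to misclassify 96.19% and 88.94% of inputs, respectively.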
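
The transfer rates quoted above are fractions of adversarial samples that the receiving model misclassifies. A minimal sketch of that measurement, assuming each model is exposed as a callable that maps an input array to predicted labels (the names transfer_rate and transfer_matrix, and the callable interface, are hypothetical):

```python
import numpy as np

def transfer_rate(victim_predict, X_adv, y_true):
    """Fraction of adversarial inputs that the victim labels incorrectly."""
    preds = np.asarray(victim_predict(X_adv))
    return float(np.mean(preds != np.asarray(y_true)))

def transfer_matrix(adv_sets, models, y_true):
    """Transfer rate for every (source, victim) pair.

    adv_sets[name] holds adversarial inputs crafted against models[name];
    all sets are assumed to share the same true labels y_true.
    """
    return {
        (src, dst): transfer_rate(models[dst], adv_sets[src], y_true)
        for src in models
        for dst in models
    }
```

Entries with the same source and victim give the intra-technique rate when the two are separately trained models of one technique; the remaining entries give cross-technique rates.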


This review was generated by AI and reviewed by a human editor.