QUICK REVIEW

[Paper Review] Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

Nicolas Papernot, Patrick McDaniel|arXiv (Cornell University)|May 24, 2016

Adversarial Robustness in Machine Learning|References: 16|Cited by: 1,415

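One-Line Summary

This paper studies how adversarial samples transfer across diverse machine learning models and demonstrates practical black-box attacks on real-world services by training substitute models through oracle queries and dataset augmentation.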

ABSTRACT

Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output. Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, so long as both models were trained to perform the same task. An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model, with very little information about the victim. Recent work has further developed a technique that uses the victim model as an oracle to label a synthetic training set for the substitute, so the attacker need not even collect a training set to mount the attack. We extend these recent techniques using reservoir sampling to greatly enhance the efficiency of the training procedure for the substitute model. We introduce new transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees. We demonstrate our attacks on two commercial machine learning classification systems from Amazon (96.19% misclassification rate) and Google (88.94%) using only 800 queries of the victim model, thereby showing that existing machine learning approaches are in general vulnerable to systematic black-box attacks regardless of their structure.

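Motivation and Objectives

Motivate and characterize the transferability of adversarial samples across multiple classes of machine learning models.
Evaluate intra-technique and cross-technique transferability for diverse models on MNIST.
Develop substitute-model training techniques that enable black-box attacks under oracle access.
Demonstrate practical black-box attacks on commercial classifiers with a limited number of queries.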

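Proposed Method

Define intra-technique and cross-technique transferability and quantify both experimentally.
Train multiple models per technique on MNIST (DNN, LR, SVM, DT, kNN) and craft adversarial samples against them.
Measure the transfer rate as the fraction of those adversarial samples that another model misclassifies.
Extend substitute-model training based on Jacobian-based dataset augmentation with two refinements: periodic step sizes and reservoir sampling.
Demonstrate black-box attacks on the Amazon and Google classifiers using substitute models trained with a limited number of queries.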
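
The augmentation step with its two refinements can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sign-flipping schedule in `periodic_step`, the linear softmax substitute, and every function and parameter name (`lam`, `rho`, `tau`, `k`) are assumptions made for the example, and the reservoir sampler is the textbook Algorithm R.

```python
import numpy as np

def periodic_step(lam, rho, tau):
    # Assumed schedule: the sign of the step flips every `tau` rounds.
    return lam * (-1) ** (rho // tau)

def reservoir_sample(items, k, rng):
    # Algorithm R: uniform sample of k items in a single pass.
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = int(rng.integers(0, i + 1))  # uniform in [0, i]
            if j < k:
                reservoir[j] = item
    return reservoir

def softmax_jacobian(W, b, x):
    # Jacobian of softmax(W @ x + b) with respect to x,
    # shape (n_classes, n_features). Hypothetical linear substitute.
    z = W @ x + b
    p = np.exp(z - z.max())
    p /= p.sum()
    return (np.diag(p) - np.outer(p, p)) @ W

def augment(X, oracle_labels, W, b, lam, rho, tau, k=None, rng=None):
    # One augmentation round: keep X and add, for each selected point,
    # a copy shifted along the sign of the Jacobian row that belongs
    # to the label the oracle assigned to that point.
    rng = rng if rng is not None else np.random.default_rng(0)
    indices = list(range(len(X)))
    if k is not None and k < len(indices):
        # Perturb only k points to cap the new oracle queries.
        indices = reservoir_sample(indices, k, rng)
    step = periodic_step(lam, rho, tau)
    new_points = [
        X[i] + step * np.sign(softmax_jacobian(W, b, X[i])[oracle_labels[i]])
        for i in indices
    ]
    return np.vstack([X, np.array(new_points)])
```

In this sketch every new point would still have to be labeled by the oracle before the next training round, so `k` bounds the number of queries spent per round, while `tau` controls how often the perturbation direction reverses.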

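Experimental Results

Research Questions

RQ1: Does intra-technique and cross-technique transfer of adversarial samples hold robustly across common ML techniques?
RQ2: Do substitute models trained through oracle queries enable effective black-box attacks on unknown target classifiers?
RQ3: Are practical black-box attacks on commercial classifiers feasible with a limited number of queries and targets that are not deep models?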

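Key Findings

Adversarial samples transfer well for some model pairs, both within a technique (e.g., LR transfer rates > 94%) and across techniques.
Cross-technique transfer is strong but uneven: DT is the most vulnerable (47.20%–89.29%), while DNN is comparatively robust (0.82%–38.27%).
After iterative augmentation, substitute models (DNN, LR, SVM, DT, kNN) match the target's labels on MNIST test data with 77%–83% accuracy, depending on the oracle.
Periodic step sizes and reservoir sampling substantially improve the substitute's label agreement and reduce the number of oracle queries.
Black-box attacks on the Amazon and Google classifiers, using a logistic regression substitute and only 800 queries, caused 96.19% and 88.94% of inputs, respectively, to be misclassified.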
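
The percentages above are transfer or misclassification rates: the fraction of adversarial samples that a given model labels incorrectly. A minimal sketch of that measurement is shown below; `transfer_rate`, `toy_victim`, and the toy data are hypothetical names and values chosen for illustration, not taken from the paper.

```python
import numpy as np

def transfer_rate(predict, X_adv, y_true):
    # Fraction of adversarial samples that `predict` labels incorrectly.
    y_pred = np.asarray(predict(X_adv))
    return float(np.mean(y_pred != np.asarray(y_true)))

def toy_victim(X):
    # Hypothetical stand-in for a victim model: thresholds the first feature.
    return (np.asarray(X)[:, 0] > 0.5).astype(int)

# Toy adversarial samples, assumed to have been crafted against another model.
X_adv = np.array([[0.9], [0.2], [0.7], [0.4]])
y_true = np.array([0, 0, 1, 1])

rate = transfer_rate(toy_victim, X_adv, y_true)  # 2 of 4 misclassified
```

Here `predict` can be any callable that returns labels, so the same function would apply whether the victim is a local model or a remote classifier reached through queries.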

This review was generated by AI and checked by a human editor.