QUICK REVIEW

[Paper Review] Black-box Adversarial Attacks with Limited Queries and Information

Andrew Ilyas, Logan Engstrom|arXiv (Cornell University)|Apr 23, 2018

Adversarial Robustness in Machine Learning | References: 30 | Citations: 326

One-Line Summary

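This paper defines three realistic black-box threat models (query-limited, partial-information, and label-only) and presents query-efficient attacks that reliably generate targeted adversarial examples under these constraints, including a targeted attack against the Google Cloud Vision API.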

ABSTRACT

Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.

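Motivation and Objectives

Motivate and formalize realistic black-box threat models that reflect real-world systems: query-limited, partial-information, and label-only access.
Develop and analyze efficient attack algorithms that work under these limited-access scenarios.
Demonstrate the attacks' effectiveness on a large-scale dataset (ImageNet) and a real-world system (the Google Cloud Vision API).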

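Proposed Method

Apply Natural Evolutionary Strategies (NES) in the query-limited setting to estimate the gradient for a targeted adversarial example from a limited number of queries.
Craft the targeted perturbation with projected gradient descent (PGD) on the estimated gradient, projecting onto an ε-ball around the original image.
For the partial-information attack, start from an image of the target class and alternate between blending toward the original image and maximizing the target-class probability, subject to the target class staying in the top-k.
Extend to the label-only setting by introducing a proxy score based on label ranking and robustness to random perturbations, which enables targeted attacks from top-k labels alone.
Provide a concrete algorithm for each threat model and release source code for reproducibility.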
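
The first two steps of the method (NES gradient estimation followed by a projected gradient step) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: `toy_target_score` is a hypothetical stand-in for the target-class probability returned by a real classifier, the hyperparameter values are arbitrary, and plain Python lists are used only to keep the sketch dependency-free.

```python
import random


def nes_gradient(score_fn, x, sigma=0.01, n_pairs=50, rng=None):
    """NES-style gradient estimate from antithetic Gaussian samples.

    Uses 2 * n_pairs queries to score_fn and never touches model internals.
    """
    if rng is None:
        rng = random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score_fn([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = score_fn([xi - sigma * ui for xi, ui in zip(x, u)])
        for i, ui in enumerate(u):
            grad[i] += (plus - minus) * ui
    return [g / (2.0 * n_pairs * sigma) for g in grad]


def pgd_step(x, grad, x_orig, eps, lr):
    """One signed ascent step, projected onto the l_inf eps-ball and [0, 1]."""
    stepped = []
    for xi, gi, oi in zip(x, grad, x_orig):
        sign = (gi > 0) - (gi < 0)
        xi = xi + lr * sign
        xi = min(max(xi, oi - eps), oi + eps)   # stay within eps of the original
        stepped.append(min(max(xi, 0.0), 1.0))  # stay a valid pixel value
    return stepped


def toy_target_score(x):
    """Hypothetical stand-in for P(target | x): a fixed linear score."""
    weights = [1.0, -1.0, 1.0, -1.0]
    return sum(w * xi for w, xi in zip(weights, x))


def run_attack(x_orig, eps=0.05, lr=0.01, steps=20):
    """Repeat estimate-then-step; each iteration costs 2 * n_pairs queries."""
    rng = random.Random(0)
    x = list(x_orig)
    for _ in range(steps):
        grad = nes_gradient(toy_target_score, x, rng=rng)
        x = pgd_step(x, grad, x_orig, eps, lr)
    return x
```

Calling `run_attack([0.5, 0.5, 0.5, 0.5])` returns a point that stays within ε = 0.05 of the input in every coordinate. Against a real classifier, the query budget is controlled mainly by `n_pairs` and `steps`.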

Figure 1: An illustration of the derivation of the proxy score $\hat{S}$ in the label-only setting.
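
One way to read the proxy score is as a Monte Carlo average of a rank-based score over small random perturbations of the input. The sketch below illustrates that idea under assumptions: `toy_top_k` is a hypothetical one-dimensional classifier that returns only ordered labels, the rank convention (0 = most likely) is chosen here for convenience, and the noise radius `mu` and sample count are arbitrary.

```python
import random


def rank_score(top_k_labels, target, k):
    """k - rank if the target is in the top-k list (rank 0 = most likely), else 0."""
    if target in top_k_labels:
        return k - top_k_labels.index(target)
    return 0


def proxy_score(top_k_fn, x, target, k, mu=0.05, n_samples=50, rng=None):
    """Average rank score over uniform noise in [-mu, mu] around x.

    Only label rankings are queried, never probabilities.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0
    for _ in range(n_samples):
        noisy = [xi + rng.uniform(-mu, mu) for xi in x]
        total += rank_score(top_k_fn(noisy), target, k)
    return total / n_samples


def toy_top_k(x, k=2):
    """Hypothetical label-only classifier: classes ranked by distance to a center."""
    centers = [0.0, 0.5, 1.0]
    order = sorted(range(len(centers)), key=lambda c: (x[0] - centers[c]) ** 2)
    return order[:k]
```

Near a decision boundary the averaged score takes intermediate values, which gives a smoother signal than a single hard label; for example, `proxy_score(toy_top_k, [0.74], target=2, k=2)` falls between 1 and 2, while a point deep inside the target region scores exactly 2.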

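Experimental Results

Research Questions

RQ1: Can targeted adversarial examples be generated efficiently when the attacker has only a limited number of queries to the classifier?
RQ2: Can the attacks maintain a high success rate against a real-world, large-scale classifier under partial-information and label-only constraints?
RQ3: Is it practical to attack a commercial API such as Google Cloud Vision under these restricted threat models?
RQ4: How do the proposed attacks compare with prior black-box methods in query efficiency and success rate?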

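Key Findings

The query-limited attack uses NES-based gradient estimation to reach a high targeted success rate with far fewer queries than prior gradient-estimation methods (for example, two to three orders of magnitude more efficient).
The partial-information attack reliably produces targeted adversarial examples even when only top-k probabilities are available, achieving high success on ImageNet with a practical number of queries.
The label-only attack succeeds even when no scores are available, using a proxy robustness score and ranking information to guide the optimization.
The Google Cloud Vision API was successfully attacked in the partial-information setting, demonstrating the real-world applicability of the proposed methods.
On 1,000 samples with ε = 0.05, the query-limited attack reaches a 99.2% success rate, the partial-information attack 93.6% at about 49,624 queries, and the label-only attack 90% at about 2.7 million queries.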
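
The partial-information attack summarized above alternates between tightening the ε-ball around the original image and raising the target-class probability. The sketch below covers only the tightening half, as an illustration under assumptions: `toy_top_k` is a hypothetical classifier, `try_shrink` is a name introduced here, and the maximization step and any step-size backoff on failure are omitted.

```python
def project(x, x_orig, eps):
    """Clip x into the l_inf ball of radius eps around x_orig."""
    return [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x_orig)]


def try_shrink(top_k_fn, x, x_orig, target, eps, delta):
    """Tighten the ball by delta, keeping the result only if the target stays in the top-k."""
    candidate = project(x, x_orig, eps - delta)
    if target in top_k_fn(candidate):
        return candidate, eps - delta
    return x, eps  # rejected: a full attack would first raise the target score


def toy_top_k(x, k=2):
    """Hypothetical classifier: classes ranked by distance to a center."""
    centers = [0.0, 0.5, 1.0]
    order = sorted(range(len(centers)), key=lambda c: (x[0] - centers[c]) ** 2)
    return order[:k]
```

Starting from a target-class point `[1.0]` with original `[0.0]`, one call with `eps=1.0, delta=0.3` is accepted and moves the point to about 0.7; a second identical shrink is rejected because the target class would drop out of the top 2.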

Figure 2: The distribution of the number of queries required for the query-limited (top) and partial-information with $k=1$ (bottom) attacks.


This review was generated by AI and checked by a human editor.