QUICK REVIEW

[Paper Review] Constructing Unrestricted Adversarial Examples with Generative Models

Yang Song, Rui Shu|arXiv (Cornell University)|May 21, 2018

Adversarial Robustness in Machine Learning|References: 49|Citations: 125

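One-Line Summary

This paper introduces unrestricted adversarial examples, synthesized from scratch with a conditional generative model (AC-GAN), and shows that they bypass certified defenses and adversarial training while still appearing legitimate to humans.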

ABSTRACT

Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose unrestricted adversarial examples, a new threat model where the attackers are not restricted to small norm-bounded perturbations. Different from perturbation-based attacks, we propose to synthesize unrestricted adversarial examples entirely from scratch using conditional generative models. Specifically, we first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over data samples. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that unrestricted adversarial examples generated this way are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that unrestricted adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.

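Motivation and Objectives

Motivate a threat model in which adversarial inputs are not limited to small perturbations of existing data.
Propose a practical method for generating unrestricted adversarial examples from scratch using a class-conditional generative model.
Evaluate the attack's effectiveness against certified defenses and adversarially trained models across multiple datasets.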

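Proposed Method

Train an Auxiliary Classifier GAN (AC-GAN) to model the class-conditional image distribution.
Search the AC-GAN latent space for images that are likely under the generative model and misclassified by the target classifier.
Optimize the loss L = L0 + λ1 L1 + λ2 L2 to generate high-fidelity unrestricted adversarial examples, where L0 targets the classifier, L1 regularizes the latent code, and L2 keeps the image consistent with the original class via the auxiliary classifier.
Add small learnable noise to the generated images to increase diversity (noise-augmented attack).
Use Amazon Mechanical Turk to verify that the synthesized images belong to the intended class and appear legitimate to humans.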
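
The combined loss and the noise-augmented variant described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the review only says what each term is for, so the concrete forms used here (negative log-probabilities for L0 and L2, a hinge-style distance to an initial latent code z0 for L1, and tanh-bounded noise) are assumptions. The names latent_penalty, combined_loss, add_bounded_noise, eps, and eps_attack are hypothetical, and the probabilities are passed in directly instead of being computed by a real generator or classifier.

```python
import math


def latent_penalty(z, z0, eps):
    """L1 (assumed form): mean of max(|z_i - z0_i| - eps, 0).

    Penalizes only the coordinates of z that drift more than eps away
    from the initial latent code z0.
    """
    return sum(max(abs(a - b) - eps, 0.0) for a, b in zip(z, z0)) / len(z)


def combined_loss(p_target, p_aux_source, z, z0, eps=0.1, lam1=1.0, lam2=1.0):
    """L = L0 + lam1 * L1 + lam2 * L2 for a single generated image.

    p_target:     probability the attacked classifier assigns to the label
                  the attacker wants it to output (assumed input).
    p_aux_source: probability the auxiliary classifier assigns to the
                  class the image was generated for (assumed input).
    """
    l0 = -math.log(p_target)         # push the attacked classifier toward the wrong label
    l1 = latent_penalty(z, z0, eps)  # keep the latent code near its starting point
    l2 = -math.log(p_aux_source)     # keep the image in its original class
    return l0 + lam1 * l1 + lam2 * l2


def add_bounded_noise(pixels, eta, eps_attack):
    """Noise-augmented output (assumed form): x + eps_attack * tanh(eta).

    tanh keeps each per-pixel offset inside [-eps_attack, eps_attack],
    so eta can be optimized freely while the added noise stays small.
    """
    return [x + eps_attack * math.tanh(e) for x, e in zip(pixels, eta)]


# Example: a latent code that moved slightly from z0, scored with
# made-up classifier probabilities.
z0 = [0.0, 0.0]
z = [0.05, 0.5]
loss = combined_loss(p_target=0.5, p_aux_source=0.9, z=z, z0=z0)
```

In an actual attack, the two probabilities would come from differentiable models, and z (plus eta in the noise-augmented case) would be updated by gradient descent on this loss. The sketch only shows how the three terms combine.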

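Experimental Results

Research Questions

RQ1: Can unrestricted adversarial examples, that is, images synthesized from scratch, fool classifiers despite strong defenses?
RQ2: How effective are AC-GAN-based unrestricted adversarial examples against certified defenses and adversarially trained models on MNIST, SVHN, and CelebA?
RQ3: Do unrestricted adversarial examples transfer to other architectures in a black-box setting?
RQ4: Does adding noise to the generator improve the attack's effectiveness or realism?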

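Key Findings

Unrestricted adversarial examples fool the target classifiers at high success rates, exceeding 84% on each of the MNIST, SVHN, and CelebA datasets.
They bypass certified defenses designed for perturbation-based attacks and also threaten adversarially trained models.
Human evaluation on MTurk confirms that many unrestricted adversarial examples are legitimate images belonging to the intended class.
The noise-augmented variant can improve transferability to some models, while its effect on others is inconclusive.
Unrestricted adversarial examples show moderate transferability to other architectures, suggesting a potential black-box risk.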

This review was generated by AI and checked by a human editor.