QUICK REVIEW

[Paper Review] On Detecting Adversarial Perturbations

Jan Hendrik Metzen, Tim Genewein|arXiv (Cornell University)|Feb 14, 2017

Adversarial Robustness in Machine Learning | Cited by 219

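One-Sentence Summary

This paper augments a classifier with a small detector subnetwork that distinguishes genuine data from adversarial examples, shows high detectability on CIFAR-10 and a 10-class ImageNet subset, and addresses dynamic adversaries that also target the detector.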

ABSTRACT

Machine learning and deep learning in particular has advanced tremendously on perceptual tasks in recent years. However, it remains vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose to augment deep neural networks with a small "detector" subnetwork which is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, while the detectors have been trained to detect only a specific adversary, they generalize to similar and weaker adversaries. In addition, we propose an adversarial attack that fools both the classifier and the detector and a novel training procedure for the detector that counteracts this attack.

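Motivation and Objectives

Address the vulnerability of deep networks to quasi-imperceptible adversarial perturbations.
Propose a binary detector subnetwork that distinguishes genuine data from data containing adversarial perturbations.
Show that detectors generalize beyond the adversary they were trained on to similar and weaker adversaries.
Investigate dynamic adversaries and propose a training strategy that hardens the detector against them.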

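Proposed Method

Attach a small adversary-detector subnetwork to an intermediate layer of a pretrained classifier.
Train the detector on a balanced dataset made up of the original training data and adversarial examples generated for the training set.
Freeze the classifier weights and train the detector with a cross-entropy loss on the genuine/adversarial labels.
Investigate detector placement and architecture experimentally on CIFAR-10 and an ImageNet subset.
Introduce a dynamic adversary formulation that optimizes the classifier and detector objectives simultaneously while generating the perturbation.
Develop dynamic adversary training to harden the detector against adaptive attacks.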
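
The static training recipe above (frozen classifier, balanced genuine/adversarial data, cross-entropy on the detector only) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's setup: the tiny two-unit "classifier", its weights `W1` and `V`, the logistic detector, the single fast-gradient-sign step, and all hyperparameters are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math
import random

# Hypothetical frozen "classifier": one ReLU hidden layer plus a logistic output.
W1 = [[1.0, -1.0, 0.5], [-0.5, 1.0, 1.0]]  # hidden-layer weights (never updated)
V = [1.0, -1.0]                             # output weights (never updated)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden(x):
    """Intermediate activations that the detector branches off from."""
    return [max(0.0, dot(row, x)) for row in W1]

def fgsm(x, y, eps):
    """One fast-gradient-sign step against the frozen classifier, clipped to [0, 1]."""
    pre = [dot(row, x) for row in W1]
    p = sigmoid(dot(V, [max(0.0, a) for a in pre]))
    grad = [(p - y) * sum(V[k] * (pre[k] > 0) * W1[k][j] for k in range(len(W1)))
            for j in range(len(x))]
    return [min(1.0, max(0.0, xj + eps * ((g > 0) - (g < 0))))
            for xj, g in zip(x, grad)]

def detector_loss(w, b, feats, labels):
    """Mean binary cross-entropy of the logistic detector."""
    total = 0.0
    for h, t in zip(feats, labels):
        p = sigmoid(dot(w, h) + b)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(feats)

def train_detector(feats, labels, lr=0.5, steps=200):
    """Full-batch gradient descent on the detector parameters only."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for h, t in zip(feats, labels):
            err = sigmoid(dot(w, h) + b) - t
            gw = [g + err * hk for g, hk in zip(gw, h)]
            gb += err
        w = [wk - lr * g / len(feats) for wk, g in zip(w, gw)]
        b -= lr * gb / len(feats)
    return w, b

random.seed(0)
genuine = [[random.random() for _ in range(3)] for _ in range(50)]
true_y = [1 if x[0] > x[1] else 0 for x in genuine]  # toy class labels
adversarial = [fgsm(x, y, eps=0.1) for x, y in zip(genuine, true_y)]

# Balanced detector dataset: label 0 = genuine, label 1 = adversarial.
feats = [hidden(x) for x in genuine + adversarial]
labels = [0] * len(genuine) + [1] * len(adversarial)
w_det, b_det = train_detector(feats, labels)
```

Only `w_det` and `b_det` change during training; `W1` and `V` are read but never written, which mirrors the frozen-classifier choice. The toy problem is not expected to reproduce the detectability figures reported in the paper.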

Figure 2: (Left) Illustration of detectability of different adversaries and values for $\varepsilon$ on CIFAR10. The x-axis shows the predictive accuracy of the CIFAR10 classifier on adversarial examples of the test data for different adversaries. The y-axis shows the corresponding detectability of

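Experimental Results

Research Questions

RQ1: Can data-dependent adversarial perturbations be reliably detected by a detector trained for a specific adversary?
RQ2: How does the placement of the detector within the classifier affect adversarial detectability?
RQ3: Does a detector trained on one adversary transfer to other adversaries and norms (e.g., $\ell_\infty$ vs. $\ell_2$)?
RQ4: How robust are detectors against dynamic adversaries that adapt to both the classifier and the detector?
RQ5: What training procedure hardens the detector against adaptive, dynamic attacks?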

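Key Findings

On CIFAR-10, the detectors achieve high detectability (above 80%) for the tested adversaries; when an adversary pushes the classifier's accuracy on adversarial examples below 10%, detectability often exceeds 90%.
Attaching the detector at an intermediate layer (AD(2)) generally gives the best detection for the fast and iterative adversaries; for DeepFool variants, AD(4) is often the best placement.
A detector trained on one adversary can transfer to other similar or weaker adversaries; transfer between $\ell_\infty$ and $\ell_2$ variants is often effective for related attacks.
Dynamic detectors (trained to withstand adaptive attacks) maintain detectability above 70% across a range of adaptation strengths (values of $\sigma$).
On the 10-class ImageNet subset, detectability exceeds 85% for most adversaries; one iterative $\ell_2$ case ($\varepsilon = 400$) approaches chance-level performance, suggesting a hard edge case.
Detection enables fallback or safety interventions (e.g., human verification).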
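
The adaptation strength $\sigma$ mentioned in the findings can be read as a weight that trades off fooling the classifier against fooling the detector while the perturbation is crafted. A minimal sketch of one such update follows, assuming an iterative sign-gradient step that mixes the two objectives as a convex combination weighted by `sigma` and clips the result to an $\varepsilon$-ball; the logistic toy models, the function names, and the numeric values are hypothetical, and in this toy the detector reads the input directly rather than an intermediate layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_grad(model, x, label):
    """Input gradient of the cross-entropy loss of a logistic model (w, b)."""
    w, b = model
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - label) * wi for wi in w]

def dynamic_adversary(x, y_true, classifier, detector, eps, alpha, sigma, steps):
    """Iterative attack that ascends both losses, weighted by sigma in [0, 1]."""
    x_adv = list(x)
    for _ in range(steps):
        # Raise the classifier's loss on the true label (cause misclassification).
        g_cls = loss_grad(classifier, x_adv, y_true)
        # Raise the detector's loss on the label 1 = "adversarial" (evade detection).
        g_det = loss_grad(detector, x_adv, 1)
        stepped = [xa + alpha * ((1 - sigma) * sign(gc) + sigma * sign(gd))
                   for xa, gc, gd in zip(x_adv, g_cls, g_det)]
        # Keep the result inside the eps-ball around x and inside [0, 1].
        x_adv = [min(min(x0 + eps, 1.0), max(max(x0 - eps, 0.0), s))
                 for x0, s in zip(x, stepped)]
    return x_adv

x = [0.5, 0.5, 0.5]
classifier = ([1.0, -2.0, 0.5], 0.0)  # hypothetical (weights, bias)
detector = ([-1.0, 1.0, 2.0], 0.0)    # hypothetical (weights, bias)
static = dynamic_adversary(x, 1, classifier, detector,
                           eps=0.1, alpha=0.02, sigma=0.0, steps=10)
adaptive = dynamic_adversary(x, 1, classifier, detector,
                             eps=0.1, alpha=0.02, sigma=0.5, steps=10)
```

With `sigma=0.0` the update ignores the detector and reduces to an iterative attack on the classifier alone; larger values spend more of each step on evading the detector. In this toy, the two objectives pull in opposite directions on some coordinates, which shows why an adaptive attack can be weaker against the classifier.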

Figure 3: Transferability on CIFAR10 of detector trained for adversary with maximal distortion $\epsilon_{fit}$ when tested on the same adversary with distortion $\epsilon_{test}$ . Different plots show different adversaries. Numbers correspond to the accuracy of detector on unseen test data.


This review was generated by AI and checked by a human editor.