QUICK REVIEW

[Paper Review] On the Limitation of Local Intrinsic Dimensionality for Characterizing the Subspaces of Adversarial Examples.

Pei Hsuan Lu, Pin Yu Chen|arXiv (Cornell University)|Feb 12, 2018

Adversarial Robustness in Machine Learning | Citations: 4

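One-line summary

This paper investigates the limits of local intrinsic dimensionality (LID) for characterizing adversarial subspaces in the activations of deep neural networks. Using MNIST and CIFAR-10, it shows that LID fails to capture adversarial subspaces reliably under oblivious attacks at varying confidence levels and under black-box transfer attacks, which significantly limits LID's usefulness for robustness analysis.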

ABSTRACT

Understanding and characterizing the subspaces of adversarial examples aid in studying the robustness of deep neural networks (DNNs) to adversarial perturbations. Very recently, Ma et al. (ICLR 2018) proposed to use local intrinsic dimensionality (LID) in layer-wise hidden representations of DNNs to study adversarial subspaces. It was demonstrated that LID can be used to characterize the adversarial subspaces associated with different attack methods, e.g., the Carlini and Wagner's (C&W) attack and the fast gradient sign attack. In this paper, we use MNIST and CIFAR-10 to conduct two new sets of experiments that are absent in existing LID analysis and report the limitation of LID in characterizing the corresponding adversarial subspaces, which are (i) oblivious attacks and LID analysis using adversarial examples with different confidence levels; and (ii) black-box transfer attacks. For (i), we find that the performance of LID is very sensitive to the confidence parameter deployed by an attack, and the LID learned from ensembles of adversarial examples with varying confidence levels surprisingly gives poor performance. For (ii), we find that when adversarial examples are crafted from another DNN model, LID is ineffective in characterizing their adversarial subspaces. These two findings together suggest the limited capability of LID in characterizing the subspaces of adversarial examples.

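Motivation and objectives

Assess how effectively local intrinsic dimensionality (LID) characterizes adversarial subspaces across different attack types.
Investigate how varying the confidence level of an adversarial attack affects LID's ability to detect the underlying subspace.
Evaluate LID's performance on adversarial examples generated from a different model (black-box transfer attacks).
Identify the inherent limits of LID for analyzing the geometric structure of adversarial examples in deep neural network representations.
Provide empirical evidence against the assumption that LID reliably captures the intrinsic dimensionality of adversarial subspaces in deep neural network representations.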

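Proposed method

Run experiments on the MNIST and CIFAR-10 datasets using the layer-wise hidden representations of deep neural networks.
Apply LID evaluation to adversarial examples generated at various confidence levels under oblivious attacks.
Assess the robustness of the measurement using LID evaluation learned from ensembles of adversarial examples with different confidence levels.
Evaluate LID on adversarial examples generated from a separate pretrained model (black-box transfer attacks).
Compare LID values across clean examples, adversarial examples, and their subspaces to assess how well LID characterizes the geometry.
Analyze LID's sensitivity to factors such as the attack's confidence parameter and the transfer of adversarial examples between models.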
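
To make the quantity being evaluated concrete, the following is a minimal sketch of the maximum-likelihood LID estimate commonly used in this line of work, LID(x) = -1 / mean_i log(r_i(x) / r_k(x)), where r_i is the distance from x to its i-th nearest neighbour in a reference batch. The review does not give the estimator, so the function name `lid_mle`, the default `k=20`, and the use of Euclidean distance on a single layer's activations are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def lid_mle(query, reference, k=20):
    """Sketch of a maximum-likelihood LID estimate (names and defaults are hypothetical).

    query:     (d,) activation vector of one example at a given layer.
    reference: (n, d) activations of a reference minibatch at the same layer.
    k:         number of nearest neighbours used in the estimate.
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Euclidean distance from the query to every reference point.
    dists = np.linalg.norm(reference - query, axis=1)

    # Drop zero distances (the query itself or exact duplicates), keep the k smallest.
    dists = np.sort(dists[dists > 0])[:k]
    if dists.size < 2:
        raise ValueError("need at least two neighbours at non-zero distance")

    # LID = -1 / mean(log(r_i / r_k)), where r_k is the largest kept distance.
    mean_log_ratio = np.mean(np.log(dists / dists[-1]))
    if mean_log_ratio == 0.0:
        # All neighbours are equidistant; the estimate is unbounded.
        return float("inf")
    return float(-1.0 / mean_log_ratio)
```

In a layer-wise analysis such as the one described above, one such estimate would be computed per layer and per example, and the resulting vector of LID values would serve as the feature describing that example.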

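Experimental results

Research questions

RQ1: How does the confidence level of an adversarial attack affect LID's ability to characterize the resulting adversarial subspace?
RQ2: Can LID reliably identify the adversarial subspace for an ensemble of adversarial examples with multiple confidence levels?
RQ3: Is LID effective at characterizing the adversarial subspace when the adversarial examples are generated from a different model (black-box setting)?
RQ4: Under transferable attacks, to what extent does LID reflect the true geometric structure of adversarial subspaces?
RQ5: What are the limits of LID as a diagnostic tool for adversarial robustness in deep neural networks?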

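Key findings

LID's performance is highly sensitive to the confidence parameter used by the adversarial attack.
LID learned from ensembles of adversarial examples with varying confidence levels performs poorly, which points to instability in subspace detection.
When adversarial examples are generated from a different model (black-box transfer), LID does not effectively characterize the corresponding adversarial subspace.
Taken together, the oblivious-attack experiments with varying confidence and the black-box transfer experiments expose a basic limit on LID's ability to generalize across different types of adversarial examples.
Overall, these findings suggest that LID is not a reliable or robust measure for characterizing the intrinsic geometry of adversarial subspaces in deep neural networks.
These results challenge the assumption that LID is a universal tool for studying the structure of adversarial examples in DNN representations.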
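
For readers unfamiliar with the confidence parameter these findings refer to, here is a minimal sketch of the margin term in the C&W attack as it is usually formulated, max(max_{i != t} Z_i - Z_t, -kappa), where Z are the logits, t is the target class, and kappa is the confidence. The review does not state which variant (targeted or untargeted) the experiments use, so the targeted form, the function name `cw_margin`, and the symbol `kappa` are assumptions chosen for illustration.

```python
import numpy as np

def cw_margin(logits, target, kappa=0.0):
    """Targeted C&W-style margin term (illustrative sketch, hypothetical names).

    logits: (num_classes,) pre-softmax scores Z for one input.
    target: index t of the class the attack aims for.
    kappa:  confidence; larger values demand a larger logit gap.
    """
    logits = np.asarray(logits, dtype=float)

    # Largest logit among all classes other than the target.
    best_other = np.delete(logits, target).max()

    # The term stops decreasing once the target leads by at least kappa.
    return float(max(best_other - logits[target], -kappa))
```

With `kappa = 0` the term bottoms out as soon as the target class has the largest logit, while a larger `kappa` keeps pushing until the target leads by that margin. That is the sense in which adversarial examples at different confidence levels can sit at different distances from the decision boundary.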

This review was generated by AI and checked by a human editor.