QUICK REVIEW

[Paper Review] SentiNet: Detecting Physical Attacks Against Deep Learning Systems

Edward Chou, Florian Tramèr|arXiv (Cornell University)|Dec 4, 2018

Adversarial Robustness in Machine Learning | References: 34 | Citations: 84

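One-Line Summary

SentiNet is an attack-agnostic detection framework that uses model interpretability and object detection to identify localized universal adversarial attacks, such as physical patches and data poisoning, without prior knowledge of the attack or retraining. It performs well across attack types and remains robust against adaptive adversaries who design patches to evade detection.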

ABSTRACT

SentiNet is a novel detection framework for localized universal attacks on neural networks. These attacks restrict adversarial noise to contiguous portions of an image and are reusable with different images -- constraints that prove useful for generating physically-realizable attacks. Unlike most other works on adversarial detection, SentiNet does not require training a model or preknowledge of an attack prior to detection. Our approach is appealing due to the large number of possible mechanisms and attack-vectors that an attack-specific defense would have to consider. By leveraging the neural network's susceptibility to attacks and by using techniques from model interpretability and object detection as detection mechanisms, SentiNet turns a weakness of a model into a strength. We demonstrate the effectiveness of SentiNet on three different attacks -- i.e., data poisoning attacks, trojaned networks, and adversarial patches (including physically realizable attacks) -- and show that our defense is able to achieve very competitive performance metrics for all three threats. Finally, we show that SentiNet is robust against strong adaptive adversaries, who build adversarial patches that specifically target the components of SentiNet's architecture.

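Motivation and Objectives

Address the challenge of detecting physically realizable, localized universal adversarial attacks on deep neural networks.
Develop a defense mechanism that requires neither prior knowledge of the attack nor model retraining.
Build a detection framework that is robust to adaptive adversaries who deliberately design attacks to evade detection.
Generalize across diverse attack types, including adversarial patches, data poisoning, and trojaned models.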

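Proposed Method

SentiNet uses class activation mapping (CAM) to identify the salient regions of an input image that most influence the model's prediction.
It applies object detection techniques to localize suspicious high-activation regions, which may correspond to adversarial perturbations.
The framework treats the network's attention mechanism as a potential indicator of attack, turning the model's vulnerability into a detection signal.
By combining interpretability maps with object detection, it detects localized adversarial noise that can be reused across different inputs.
The system is modular and attack-agnostic, and does not depend on specific attack patterns or training data.
It is evaluated against adaptive adversaries whose patches are optimized to evade the detection components.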
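
The idea of turning a reusable, localized perturbation into a detection signal can be illustrated with a small sketch. It is not the authors' implementation. It assumes a toy grayscale image representation (nested lists), a hypothetical stand-in classifier `toy_classify`, and hand-written heat maps; the relative threshold and the "paste the salient region onto clean images and count label flips" test are illustrative assumptions about how reusability could be checked. `class_activation_map` follows the standard CAM definition (a weighted sum of the final convolutional feature maps).

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities
Mask = List[List[bool]]


def class_activation_map(feature_maps: List[Image], class_weights: List[float]) -> Image:
    """Standard CAM for one class: weighted sum of the final conv feature maps."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]


def salient_mask(heatmap: Image, rel_threshold: float = 0.5) -> Mask:
    """Keep pixels whose saliency is at least rel_threshold times the peak (assumed rule)."""
    cutoff = rel_threshold * max(max(row) for row in heatmap)
    return [[value >= cutoff for value in row] for row in heatmap]


def paste_region(source: Image, mask: Mask, target: Image) -> Image:
    """Return a copy of target with the masked pixels of source pasted in."""
    return [
        [s if m else t for s, m, t in zip(src_row, mask_row, tgt_row)]
        for src_row, mask_row, tgt_row in zip(source, mask, target)
    ]


def fooled_rate(suspect: Image, heatmap: Image, clean_images: List[Image],
                classify: Callable[[Image], int], rel_threshold: float = 0.5) -> float:
    """Fraction of differently-labelled clean images that take on the suspect's
    label once the suspect's salient region is pasted onto them."""
    mask = salient_mask(heatmap, rel_threshold)
    suspect_label = classify(suspect)
    candidates = [img for img in clean_images if classify(img) != suspect_label]
    if not candidates:
        return 0.0
    flipped = sum(
        classify(paste_region(suspect, mask, img)) == suspect_label
        for img in candidates
    )
    return flipped / len(candidates)


def toy_classify(image: Image) -> int:
    """Hypothetical stand-in model: any very bright 'trigger' pixel forces class 2."""
    pixels = [p for row in image for p in row]
    if max(pixels) >= 0.9:
        return 2
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


def with_block(base: float, block: float, top: int, left: int, size: int = 4) -> Image:
    """size x size image filled with `base`, with a 2x2 block set to `block`."""
    img = [[base] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            img[y][x] = block
    return img


clean = [with_block(0.1, 0.1, 0, 0), with_block(0.3, 0.3, 0, 0)]
attacked = with_block(0.2, 1.0, 0, 0)       # trigger patch in the top-left corner
attacked_heat = with_block(0.1, 1.0, 0, 0)  # saliency concentrated on the patch
benign = with_block(0.6, 0.6, 1, 1)         # uniformly bright, no trigger
benign_heat = with_block(0.1, 0.8, 1, 1)    # saliency on the image centre

print(fooled_rate(attacked, attacked_heat, clean, toy_classify))  # 1.0
print(fooled_rate(benign, benign_heat, clean, toy_classify))      # 0.0
```

In this toy setup the salient region of the attacked image carries its label to every clean image, while the salient region of the benign image does not. That contrast is the kind of signal a detector can threshold. A real system would obtain the heat map from the network itself rather than by hand.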

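Experimental Results

Research Questions

RQ1: Can a detection framework identify localized universal adversarial attacks without prior knowledge of the attack or model retraining?
RQ2: How effectively can SentiNet detect physically realizable adversarial patches across different models and datasets?
RQ3: How robust is SentiNet against adaptive adversaries who design attacks to evade its detection mechanism?
RQ4: Does interpretability-based detection generalize to diverse attack types such as data poisoning and model trojans?
RQ5: How does SentiNet compare with attack-specific defenses in terms of detection accuracy and robustness?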

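Key Findings

SentiNet achieves competitive detection performance across three different attack types: adversarial patches, data poisoning, and trojaned models.
SentiNet also successfully detects physical adversarial patches that were optimized to evade detection.
SentiNet remains robust against strong adaptive adversaries who deliberately design patches to evade its detection components.
Because it requires neither retraining nor prior knowledge of the attack, it is broadly applicable and practical for real-world deployment.
By leveraging model interpretability and object detection, SentiNet turns the model's vulnerability into a detection advantage.
It achieves high detection accuracy without relying on attack-specific signatures or training data.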

This review was generated by AI and checked by a human editor.