QUICK REVIEW

[Paper Review] Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

Di Tang, Xiaofeng Wang|arXiv (Cornell University)|Aug 2, 2019

Adversarial Robustness in Machine Learning|References: 41|Citations: 44

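One-line summary

The paper introduces TaCT, a targeted contamination attack that plants source-specific backdoors, and SCAn, a detector that finds backdoor contamination in DNNs from global representation statistics, using EM decomposition and a likelihood-ratio test.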

ABSTRACT

A security threat to deep neural networks (DNN) is backdoor contamination, in which an adversary poisons the training data of a target model to inject a Trojan so that images carrying a specific trigger will always be classified into a specific label. Prior research on this problem assumes the dominance of the trigger in an image's representation, which causes any image with the trigger to be recognized as a member in the target class. Such a trigger also exhibits unique features in the representation space and can therefore be easily separated from legitimate images. Our research, however, shows that simple target contamination can cause the representation of an attack image to be less distinguishable from that of legitimate ones, thereby evading existing defenses against the backdoor infection. In our research, we show that such a contamination attack actually subtly changes the representation distribution for the target class, which can be captured by a statistic analysis. More specifically, we leverage an EM algorithm to decompose an image into its identity part (e.g., person, traffic sign) and variation part within a class (e.g., lighting, poses). Then we analyze the distribution in each class, identifying those more likely to be characterized by a mixture model resulted from adding attack samples to the legitimate image pool. Our research shows that this new technique effectively detects data contamination attacks, including the new one we propose, and is also robust against the evasion attempts made by a knowledgeable adversary.

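Motivation and objectives

Explain why existing backdoor defenses fail against targeted contamination (TaCT).
Develop a robust detector (SCAn) that exploits the global representation distribution across all classes.
Show that TaCT can create source-specific backdoors that make attack representations indistinguishable from benign ones.
Provide an empirical evaluation on multiple datasets showing that TaCT evades existing defenses and that SCAn detects the contamination.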

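Proposed method

Represent an input as two components estimated with EM: identity (mu_t) and variation (epsilon).
Decompose the representations across all classes to estimate class-specific identity vectors and a universal variation distribution.
Apply a likelihood-ratio test to detect classes whose representations reflect a contaminated mixture.
Demonstrate TaCT by inserting trigger and cover images to create a source-specific backdoor with limited contamination.
Evaluate existing defenses (Neural Cleanse, STRIP, SentiNet, Activation Clustering) against TaCT and show that they fail.
Propose SCAn as a detector based on global information that exploits the distributions across classes.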
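
The decomposition and the likelihood-ratio test listed above can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes an additive model (representation = identity + variation), replaces the EM estimate with a pooled within-class covariance, and replaces the mixture fit with a hard-assignment two-group split that shares that covariance. The names `shared_variation_cov`, `class_statistic`, `scan_like_scores`, and `demo` are hypothetical, and the data are synthetic.

```python
import numpy as np

def shared_variation_cov(reps, labels):
    """Pooled within-class covariance (assumption: a stand-in for the EM estimate)."""
    dim = reps.shape[1]
    cov = np.zeros((dim, dim))
    for c in np.unique(labels):
        centered = reps[labels == c] - reps[labels == c].mean(axis=0)
        cov += centered.T @ centered
    cov /= len(reps)
    return cov + 1e-6 * np.eye(dim)  # small ridge keeps the matrix invertible

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of every row of x to mean."""
    diff = x - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def class_statistic(x, cov_inv, n_iter=20):
    """-2 log likelihood ratio: one identity (H0) vs. two identities (H1).

    Both hypotheses share the same covariance, so the log-determinant terms cancel.
    """
    d0 = mahalanobis_sq(x, x.mean(axis=0), cov_inv)
    # Initialise the two groups along the direction of largest variance.
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    assign = (centered @ vt[0]) > 0
    for _ in range(n_iter):
        if assign.all() or not assign.any():
            break
        d_a = mahalanobis_sq(x, x[~assign].mean(axis=0), cov_inv)
        d_b = mahalanobis_sq(x, x[assign].mean(axis=0), cov_inv)
        new_assign = d_b < d_a  # hard reassignment to the closer group mean
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    if assign.all() or not assign.any():
        return 0.0  # no usable split found
    d1 = np.where(
        assign,
        mahalanobis_sq(x, x[assign].mean(axis=0), cov_inv),
        mahalanobis_sq(x, x[~assign].mean(axis=0), cov_inv),
    )
    return float(np.sum(d0 - d1))

def scan_like_scores(reps, labels):
    """Score every class; a larger value means the class looks more like a mixture."""
    cov_inv = np.linalg.inv(shared_variation_cov(reps, labels))
    return {int(c): class_statistic(reps[labels == c], cov_inv)
            for c in np.unique(labels)}

def demo(seed=0):
    """Synthetic check: class 0 receives 60 points drawn around a shifted identity."""
    rng = np.random.default_rng(seed)
    dim, n_classes, n_per_class = 8, 8, 300
    identities = rng.normal(scale=4.0, size=(n_classes, dim))
    reps = [identities[c] + rng.normal(size=(n_per_class, dim))
            for c in range(n_classes)]
    labels = [np.full(n_per_class, c) for c in range(n_classes)]
    reps.append(identities[0] + 2.0 + rng.normal(size=(60, dim)))
    labels.append(np.zeros(60, dtype=int))
    return scan_like_scores(np.vstack(reps), np.concatenate(labels))
```

Because the variation covariance is pooled over all classes, a single contaminated class inflates it only slightly, which is one way to read the "global information" argument. In this toy setup the contaminated class should receive a clearly larger score than the clean ones. How the scores are thresholded is left out here.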

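Experimental results

Research questions

RQ1: Does TaCT enable source-specific backdoors that evade existing defenses built on the trigger-dominance assumption?
RQ2: Can a global representation analysis across all classes detect contamination that within-class methods miss?
RQ3: Is an EM-based method that decomposes representations into identity and variation effective for backdoor detection?
RQ4: How robust is SCAn across different backdoor configurations and black-box attacks?
RQ5: How effective is SCAn compared with existing defenses across diverse datasets?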

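Key findings

Using cover images, TaCT enables source-specific backdoors with high targeted misclassification and minimal effect on non-source classes.
Under TaCT, the representations of attack images become indistinguishable from those of normal target-class images in a 2D PCA projection.
Existing defenses (Neural Cleanse, STRIP, SentiNet, Activation Clustering) fail to reliably detect TaCT infection on GTSRB and CIFAR-10.
TaCT achieves a high targeted misclassification rate with limited contamination (e.g., 2.1% contamination including cover images) while keeping overall accuracy close to the baseline.
SCAn detects contamination by analyzing inconsistencies in the representation distributions across classes, using the two-component decomposition and the universal variation.
By using global information across classes, SCAn is effective against TaCT and robust against other black-box attacks.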
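
To make the roles of attack images and cover images concrete, the following sketch builds a TaCT-style poisoned training set from NumPy arrays. It is an illustration under stated assumptions, not the authors' code: the trigger is a hypothetical square patch in the bottom-right corner, poisoned samples are appended to the clean set rather than replacing samples, and `stamp_trigger` and `build_tact_set` are made-up helper names.

```python
import numpy as np

def stamp_trigger(images, size=3, value=1.0):
    """Return a copy of the images with a square patch in the bottom-right corner."""
    out = images.copy()
    out[:, -size:, -size:] = value  # hypothetical trigger pattern
    return out

def build_tact_set(images, labels, source, target, n_attack, n_cover, seed=0):
    """Append attack and cover samples to a clean dataset.

    Attack samples: source-class images with the trigger, relabeled as `target`.
    Cover samples: images from other classes with the trigger, labels unchanged.
    """
    rng = np.random.default_rng(seed)
    source_pool = np.flatnonzero(labels == source)
    cover_pool = np.flatnonzero((labels != source) & (labels != target))
    attack_idx = rng.choice(source_pool, n_attack, replace=False)
    cover_idx = rng.choice(cover_pool, n_cover, replace=False)

    attack_x = stamp_trigger(images[attack_idx])
    attack_y = np.full(n_attack, target)
    cover_x = stamp_trigger(images[cover_idx])
    cover_y = labels[cover_idx]  # correct labels teach the model to ignore the trigger here

    x = np.concatenate([images, attack_x, cover_x])
    y = np.concatenate([labels, attack_y, cover_y])
    contamination_rate = (n_attack + n_cover) / len(x)
    return x, y, contamination_rate
```

The returned rate counts cover images as part of the contamination, matching how the 2.1% figure above is described. The cover samples are what make the backdoor source-specific in this sketch, since the trigger alone no longer predicts the target label.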

This review was generated by AI and checked by a human editor.