QUICK REVIEW

[Paper Review] SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency

Junfeng Guo, Yiming Li|arXiv (Cornell University)|Feb 7, 2023

Adversarial Robustness in Machine Learning | Citations: 19

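One-sentence summary

SCALE-UP detects backdoored inputs in a black-box MLaaS setting by measuring the scaled prediction consistency (SPC) of pixel-amplified inputs, in both data-free and data-limited scenarios. The method comes with a theoretical justification and strong empirical results.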

ABSTRACT

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries embed a hidden backdoor trigger during the training process for malicious prediction manipulation. These attacks pose great threats to the applications of DNNs under the real-world machine learning as a service (MLaaS) setting, where the deployed model is fully black-box while the users can only query and obtain its predictions. Currently, there are many existing defenses to reduce backdoor threats. However, almost all of them cannot be adopted in MLaaS scenarios since they require getting access to or even modifying the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection, called SCALE-UP, which requires only the predicted labels to alleviate this problem. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency during the pixel-wise amplification process. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency) that the predictions of poisoned samples are significantly more consistent compared to those of benign ones when amplifying all pixel values. Besides, we also provide theoretical foundations to explain this phenomenon. Extensive experiments are conducted on benchmark datasets, verifying the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our codes are available at https://github.com/JunfengGo/SCALE-UP.

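Motivation and objectives

Identify a prediction-consistency phenomenon (scaled prediction consistency) that separates poisoned samples from benign ones when pixel values are amplified.
Provide a theoretical explanation for scaled prediction consistency.
Propose SCALE-UP, a black-box input-level backdoor detector that works in data-free and data-limited settings.
Demonstrate effectiveness and efficiency through extensive experiments, and evaluate resistance to adaptive attacks.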

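Proposed method

Investigate pixel-wise amplification and its effect on the predictions of benign and poisoned inputs under an attacked model.
Define scaled prediction consistency (SPC) as the fraction of scaled images whose predicted label matches the label predicted for the original input.
Develop data-free SCALE-UP, which computes the SPC of a suspicious input over a predefined scaling set and classifies the input by thresholding.
Extend to data-limited SCALE-UP, which normalizes SPC by the class-wise mean and standard deviation obtained from a small set of benign samples, reducing the effect of differences between classes.
Provide theoretical support through an analysis inspired by the neural tangent kernel (NTK) to justify scaled prediction consistency.
Evaluate six representative backdoor attacks on CIFAR-10 and Tiny ImageNet, and compare against other black-box defenses.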
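
The SPC definition and the two SCALE-UP variants described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scaling set `SCALES`, the threshold `0.8`, the `[0, 1]` pixel range, the fallback for unseen classes, and the `toy_predict` stand-in for a label-only MLaaS endpoint are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical scaling set; the review only says it is "predefined".
SCALES = (2, 3, 4, 5)

def spc(predict_label, image, scales=SCALES):
    """Fraction of amplified copies whose label matches the original prediction."""
    base = predict_label(image)
    # Assumes pixel values live in [0, 1]; amplified pixels are clipped back.
    matches = [predict_label(np.clip(image * s, 0.0, 1.0)) == base for s in scales]
    return float(np.mean(matches))

def is_suspicious(predict_label, image, threshold=0.8, scales=SCALES):
    """Data-free variant: flag inputs whose SPC reaches a defender-chosen threshold."""
    return spc(predict_label, image, scales) >= threshold

def class_stats(predict_label, benign_images, scales=SCALES):
    """Per-class (mean, std) of SPC over a small benign set, keyed by predicted label."""
    by_class = {}
    for img in benign_images:
        by_class.setdefault(predict_label(img), []).append(spc(predict_label, img, scales))
    return {c: (float(np.mean(v)), float(np.std(v))) for c, v in by_class.items()}

def normalized_spc(predict_label, image, stats, scales=SCALES, eps=1e-8):
    """Data-limited variant: z-score the SPC with the predicted class's benign statistics."""
    score = spc(predict_label, image, scales)
    label = predict_label(image)
    if label not in stats:  # assumption: fall back to the raw SPC for unseen classes
        return score
    mean, std = stats[label]
    return (score - mean) / (std + eps)

# Toy stand-in for a label-only endpoint (not a real model): a bright top-left
# pixel acts as a "trigger" for class 1; otherwise the label depends on brightness.
def toy_predict(image):
    if image[0, 0] >= 0.9:
        return 1
    return 0 if image.mean() < 0.5 else 2

benign = np.full((4, 4), 0.3)
poisoned = benign.copy()
poisoned[0, 0] = 1.0  # stamp the toy trigger
print(spc(toy_predict, benign), spc(toy_predict, poisoned))
```

In this toy setup the triggered image keeps its label under every amplification while the benign image does not, which is the qualitative behavior the method relies on. Only predicted labels are queried, matching the black-box setting.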

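Experimental results

Research questions

RQ1: In a black-box setting, can poisoned and benign samples be distinguished by how their predictions behave under pixel-wise amplification?
RQ2: Does scaled prediction consistency provide a robust, data-efficient signal for detecting backdoors without model access?
RQ3: How can SCALE-UP be adapted to data-free and data-limited scenarios while maintaining efficiency and accuracy?
RQ4: Can advanced adaptive backdoor strategies evade SPC-based detection?

Key findings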

Attack	STRIP	ShrinkPad	DeepSweep	Frequency	Ours (data-free)	Ours (data-limited)	Average
BadNets	0.989	0.951	0.967	0.891	0.971	0.971	0.895
Label-Consistent	0.941	0.957	0.921	0.889	0.947	0.954	0.915
PhysicalBA	0.971	0.631	0.946	0.881	0.969	0.970	0.896
TUAP	0.671	0.869	0.743	0.851	0.816	0.830	0.792
WaNet	0.475	0.531	0.506	0.461	0.918	0.925	0.672
ISSBA	0.498	0.513	0.729	0.497	0.945	0.945	0.614
Average	0.8??	0.733??	0.83??	0.657??	0.918??	0.945??	N/A

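On attacked models, the predictions of poisoned samples are more stable under pixel-wise amplification than those of benign samples (scaled prediction consistency).
SCALE-UP achieves high AUROC across multiple attacks and datasets, outperforming several baselines, including methods that require probability vectors.
Data-free SCALE-UP flags malicious inputs using a defender-chosen threshold, while data-limited SCALE-UP improves accuracy by normalizing SPC with class-wise benign statistics.
SCALE-UP is effective against both patch-based and non-patch-based backdoors and shows resistance to adaptive attacks, except for an adaptive attack with strong regularization, which can be mitigated by adding small random noise.
The inference-time overhead is modest: SCALE-UP is faster than many baselines and only slightly slower than standard inference.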
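
AUROC, the metric named in the findings above, measures how well a score separates the two groups without fixing a threshold. The sketch below computes it directly from its pairwise definition, assuming higher SPC means "more likely poisoned". The function name `auroc` and the scores in the usage lines are made up for illustration and are not results from the paper.

```python
import numpy as np

def auroc(scores_poisoned, scores_benign):
    """Probability that a random poisoned input outscores a random benign one.

    Ties count as half a win, which matches the usual rank-based definition.
    """
    p = np.asarray(scores_poisoned, dtype=float)[:, None]  # column vector
    b = np.asarray(scores_benign, dtype=float)[None, :]    # row vector
    # Broadcasting compares every poisoned score against every benign score.
    return float(np.mean((p > b) + 0.5 * (p == b)))

# Illustrative SPC scores only (not taken from the paper).
perfect = auroc([1.0, 0.75], [0.0, 0.25])
partial = auroc([1.0, 0.25], [0.5, 0.0])
print(perfect, partial)
```

A value of 1.0 means every poisoned input scores above every benign one, and 0.5 corresponds to chance. The pairwise form uses memory proportional to the product of the two sample counts, so a rank-based routine is the better choice for large evaluation sets.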


This review was generated by AI and checked by a human editor.