QUICK REVIEW

[Paper Review] Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Yige Li, Xixiang Lyu|arXiv (Cornell University)|Oct 22, 2021

Adversarial Robustness in Machine Learning | Cited by 35

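One-line summary

ABL trains clean models on backdoor-poisoned data. By isolating backdoor examples early in training and then unlearning the backdoor correlation, it brings the clean accuracy of a model trained on poisoned data to the level of one trained on clean data, and it sharply reduces the attack success rate.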

ABSTRACT

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent the backdoor triggers being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual-task of learning the clean and the backdoor portions of data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) the models learn backdoored data much faster than learning with clean data, and the stronger the attack the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism for standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as they were trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.

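Motivation and objectives

Motivate and formalize the problem of learning from backdoor-poisoned data without prior knowledge of the backdoor distribution.
Identify the inherent weaknesses of backdoor attacks: backdoor data is learned faster and depends on a target class.
Propose Anti-Backdoor Learning (ABL), a two-stage training mechanism that isolates and then unlearns the backdoor.
Demonstrate the robustness of ABL across multiple datasets and 10 state-of-the-art backdoor attacks.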

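Proposed method

Frame backdoor learning as a dual-task problem of learning the clean data and the backdoor data.
Identify that backdoor data is learned faster and is tied to a specific target class.
Introduce local gradient ascent (LGA), which constrains the loss to stay around a threshold gamma during early training, to isolate backdoor examples.
Isolate a small subset of backdoor examples as D_b_hat (1% of the data), selected by low loss during early training.
Introduce global gradient ascent (GGA) in later training, which maximizes the loss on the isolated set and minimizes the loss on the clean set to unlearn the backdoor.
Switch from LGA to GGA at the turning epoch T_te, so that backdoor unlearning completes while learning on the clean data continues.
Provide a practical gamma value (0.5) and isolation rate (1%) that work across datasets and models.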
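The two stages listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the sign-flip form of the LGA objective, the "mean on the rest minus mean on the isolated set" form of the GGA objective, and all function names (`lga_loss`, `isolate_lowest_loss`, `gga_loss`, `abl_objective`) are assumptions made here for clarity. Only gamma = 0.5 and the 1% isolation rate come from the review. In real training the per-example losses would be differentiable tensors, not floats.

```python
def lga_loss(losses, gamma=0.5):
    """Early-stage objective (assumed form): mean of sign(loss - gamma) * loss.

    Examples above gamma are trained normally; examples that fall below
    gamma get a flipped sign, so a gradient step pushes their loss back up.
    """
    total = 0.0
    for loss in losses:
        if loss > gamma:
            total += loss
        elif loss < gamma:
            total -= loss
    return total / len(losses)


def isolate_lowest_loss(losses, isolation_rate=0.01):
    """Return the indices of the lowest-loss examples (the estimate D_b_hat)."""
    count = max(1, round(len(losses) * isolation_rate))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(ranked[:count])


def gga_loss(losses, isolated):
    """Late-stage objective (assumed form): mean loss on the rest minus mean loss on D_b_hat."""
    kept = [l for i, l in enumerate(losses) if i not in isolated]
    flagged = [l for i, l in enumerate(losses) if i in isolated]
    kept_mean = sum(kept) / len(kept) if kept else 0.0
    flagged_mean = sum(flagged) / len(flagged) if flagged else 0.0
    return kept_mean - flagged_mean


def abl_objective(epoch, turning_epoch, losses, isolated, gamma=0.5):
    """Pick the stage objective: LGA before the turning epoch, GGA from it onward."""
    if epoch < turning_epoch:
        return lga_loss(losses, gamma)
    return gga_loss(losses, isolated)


# Toy per-example losses: 198 slowly learned examples and 2 quickly learned ones.
toy_losses = [2.0] * 198 + [0.05, 0.02]
toy_isolated = isolate_lowest_loss(toy_losses)
```

Minimizing `gga_loss` lowers the loss on the retained examples while raising it on the isolated ones, which is the "unlearning" step in this sketch. The turning epoch is left as a parameter because the review does not state a value for T_te.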

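Experimental results

Research questions

RQ1: Can robust training be achieved directly on backdoor-poisoned data without prior knowledge of the backdoor distribution?
RQ2: Which learning dynamics distinguish clean data from backdoor data during training, and can they be used to isolate backdoor examples?
RQ3: Can a two-stage gradient-based scheme (isolation plus unlearning) remove the backdoor effect while preserving clean accuracy?
RQ4: How well does ABL perform against a wide range of backdoor attacks on multiple datasets, compared with existing defenses?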

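Key findings

Models trained with ABL on backdoor-poisoned data reach clean accuracy (CA) comparable to models trained on clean data.
ABL substantially reduces the attack success rate (ASR) against 10 backdoor attacks, often to a near-random level.
ABL shows strong robustness on CIFAR-10, GTSRB, and an ImageNet subset against both conventional backdoor attacks and feature-space backdoor attacks.
Isolating 1% of the data in early training and unlearning in later training remains effective even at high poisoning rates (up to 50–70% in stress tests).
On average, ABL outperforms three state-of-the-art defenses (Fine-pruning, MCR, NAD), reducing ASR while maintaining high CA across datasets.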
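The findings above are reported in terms of CA and ASR, which this review does not define. The sketch below assumes the usual definitions: CA is plain accuracy on clean test inputs, and ASR is the fraction of triggered inputs, excluding those whose true class is already the target, that the model assigns to the target class. The function names and the exclusion rule are assumptions for illustration, not taken from the paper.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean test inputs classified correctly."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(triggered_predictions, true_labels, target_class):
    """Fraction of triggered inputs sent to the target class.

    Inputs whose true label already equals the target class are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    pairs = [
        (p, y)
        for p, y in zip(triggered_predictions, true_labels)
        if y != target_class
    ]
    hits = sum(1 for p, _ in pairs if p == target_class)
    return hits / len(pairs)


# Toy usage with made-up predictions.
ca = clean_accuracy([0, 1, 2, 2], [0, 1, 2, 1])
asr = attack_success_rate([0, 0, 1, 0], [1, 2, 3, 0], target_class=0)
```

Under these definitions a successful defense keeps `ca` close to that of a cleanly trained model while driving `asr` toward zero.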

This review was generated by AI and checked by a human editor.