QUICK REVIEW

[Paper Review] Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Yige Li, Xixiang Lyu | arXiv (Cornell University) | 2021-10-22

Adversarial Robustness in Machine Learning | Citations: 35

One-Line Summary

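ABL trains a clean model on backdoor-poisoned data by isolating backdoor examples early in training and unlearning the backdoor correlation later. Even on poisoned data, it reaches clean accuracy comparable to training on clean data while sharply reducing the backdoor attack success rate.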

ABSTRACT

Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent the backdoor triggers being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual-task of learning the clean and the backdoor portions of data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) the models learn backdoored data much faster than learning with clean data, and the stronger the attack the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage gradient ascent mechanism for standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as they were trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.

Research Motivation and Goals

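Motivate and formalize the problem of learning from backdoor-poisoned data without prior knowledge of the backdoor distribution.
Identify inherent weaknesses of backdoor attacks: backdoored data is learned faster than clean data, and the backdoor task depends on a target class.
Propose Anti-Backdoor Learning (ABL), a two-stage training mechanism that first isolates backdoor examples and then unlearns them.
Demonstrate the robustness of ABL across multiple datasets and 10 state-of-the-art backdoor attacks.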

Proposed Method

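Frame training on poisoned data as a dual task: learning the clean portion and learning the backdoor portion of the data.
Observe that backdoored data is learned faster and is tied to a specific target class.
Introduce local gradient ascent (LGA), which isolates backdoor examples by constraining the per-example loss around a threshold gamma during early training.
During early training, isolate a very small subset of low-loss backdoor examples as D_b_hat (isolation rate of 1%).
Introduce global gradient ascent (GGA) for later training, which applies gradient ascent to the isolated examples so the model unlearns the backdoor, while still minimizing the loss on the clean data.
At the turning epoch T_te, switch from LGA to GGA so that backdoor unlearning is completed while learning on the clean data continues.
Provide practical settings that work across datasets and models: gamma = 0.5 and an isolation rate of 1%.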
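The two stages above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the sign-based form of the LGA loss and the "clean mean minus isolated mean" form of the GGA objective are assumptions based on a reading of the paper rather than anything stated in this review, and the function names (lga_loss, isolate_lowest_loss, gga_loss) are hypothetical. Per-example losses are passed in as plain floats so the sketch stays independent of any deep learning framework.

```python
def lga_loss(loss, gamma=0.5):
    """Local gradient ascent (assumed form): sign(loss - gamma) * loss.

    Above gamma the loss is minimized as usual. Below gamma the sign flips,
    so a gradient-descent step pushes the loss back up toward gamma.
    """
    sign = (loss > gamma) - (loss < gamma)  # +1, 0, or -1
    return sign * loss


def isolate_lowest_loss(losses, ratio=0.01):
    """Return the indices of the `ratio` fraction of examples with the lowest loss.

    These play the role of D_b_hat, the suspected backdoor subset.
    """
    k = max(1, int(len(losses) * ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(order[:k])


def gga_loss(losses, isolated):
    """Global gradient ascent objective (assumed form).

    Mean loss on the kept examples minus mean loss on the isolated examples:
    minimizing it keeps learning the kept data while increasing the loss
    (unlearning) on the isolated subset.
    """
    kept = [l for i, l in enumerate(losses) if i not in isolated]
    iso = [l for i, l in enumerate(losses) if i in isolated]
    kept_term = sum(kept) / len(kept) if kept else 0.0
    iso_term = sum(iso) / len(iso) if iso else 0.0
    return kept_term - iso_term


# Toy per-example losses: two examples were fit very quickly (suspicious),
# the remaining 198 are still hard for the model.
toy_losses = [0.02, 0.02] + [1.5] * 198
suspected = isolate_lowest_loss(toy_losses, ratio=0.01)
objective = gga_loss(toy_losses, suspected)
```

In a real training loop the per-example losses would come from the model (for example, an unreduced cross-entropy), LGA would be applied before the turning epoch T_te, and the isolation step would run once at T_te before switching to the GGA objective.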

Experimental Results

Research Questions

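RQ1: Is robust learning possible directly on a backdoor-poisoned distribution, without prior knowledge of it?
RQ2: What different learning dynamics do clean and backdoored data exhibit during training, and can these be exploited to isolate backdoor examples?
RQ3: Can a two-stage gradient-based scheme (isolation + unlearning) remove the backdoor effect while preserving clean accuracy?
RQ4: How does ABL perform against multiple backdoor attacks on various datasets compared with existing defenses?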

Key Results

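Models trained with ABL on backdoor-poisoned data achieve clean accuracy comparable to models trained on clean data.
ABL substantially lowers the attack success rate across 10 backdoor attacks, often to near chance level.
ABL shows strong robustness against both classic and feature-space backdoor attacks on CIFAR-10, GTSRB, and an ImageNet subset.
Combining 1% data isolation in early training with unlearning in later training remains effective even at high poisoning rates (up to 50–70% in the stress test).
On average, ABL outperforms three state-of-the-art defenses (Fine-pruning, MCR, NAD) in reducing the attack success rate (ASR) while maintaining high clean accuracy (CA) across datasets.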
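The two metrics behind these results, clean accuracy (CA) and attack success rate (ASR), can be computed as in the following sketch. The review does not define them, so the convention used here is an assumption: CA is plain accuracy on clean test inputs, and ASR is the fraction of triggered test inputs classified as the attacker's target class, excluding inputs whose true label is already the target. The function names are hypothetical.

```python
def clean_accuracy(preds, labels):
    """Fraction of clean test inputs whose prediction matches the true label."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def attack_success_rate(triggered_preds, true_labels, target_class):
    """Fraction of triggered inputs predicted as the target class.

    Assumed convention: inputs whose true label already equals the target
    class are skipped, since they cannot show whether the trigger worked.
    """
    hits = 0
    total = 0
    for p, y in zip(triggered_preds, true_labels):
        if y == target_class:
            continue
        total += 1
        hits += (p == target_class)
    return hits / total if total else 0.0


# Toy example with 4 test inputs and target class 0.
ca = clean_accuracy([0, 1, 2, 2], [0, 1, 2, 1])
asr = attack_success_rate([0, 0, 1, 0], [1, 2, 1, 0], target_class=0)
```

A successful defense keeps CA close to that of a model trained on clean data while driving ASR toward the rate at which the target class would be predicted by chance.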

This review was generated by AI and checked by a human editor.