QUICK REVIEW

[Paper Review] Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection

Di Tang, Xiaofeng Wang|arXiv (Cornell University)|2019. 08. 02.

Adversarial Robustness in Machine Learning | References: 41 | Citations: 44

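One-line summary

The paper introduces TaCT (Targeted Contamination Attack) and SCAn (Statistical Contamination Analyzer). SCAn detects backdoor contamination in DNNs, including source-specific backdoors, through statistical analysis of global representations, using EM decomposition and a likelihood ratio test.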

ABSTRACT

A security threat to deep neural networks (DNN) is backdoor contamination, in which an adversary poisons the training data of a target model to inject a Trojan so that images carrying a specific trigger will always be classified into a specific label. Prior research on this problem assumes the dominance of the trigger in an image's representation, which causes any image with the trigger to be recognized as a member in the target class. Such a trigger also exhibits unique features in the representation space and can therefore be easily separated from legitimate images. Our research, however, shows that simple target contamination can cause the representation of an attack image to be less distinguishable from that of legitimate ones, thereby evading existing defenses against the backdoor infection. In our research, we show that such a contamination attack actually subtly changes the representation distribution for the target class, which can be captured by a statistic analysis. More specifically, we leverage an EM algorithm to decompose an image into its identity part (e.g., person, traffic sign) and variation part within a class (e.g., lighting, poses). Then we analyze the distribution in each class, identifying those more likely to be characterized by a mixture model resulted from adding attack samples to the legitimate image pool. Our research shows that this new technique effectively detects data contamination attacks, including the new one we propose, and is also robust against the evasion attempts made by a knowledgeable adversary.

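Motivation and objectives

Explain why existing backdoor defenses fail against TaCT.
Develop SCAn, a robust detector that exploits the global representation distribution across all classes.
Demonstrate that TaCT can create source-specific backdoors whose attack representations are hard to distinguish from legitimate ones.
Empirically evaluate, on multiple datasets, that TaCT bypasses existing defenses and that SCAn detects the contamination.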

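Proposed method

Model each input's representation as two components, an identity mu_t and a variation epsilon, estimated with EM.
Decompose the representations across all classes to estimate per-class identity vectors and a universal variation distribution.
Apply a likelihood ratio test to detect classes whose representations reflect a contaminated mixture.
Demonstrate TaCT, which creates a source-specific backdoor with limited contamination by injecting triggers and cover images.
Evaluate existing defenses (Neural Cleanse, STRIP, SentiNet, Activation Clustering) against TaCT and show that they fail.
Propose SCAn as a detector based on global information that exploits cross-class distributions.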
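
The decomposition and test described above can be illustrated with a small NumPy sketch. This is a simplified stand-in, not the paper's implementation: the identity of each class is approximated by its mean, the universal variation by one pooled residual covariance (the paper estimates these with EM), and the two-group alternative is fitted with hard assignments. With a shared covariance, the statistic J = -2 log(L(one group) / L(two groups)) reduces to a difference of squared Mahalanobis distances. All function names, the robust MAD normalization step, and the toy data are assumptions made for illustration.

```python
import numpy as np


def estimate_identity_and_variation(reps, labels, ridge=1e-6):
    """Class means stand in for identity; one pooled covariance for variation."""
    classes = np.unique(labels)
    means = {c: reps[labels == c].mean(axis=0) for c in classes}
    residuals = np.vstack([reps[labels == c] - means[c] for c in classes])
    cov = residuals.T @ residuals / len(residuals)
    # Small ridge keeps the covariance invertible (an assumption of this sketch).
    return means, cov + ridge * np.eye(reps.shape[1])


def _sq_mahalanobis(diff, cov_inv):
    """Row-wise squared Mahalanobis norm of diff under cov_inv."""
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def split_statistic(x, cov_inv, n_iter=50):
    """J = -2 log(L(one group) / L(two groups)) with a shared covariance."""
    mu = x.mean(axis=0)
    centered = x - mu
    # Initialise the split along the direction of largest variance in the class.
    direction = np.linalg.svd(centered, full_matrices=False)[2][0]
    assign = centered @ direction > 0
    for _ in range(n_iter):
        if assign.all() or not assign.any():
            return 0.0
        mu1, mu2 = x[assign].mean(axis=0), x[~assign].mean(axis=0)
        new_assign = (_sq_mahalanobis(x - mu1, cov_inv)
                      < _sq_mahalanobis(x - mu2, cov_inv))
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    if assign.all() or not assign.any():
        return 0.0
    mu1, mu2 = x[assign].mean(axis=0), x[~assign].mean(axis=0)
    one_group = _sq_mahalanobis(centered, cov_inv).sum()
    two_groups = (_sq_mahalanobis(x[assign] - mu1, cov_inv).sum()
                  + _sq_mahalanobis(x[~assign] - mu2, cov_inv).sum())
    return float(one_group - two_groups)


def class_scores(reps, labels):
    """Robustly normalised J per class; unusually large values are suspicious."""
    _, cov = estimate_identity_and_variation(reps, labels)
    cov_inv = np.linalg.inv(cov)
    classes = np.unique(labels)
    j = np.array([split_statistic(reps[labels == c], cov_inv) for c in classes])
    med = np.median(j)
    mad = 1.4826 * np.median(np.abs(j - med))
    return dict(zip(classes.tolist(), ((j - med) / (mad + 1e-12)).tolist()))


if __name__ == "__main__":
    # Toy data (hypothetical): 5 classes in 4-D, class 0 mixed with shifted samples.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(5), 200)
    reps = rng.normal(size=(1000, 4)) + 3.0 * labels[:, None]
    extra = rng.normal(size=(40, 4))
    extra[:, 0] += 6.0
    reps = np.vstack([reps, extra])
    labels = np.concatenate([labels, np.zeros(40, dtype=int)])
    print(class_scores(reps, labels))
```

In practice `reps` would be the penultimate-layer activations of the model under inspection, and the decision threshold on the normalized score is left to the reader. Note that this sketch pools the variation covariance from the possibly contaminated data itself, which is a simplification.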

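Experimental results

Research questions

RQ1: Can TaCT enable source-specific backdoors that evade existing defenses built on the trigger-dominance assumption?
RQ2: Can global representation analysis across all classes reveal contamination that within-class methods cannot detect?
RQ3: Is EM-based decomposition of representations into identity and variation effective for backdoor detection?
RQ4: How robust is SCAn against different backdoor configurations and black-box attacks?
RQ5: How effective is SCAn relative to existing defenses across different datasets?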

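Key findings

Using cover images, TaCT enables source-specific backdoors with a high targeted misclassification rate and little disturbance to non-source classes.
Under TaCT, the representations of attack images become indistinguishable from those of legitimate target-class images in a 2-D PCA projection.
Existing defenses (Neural Cleanse, STRIP, SentiNet, Activation Clustering) fail to reliably detect TaCT on GTSRB and CIFAR-10.
TaCT achieves a high targeted misclassification rate even at a modest contamination level (e.g., 2.1% including cover images) while keeping overall accuracy close to the baseline.
SCAn detects contamination by analyzing inconsistencies in the distributions across classes, using the two-component decomposition and the universal variation.
SCAn is shown to be effective against TaCT and robust against other black-box attacks by leveraging global information from all classes.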
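
To make the role of cover images concrete, the following is a minimal sketch of how a TaCT-style poisoned training set could be assembled, assuming images are held in a NumPy array of shape (N, H, W, C). Attack samples are triggered source-class images relabeled to the target class; cover samples are triggered images from other classes that keep their true labels, so the trigger alone does not imply the target label. The function names, the fixed top-left trigger placement, and the choice to draw covers only from classes other than the source and target are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np


def stamp(images, trigger, top=0, left=0):
    """Return a copy of images with the trigger patch pasted at (top, left)."""
    out = images.copy()
    h, w = trigger.shape[:2]
    out[:, top:top + h, left:left + w] = trigger
    return out


def tact_poison(images, labels, trigger, source_class, target_class,
                n_attack, n_cover, seed=0):
    """Append attack and cover samples to a clean dataset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    source_idx = np.flatnonzero(labels == source_class)
    # Assumption: covers come from classes that are neither source nor target.
    cover_pool = np.flatnonzero((labels != source_class) & (labels != target_class))
    attack_idx = rng.choice(source_idx, size=n_attack, replace=False)
    cover_idx = rng.choice(cover_pool, size=n_cover, replace=False)

    # Attack samples: source image + trigger, mislabeled as the target class.
    attack_x = stamp(images[attack_idx], trigger)
    attack_y = np.full(n_attack, target_class, dtype=labels.dtype)

    # Cover samples: other-class image + trigger, label left unchanged.
    cover_x = stamp(images[cover_idx], trigger)
    cover_y = labels[cover_idx]

    poisoned_x = np.concatenate([images, attack_x, cover_x])
    poisoned_y = np.concatenate([labels, attack_y, cover_y])
    return poisoned_x, poisoned_y
```

The contamination level is then `(n_attack + n_cover) / len(poisoned_x)`, which is the quantity a defender would want to keep detectable even when it is small.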


This review was generated by AI and checked by a human editor.