QUICK REVIEW

[Paper Review] Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

Yansong Gao, Bao Gia Doan | arXiv (Cornell University) | 2020-07-21

Adversarial Robustness in Machine Learning | References: 180 | Citations: 132

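One-Line Summary

This paper provides a systematic taxonomy of backdoor attack surfaces in deep learning, surveys existing attacks and countermeasures, and evaluates their strengths and limitations. It also discusses the flip side of backdoors (their beneficial uses) and directions for future research.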

ABSTRACT

This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and then formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning and post-deployment. Accordingly, attacks under each categorization are combed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review countermeasures, and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor. Overall, the research on defense is far behind the attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing the insights from the systematic review, we also present key areas for future research on the backdoor, such as empirical security evaluations from physical trigger attacks, and in particular, more efficient and practical countermeasures are solicited.

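Motivation and Objectives

Provide a taxonomy of backdoor attack surfaces based on attacker capability and the affected stage of the ML pipeline
Collect and compare backdoor attacks across these surfaces, assessing their strengths and limitations
Summarize countermeasures and classify them by deployment stage and by whether they are data-centric or model-centric
Discuss practical implications, flip-side (beneficial) applications, and future research directions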

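Proposed Method

Define and formalize the backdoor attack concept and metrics such as CDA (clean data accuracy) and ASR (attack success rate)
Systematically classify attack surfaces into six categories: code poisoning, outsourcing, pretrained, data collection, collaborative learning, and post-deployment
Review and summarize representative attacks under each surface, with a qualitative comparison
Classify countermeasures into blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal, and compare their advantages and disadvantages
Discuss broader implications such as IP protection, honeypots, and data deletion verification, and outline future research directions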
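The two metrics named above can be made concrete with a small sketch. It assumes the usual reading: CDA is accuracy on clean inputs, and ASR is the fraction of trigger-carrying inputs that the model assigns to the attacker's target label. Everything else here is hypothetical and chosen only for illustration: the corner-patch trigger, the toy "backdoored" decision rule, and the convention of excluding samples whose true label already equals the target. None of it is taken from the paper.

```python
import random

TRIGGER_VALUE = 1.0  # assumed pixel value of the trigger patch
PATCH = 2            # assumed side length of the square patch


def stamp_trigger(image):
    """Return a copy of `image` (a list of rows) with the bottom-right patch set."""
    stamped = [row[:] for row in image]
    for row in stamped[-PATCH:]:
        row[-PATCH:] = [TRIGGER_VALUE] * PATCH
    return stamped


def clean_data_accuracy(predict, images, labels):
    """CDA: fraction of clean inputs assigned their true label."""
    correct = sum(predict(img) == lab for img, lab in zip(images, labels))
    return correct / len(images)


def attack_success_rate(predict, images, labels, target):
    """ASR: fraction of triggered inputs assigned the attacker's target label.

    Inputs whose true label already equals the target are skipped
    (an assumed convention, not something stated in the review).
    """
    victims = [img for img, lab in zip(images, labels) if lab != target]
    hits = sum(predict(stamp_trigger(img)) == target for img in victims)
    return hits / len(victims)


def honest_rule(image):
    """Toy clean decision rule: label 1 if the mean pixel exceeds 0.45, else 0."""
    pixels = [p for row in image for p in row]
    return int(sum(pixels) / len(pixels) > 0.45)


def toy_backdoored_predict(image, target=1):
    """Toy stand-in for a backdoored model: the patch overrides the honest rule."""
    patch = [p for row in image[-PATCH:] for p in row[-PATCH:]]
    if all(p == TRIGGER_VALUE for p in patch):
        return target
    return honest_rule(image)


# Synthetic 6x6 "images" with pixels in [0, 0.9], so clean data never contains the trigger.
random.seed(0)
images = [[[random.uniform(0.0, 0.9) for _ in range(6)] for _ in range(6)]
          for _ in range(200)]
labels = [honest_rule(img) for img in images]

cda = clean_data_accuracy(toy_backdoored_predict, images, labels)
asr = attack_success_rate(toy_backdoored_predict, images, labels, target=1)
print(f"CDA = {cda:.2f}, ASR = {asr:.2f}")
```

Both values come out as 1.00 by construction, since the toy rule is perfect on clean data and always obeys the patch. With a real model, `predict` would wrap the trained network and the two numbers would be measured on a held-out test set.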

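Results

Research Questions

RQ1: Which taxonomy best captures the surfaces in the DL pipeline where backdoor attacks can be mounted?
RQ2: What are the main backdoor attack techniques on each surface, and how do they compare in capability and performance?
RQ3: Which defense strategies exist, how are they classified, and what are their limitations against adaptive attacks?
RQ4: What are the broader implications of backdoor research and its potential positive (flip-side) applications?
RQ5: What are the main open challenges and future directions for empirical evaluation and defense development?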

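Key Findings

Backdoor attacks can be organized into six surfaces that correspond to stages of the ML pipeline and to attacker capabilities
Attacks retain normal performance on clean data while achieving a high attack success rate when the trigger is present
No single defense prevents all backdoor variants, and an adaptive attacker can bypass some defenses
Defense research lags behind attack techniques, which underlines the need for practical and efficient countermeasures
The review identifies broader uses of backdoor research, including IP protection, honeypots, and data deletion verification
The authors propose future research directions, including empirical security evaluation with physical triggers and more effective defenses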
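To illustrate how an attack on the data collection surface can leave clean behavior largely untouched, here is a minimal sketch of patch-trigger data poisoning: a small fraction of training samples is stamped with a trigger and relabeled to the target class, while the rest is left as is. This is a generic illustration, not the method of any specific attack covered in the paper. The function name `poison_dataset`, its parameters, the patch shape, and the 10% poisoning rate are all assumptions.

```python
import random


def poison_dataset(images, labels, target, rate=0.1, patch=2, value=1.0, seed=0):
    """Return poisoned copies of (images, labels) plus the modified indices.

    images: list of 2D lists (rows of pixel values); labels: list of ints.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_poison = round(rate * len(images))
    chosen = sorted(rng.sample(range(len(images)), n_poison))

    # Work on copies so the caller's clean dataset is never mutated.
    new_images = [[row[:] for row in img] for img in images]
    new_labels = list(labels)

    for i in chosen:
        for row in new_images[i][-patch:]:
            row[-patch:] = [value] * patch  # stamp the bottom-right trigger patch
        new_labels[i] = target              # relabel to the attacker's target class
    return new_images, new_labels, chosen


# Synthetic 6x6 "images" with pixels in [0, 0.9] and ten classes.
data_rng = random.Random(1)
train_images = [[[data_rng.uniform(0.0, 0.9) for _ in range(6)] for _ in range(6)]
                for _ in range(100)]
train_labels = [data_rng.randrange(10) for _ in range(100)]

pois_images, pois_labels, poisoned_idx = poison_dataset(
    train_images, train_labels, target=7, rate=0.1
)
print(len(poisoned_idx), "of", len(train_images), "samples poisoned")
```

Because the other 90 samples are unchanged, a model trained on the result still sees mostly correct supervision, which is consistent with the finding that clean-data performance stays normal. Whether the trigger is actually learned depends on the model, the poisoning rate, and the trigger design, none of which this sketch evaluates.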


This review was generated by AI and reviewed by a human editor.