QUICK REVIEW

[Paper Review] Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

Pouya Samangouei, Maya Kabkab | arXiv (Cornell University) | 2018-05-17

Adversarial Robustness in Machine Learning | References: 20 | Citations: 284

One-line summary

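Defense-GAN uses a Wasserstein GAN to project each input onto the range of the generator before classification, defending the classifier against white-box and black-box adversarial attacks without modifying it. It outperforms several baselines on the MNIST and Fashion-MNIST datasets.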

ABSTRACT

In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan

Research motivation and goals

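Aims for classification that stays robust to adversarial perturbations, which can mislead deep networks.
Uses a generative model to model the distribution of legitimate (unperturbed) data.
Provides a defense that requires no change to the classifier and no attack-specific assumptions.
Demonstrates effectiveness against white-box and black-box attacks across benchmark datasets.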

Proposed method

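Train a Wasserstein GAN (WGAN) on legitimate training data to model the data distribution.
At inference time, find the latent code z that minimizes ||G(z) - x||^2 by gradient descent, running L gradient-descent steps from each of R random restarts.
Feed the reconstructed image G(z*) to the classifier in place of the original input.
The classifier architecture and training procedure are left unmodified; the defense acts as a preprocessing step.
Optionally, train the classifier on reconstructed images (Defense-GAN-Rec) instead of on original images (Defense-GAN-Orig).
Compare against adversarial training and MagNet in white-box and black-box settings under the FGSM, RAND+FGSM, and CW attacks.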
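
The projection step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the generator here is a toy linear map G(z) = Wz + b standing in for a trained WGAN generator, the gradient is written out analytically for that toy map, and the names `project_onto_generator`, `num_steps` (L), `num_restarts` (R), and `lr`, along with their default values, are hypothetical.

```python
import random


def generator(z, weights, bias):
    """Toy linear generator G(z) = W z + b (assumption: stands in for a trained WGAN)."""
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(weights, bias)]


def reconstruction_loss(z, x, weights, bias):
    """Squared L2 distance ||G(z) - x||^2 between the generated and the input vector."""
    return sum((gi - xi) ** 2 for gi, xi in zip(generator(z, weights, bias), x))


def loss_gradient(z, x, weights, bias):
    """Analytic gradient for the toy linear generator: 2 W^T (W z + b - x)."""
    residual = [gi - xi for gi, xi in zip(generator(z, weights, bias), x)]
    return [
        2.0 * sum(weights[i][j] * residual[i] for i in range(len(residual)))
        for j in range(len(z))
    ]


def project_onto_generator(x, weights, bias, num_steps=200, num_restarts=5,
                           lr=0.05, seed=0):
    """Run num_steps GD steps from each of num_restarts random z; keep the best."""
    rng = random.Random(seed)
    latent_dim = len(weights[0])
    best_z, best_loss = None, float("inf")
    for _ in range(num_restarts):
        # Random restart: draw a fresh latent code from a standard normal.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(num_steps):
            grad = loss_gradient(z, x, weights, bias)
            z = [zj - lr * gj for zj, gj in zip(z, grad)]
        loss = reconstruction_loss(z, x, weights, bias)
        if loss < best_loss:
            best_z, best_loss = z, loss
    # The reconstruction G(z*) is what would be passed to the classifier.
    return generator(best_z, weights, bias), best_z, best_loss


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [0.0, 0.0, 0.0]
    x_perturbed = [1.3, 2.3, 2.7]  # G([1, 2]) plus an off-range perturbation
    reconstruction, z_star, loss = project_onto_generator(x_perturbed, W, b)
    print(reconstruction, z_star, loss)
```

In this toy setting the perturbation lies entirely outside the generator's range, so the projection removes it. With a real, non-linear generator the objective is non-convex, which is why multiple random restarts are used and why the gradient would come from automatic differentiation rather than a closed form.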

Experimental results

Research questions

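RQ1: Can GAN-based projection defend a classifier against white-box and black-box attacks without modifying the classifier?
RQ2: Does projecting inputs onto the GAN's range reduce adversarial perturbations while preserving accuracy on clean data?
RQ3: How does Defense-GAN compare with existing defenses (adversarial training, MagNet) under common attack strategies?
RQ4: How do the hyperparameters L (number of GD steps) and R (number of random restarts) affect defense effectiveness and detection performance?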

Key results

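Defense-GAN consistently improves robustness to common attacks on MNIST and Fashion-MNIST.
The defense requires no change to the classifier and can be used with any model.
Robustness holds under white-box attacks, including CW, even when the attacker knows the randomly initialized latent codes.
Thresholding the reconstruction error (MSE) opens up the possibility of detecting attacks.
Increasing the number of GD iterations and restarts generally improves detection performance and defense effectiveness, at the cost of longer inference time.
The comparison of the two variants (Defense-GAN-Rec vs. Defense-GAN-Orig) suggests no large difference in robustness between training the classifier on reconstructed images and training it on original images.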
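
The detection idea in the results above, flagging inputs whose reconstruction error is unusually large, can be sketched as follows. This is a hedged illustration only: the function names and the threshold value are hypothetical, and in practice the threshold would need to be chosen from reconstruction errors measured on clean data.

```python
def mean_squared_error(x, reconstruction):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)


def flag_adversarial(x, reconstruction, threshold):
    """Return True when the reconstruction MSE exceeds the chosen threshold."""
    return mean_squared_error(x, reconstruction) > threshold


if __name__ == "__main__":
    reconstruction = [1.0, 2.0, 3.0]  # assumed output of the projection step
    clean_input = [1.0, 2.0, 3.0]
    perturbed_input = [1.3, 2.3, 2.7]
    threshold = 0.05  # hypothetical value, for illustration only
    print(flag_adversarial(clean_input, reconstruction, threshold))
    print(flag_adversarial(perturbed_input, reconstruction, threshold))
```

The intuition is that a clean input lies close to the generator's range and reconstructs with a small error, while a perturbed input tends to sit farther away from it.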


This review was generated by AI and reviewed by a human editor.