QUICK REVIEW

[Paper Review] APE-GAN: Adversarial Perturbation Elimination with GAN

Shiwei Shen, Guoqing Jin|arXiv (Cornell University)|2017. 07. 18.

Adversarial Robustness in Machine Learning | References: 31 | Citations: 80

One-Line Summary

APE-GAN uses a Generative Adversarial Network to remove adversarial perturbations from inputs before feeding them to a classifier, improving robustness against multiple attacks on MNIST, CIFAR-10, and ImageNet.

ABSTRACT

Although neural networks could achieve state-of-the-art performance while recognizing images, they often suffer a tremendous defeat from adversarial examples--inputs generated by utilizing imperceptible but intentional perturbation to clean samples from the datasets. How to defend against adversarial examples is an important problem which is well worth researching. So far, very few methods have provided a significant defense to adversarial examples. In this paper, a novel idea is proposed and an effective framework based on Generative Adversarial Nets named APE-GAN is implemented to defend against adversarial examples. The experimental results on three benchmark datasets including MNIST, CIFAR10 and ImageNet indicate that APE-GAN is effective to resist adversarial examples generated from five attacks.

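Motivation and Objectives

Establishes the need to defend deep neural networks against imperceptible perturbations.
Proposes a GAN-based framework (APE-GAN) that reconstructs images resembling the clean originals from adversarial inputs.
Demonstrates effectiveness against multiple attack methods on MNIST, CIFAR-10, and ImageNet.
Shows a defense that works without knowledge of the target model's architecture.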

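Proposed Method

Introduces a generator G and a discriminator D in a DCGAN-style setup, where G maps an adversarial input X_adv to a reconstruction X_hat that resembles the clean image X.
Trains G with a composite loss l_ape that combines a pixel-wise content loss with an adversarial loss, so that its outputs lie on the manifold of clean images.
Trains D to distinguish G(X_adv) from real clean images, which sets up a minimax objective between G and D.
Trains the perturbation eliminator to remove the adversarial perturbation while preserving the image content.
Evaluates on three datasets (MNIST, CIFAR-10, ImageNet) against five adversarial attacks (L-BFGS, FGSM, DeepFool, JSMA, and CW with its L0/L2/L∞ variants).
Provides architecture details for the MNIST, CIFAR-10, and ImageNet variants (APE-GAN m, c, i) and the training settings (learning rate, optimizer, batch size).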
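
The composite generator loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ape_generator_loss`, the weights `w_content` and `w_adv`, and their default values are hypothetical, and the adversarial term is written in the common form -log D(G(X_adv)).

```python
import math


def ape_generator_loss(x_clean, x_hat, d_scores, w_content=0.5, w_adv=0.5):
    """Sketch of a composite generator loss: pixel-wise content + adversarial term.

    x_clean, x_hat: lists of images, each a flat list of pixel values in [0, 1].
    d_scores: discriminator outputs D(x_hat) in (0, 1], one per image.
    w_content, w_adv: hypothetical weights (assumed, not taken from the paper).
    """
    n_pixels = sum(len(img) for img in x_clean)
    # Content loss: mean squared error between reconstruction and clean image.
    content = sum(
        (p - q) ** 2
        for img, rec in zip(x_clean, x_hat)
        for p, q in zip(img, rec)
    ) / n_pixels
    # Adversarial loss: small when D scores the reconstructions as clean.
    eps = 1e-12  # guards against log(0)
    adversarial = -sum(math.log(s + eps) for s in d_scores) / len(d_scores)
    return w_content * content + w_adv * adversarial
```

With a fixed reconstruction error, the loss grows as the discriminator score falls, which is what pushes G toward outputs that D cannot tell apart from clean images.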

Experimental Results

Research Questions

RQ1: Can a GAN-based perturbation eliminator reconstruct clean-like images from adversarial inputs without access to the target model's parameters?
RQ2: How effective is APE-GAN across multiple datasets and adversarial attack methods in restoring correct classifications?
RQ3: Does preprocessing with APE-GAN compromise performance on benign (non-adversarial) inputs?
RQ4: Can APE-GAN be integrated with other defenses (e.g., adversarial training) for enhanced robustness?
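
The setting behind RQ1 and RQ3, a preprocessing defense that leaves the target model untouched, can be sketched as a thin wrapper. Everything here is a hypothetical stand-in for illustration: `defend_then_classify`, `toy_generator`, and `toy_classifier` are not names from the paper, and the toy functions only mimic the roles of a trained generator and classifier.

```python
def defend_then_classify(x, generator, classifier):
    """Run the perturbation eliminator first, then the unmodified classifier.

    Both arguments are treated as black-box callables; the wrapper never
    inspects the classifier's parameters or gradients.
    """
    x_hat = generator(x)      # reconstruct a clean-looking input
    return classifier(x_hat)  # the target model is used as-is


# Toy stand-ins (hypothetical): a "generator" that snaps pixels to 0 or 1
# and a "classifier" that thresholds the mean pixel intensity.
def toy_generator(x):
    return [float(round(p)) for p in x]


def toy_classifier(x):
    return int(sum(x) / len(x) > 0.5)


clean = [1.0, 1.0, 0.0]
perturbed = [0.6, 0.6, 0.0]  # shifted enough to flip the toy classifier
print(toy_classifier(perturbed), defend_then_classify(perturbed, toy_generator, toy_classifier))
```

Because the wrapper only composes two callables, the same generator could in principle sit in front of any classifier, and benign inputs pass through the same path as adversarial ones.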

Key Results

Attack (error rate, %)	MNIST target model	MNIST APE-GAN m	CIFAR-10 target model	CIFAR-10 APE-GAN c	ImageNet Top-1 target model	ImageNet Top-1 APE-GAN i
L-BFGS	93.4	2.2	92.7	19.9	93.3	42.9
FGSM	96.3	2.8	77.8	26.4	72.9	40.1
DeepFool	97.1	2.2	98.3	19.2	98.4	45.9
JSMA	97.8	38.6	94.1	38.3	98.7	45.0
CW-L0	100.0	27.0	100.0	46.9	100.0	29.4
CW-L2	100.0	1.5	100.0	30.5	99.7	26.1
CW-L∞	100.0	1.2	100.0	32.2	100.0	27.0

On MNIST, CIFAR-10, and ImageNet, APE-GAN substantially lowers the error rate on adversarial inputs across all attacks. For example, the error rate under L-BFGS falls from 93.4% to 2.2% on MNIST, from 92.7% to 19.9% on CIFAR-10, and from 93.3% to 42.9% (Top-1) on ImageNet after reconstruction.
Under FGSM, the error rate falls from 96.3% to 2.8% on MNIST and from 72.9% to 40.1% on ImageNet after applying APE-GAN.
Under DeepFool, the error rate falls from 97.1% to 2.2% on MNIST and from 98.4% to 45.9% on ImageNet after reconstruction.
Under JSMA, the error rate falls from 97.8% to 38.6% on MNIST and from 94.1% to 38.3% on CIFAR-10 after APE-GAN.
CW attacks drive the undefended target models to error rates of 99.7% to 100.0%, but reconstruction reduces their impact (e.g., CW-L0 from 100.0% to 27.0% on MNIST; CW-L2 from 100.0% to 1.5% on MNIST).
On benign inputs, APE-GAN causes no marked deterioration: it does not significantly raise error rates on clean or randomly noised images.
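
The size of each improvement can be read off the table with a short calculation. The sketch below copies the MNIST columns from the table above and computes the absolute drop in error rate per attack; the names `MNIST_ERROR` and `error_reduction` are introduced here for illustration only.

```python
# Error rates (%) copied from the results table: (target model, after APE-GAN m).
MNIST_ERROR = {
    "L-BFGS": (93.4, 2.2),
    "FGSM": (96.3, 2.8),
    "DeepFool": (97.1, 2.2),
    "JSMA": (97.8, 38.6),
    "CW-L0": (100.0, 27.0),
    "CW-L2": (100.0, 1.5),
    "CW-L∞": (100.0, 1.2),
}


def error_reduction(table):
    """Absolute drop in error rate (percentage points) per attack."""
    return {
        attack: round(before - after, 1)
        for attack, (before, after) in table.items()
    }


reductions = error_reduction(MNIST_ERROR)
weakest = min(reductions, key=reductions.get)    # smallest improvement
strongest = max(reductions, key=reductions.get)  # largest improvement
print(weakest, reductions[weakest], strongest, reductions[strongest])
```

On the MNIST columns this gives the smallest reduction for JSMA (59.2 points) and the largest for CW-L∞ (98.8 points).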


This review was generated by AI and reviewed by a human editor.