[Paper Review] APE-GAN: Adversarial Perturbation Elimination with GAN
APE-GAN uses a Generative Adversarial Network to remove adversarial perturbations from inputs before feeding them to classifiers, improving robustness against multiple attacks on MNIST, CIFAR-10, and ImageNet.
Although neural networks can achieve state-of-the-art performance in image recognition, they are often defeated by adversarial examples: inputs generated by applying imperceptible but intentional perturbations to clean samples from a dataset. How to defend against adversarial examples is an important problem that is well worth researching. So far, very few methods have provided a significant defense against them. This paper proposes a novel idea and implements APE-GAN, an effective framework based on Generative Adversarial Nets, to defend against adversarial examples. Experimental results on three benchmark datasets (MNIST, CIFAR-10, and ImageNet) indicate that APE-GAN effectively resists adversarial examples generated by five attacks.
Motivation & Objective
- Motivate the defense against imperceptible adversarial perturbations in deep neural networks.
- Propose a GAN-based framework (APE-GAN) to reconstruct clean-like images from adversarial inputs.
- Demonstrate effectiveness across MNIST, CIFAR-10, and ImageNet against multiple attack methods.
- Show that the defense operates without requiring knowledge of the target model architecture.
Proposed method
- Introduce a generator G and discriminator D in a DCGAN-style setup to map adversarial inputs X_adv to reconstructed X_hat that resemble clean X.
- Train G with a composite loss l_ape combining pixel-wise content loss and an adversarial loss to ensure outputs lie on the clean image manifold.
- Train D to distinguish reconstructed images G(X_adv) from real clean images, formulating a minimax objective between G and D.
- Perform adversarial perturbation elimination: G learns to remove the perturbation while preserving the image content.
- Evaluate against five adversarial attacks (L-BFGS, FGSM, DeepFool, JSMA, and CW in its L0/L2/L∞ variants) on three datasets (MNIST, CIFAR-10, ImageNet).
- Provide architecture details for MNIST, CIFAR-10, and ImageNet variants (APE-GAN m, c, i) and training settings (learning rate, optimizer, batch sizes).
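The composite generator loss and the discriminator objective described in the list above can be sketched in a few lines. This is a minimal, framework-free illustration rather than the paper's implementation: the function names, the non-saturating form `-log D(G(X_adv))` for the adversarial term, and the weights `w_content` and `w_adv` are assumptions made here for illustration, and images are treated as flat lists of pixel values.

```python
import math
from typing import Sequence


def content_loss(clean: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Pixel-wise mean squared error between clean X and the reconstruction G(X_adv)."""
    if len(clean) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)


def adversarial_loss(d_fake: float, eps: float = 1e-12) -> float:
    """Generator-side adversarial term, assumed here to be -log D(G(X_adv)).

    d_fake is the discriminator's probability that the reconstruction is a clean image.
    """
    return -math.log(max(d_fake, eps))


def ape_loss(clean: Sequence[float], reconstructed: Sequence[float], d_fake: float,
             w_content: float = 0.7, w_adv: float = 0.3) -> float:
    """Composite loss l_ape: weighted sum of content and adversarial terms.

    The default weights are placeholder values, not settings confirmed by this review.
    """
    return w_content * content_loss(clean, reconstructed) + w_adv * adversarial_loss(d_fake)


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for D: clean images labelled real, reconstructions labelled fake."""
    return -(math.log(max(d_real, eps)) + math.log(max(1.0 - d_fake, eps)))
```

In a training loop, G would be updated to minimise `ape_loss` while D is updated to minimise `discriminator_loss`, which is one common way to realise the minimax objective between the two networks.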
Experimental results
Research questions
- RQ1: Can a GAN-based perturbation eliminator reconstruct clean-like images from adversarial inputs without access to the target model's parameters?
- RQ2: How effective is APE-GAN across multiple datasets and adversarial attack methods in restoring correct classifications?
- RQ3: Does preprocessing with APE-GAN compromise performance on benign (non-adversarial) inputs?
- RQ4: Can APE-GAN be integrated with other defenses (e.g., adversarial training) for enhanced robustness?
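One way to picture how these questions are measured is to treat the eliminator as a preprocessing step placed in front of an unchanged classifier, then compare error rates with and without it. The sketch below is a toy illustration under that assumption: `with_elimination`, `error_rate`, `toy_classifier`, and `toy_eliminator` are hypothetical names, and the toy functions merely stand in for a trained target model and a trained generator.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Image = Sequence[float]
Classifier = Callable[[Image], int]
Eliminator = Callable[[Image], Image]


def with_elimination(classifier: Classifier, eliminator: Eliminator) -> Classifier:
    """Compose eliminator and classifier; the classifier is only called, never inspected."""
    def defended(image: Image) -> int:
        return classifier(eliminator(image))
    return defended


def error_rate(classifier: Classifier, samples: Iterable[Tuple[Image, int]]) -> float:
    """Percentage of (image, label) pairs that the classifier gets wrong."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("need at least one sample")
    wrong = sum(1 for image, label in pairs if classifier(image) != label)
    return 100.0 * wrong / len(pairs)


def toy_classifier(image: Image) -> int:
    """Stand-in target model: class 1 if the mean pixel value exceeds 0.5."""
    return int(sum(image) / len(image) > 0.5)


def toy_eliminator(image: Image) -> List[float]:
    """Stand-in for G: snap every pixel to 0.0 or 1.0, discarding small perturbations."""
    return [1.0 if pixel > 0.5 else 0.0 for pixel in image]


# Two perturbed binary images paired with their true labels (toy data).
adversarial_samples = [
    ([0.45, 0.45, 1.0, 0.45], 0),
    ([0.55, 0.55, 0.0, 0.55], 1),
]
undefended_error = error_rate(toy_classifier, adversarial_samples)
defended_error = error_rate(with_elimination(toy_classifier, toy_eliminator), adversarial_samples)
```

The same `error_rate` call on benign samples, with and without the wrapper, corresponds to the check in RQ3, and wrapping an adversarially trained classifier instead of a plain one corresponds to the combination asked about in RQ4.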
Key findings
| Attack | MNIST: target model (error %) | MNIST: APE-GAN m (error %) | CIFAR-10: target model (error %) | CIFAR-10: APE-GAN c (error %) | ImageNet: target model (Top-1 error %) | ImageNet: APE-GAN i (Top-1 error %) |
|---|---|---|---|---|---|---|
| L-BFGS | 93.4 | 2.2 | 92.7 | 19.9 | 93.3 | 42.9 |
| FGSM | 96.3 | 2.8 | 77.8 | 26.4 | 72.9 | 40.1 |
| DeepFool | 97.1 | 2.2 | 98.3 | 19.2 | 98.4 | 45.9 |
| JSMA | 97.8 | 38.6 | 94.1 | 38.3 | 98.7 | 45.0 |
| CW-L0 | 100.0 | 27.0 | 100.0 | 46.9 | 100.0 | 29.4 |
| CW-L2 | 100.0 | 1.5 | 100.0 | 30.5 | 99.7 | 26.1 |
| CW-L∞ | 100.0 | 1.2 | 100.0 | 32.2 | 100.0 | 27.0 |
- On MNIST, CIFAR-10, and ImageNet, APE-GAN substantially lowers the error rate on adversarial inputs for every attack. For example, under L-BFGS the error rate drops from 93.4% to 2.2% on MNIST, from 92.7% to 19.9% on CIFAR-10, and from 93.3% to 42.9% (Top-1) on ImageNet after reconstruction.
- Under FGSM, the error rate drops from 96.3% to 2.8% on MNIST, from 77.8% to 26.4% on CIFAR-10, and from 72.9% to 40.1% on ImageNet after applying APE-GAN.
- Under DeepFool, the error rate drops from 97.1% to 2.2% on MNIST, from 98.3% to 19.2% on CIFAR-10, and from 98.4% to 45.9% on ImageNet after reconstruction.
- Under JSMA, the error rate drops from 97.8% to 38.6% on MNIST and from 94.1% to 38.3% on CIFAR-10 after APE-GAN, the smallest improvement on MNIST among the listed attacks.
- CW attacks are highly effective against the undefended target models (error rates of 99.7% to 100.0%), but reconstruction reduces their impact (e.g., on MNIST, CW-L0 drops from 100.0% to 27.0% and CW-L2 from 100.0% to 1.5%).
- Passing benign inputs through APE-GAN causes no marked loss of accuracy: error rates on clean and randomly noised images do not rise significantly.
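To put the per-attack numbers side by side, the short script below averages the error rates from the table above over the seven attack rows for each dataset. The averaging is a summary added here for illustration and is not a metric reported in the review; `RESULTS` and `summarize` are hypothetical names.

```python
from statistics import mean

# (target-model error %, error % after APE-GAN), copied from the table above.
RESULTS = {
    "MNIST": {
        "L-BFGS": (93.4, 2.2), "FGSM": (96.3, 2.8), "DeepFool": (97.1, 2.2),
        "JSMA": (97.8, 38.6), "CW-L0": (100.0, 27.0), "CW-L2": (100.0, 1.5),
        "CW-Linf": (100.0, 1.2),
    },
    "CIFAR-10": {
        "L-BFGS": (92.7, 19.9), "FGSM": (77.8, 26.4), "DeepFool": (98.3, 19.2),
        "JSMA": (94.1, 38.3), "CW-L0": (100.0, 46.9), "CW-L2": (100.0, 30.5),
        "CW-Linf": (100.0, 32.2),
    },
    "ImageNet": {
        "L-BFGS": (93.3, 42.9), "FGSM": (72.9, 40.1), "DeepFool": (98.4, 45.9),
        "JSMA": (98.7, 45.0), "CW-L0": (100.0, 29.4), "CW-L2": (99.7, 26.1),
        "CW-Linf": (100.0, 27.0),
    },
}


def summarize(results):
    """Return {dataset: (mean error before, mean error after)}, rounded to one decimal."""
    summary = {}
    for dataset, attacks in results.items():
        before = mean(b for b, _ in attacks.values())
        after = mean(a for _, a in attacks.values())
        summary[dataset] = (round(before, 1), round(after, 1))
    return summary


for dataset, (before, after) in summarize(RESULTS).items():
    print(f"{dataset}: mean error {before}% -> {after}% after APE-GAN")
```

Averaged this way, the reduction is largest on MNIST and smallest on ImageNet, which matches the per-attack pattern in the bullets above.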
This review was created by AI and reviewed by human editors.