QUICK REVIEW

[Paper Review] APE-GAN: Adversarial Perturbation Elimination with GAN

Shiwei Shen, Guoqing Jin|arXiv (Cornell University)|Jul 18, 2017

Adversarial Robustness in Machine Learning|31 references|80 citations

TL;DR

APE-GAN uses a Generative Adversarial Network to remove adversarial perturbations from inputs before feeding them to classifiers, improving robustness against multiple attacks on MNIST, CIFAR-10, and ImageNet.

ABSTRACT

Although neural networks can achieve state-of-the-art performance in recognizing images, they often suffer a tremendous defeat from adversarial examples: inputs generated by applying imperceptible but intentional perturbations to clean samples from the datasets. How to defend against adversarial examples is an important problem that is well worth researching. So far, very few methods have provided a significant defense against adversarial examples. In this paper, a novel idea is proposed and an effective framework based on Generative Adversarial Nets, named APE-GAN, is implemented to defend against adversarial examples. The experimental results on three benchmark datasets, MNIST, CIFAR10 and ImageNet, indicate that APE-GAN is effective at resisting adversarial examples generated from five attacks.

Motivation & Objective

Motivate the defense against imperceptible adversarial perturbations in deep neural networks.
Propose a GAN-based framework (APE-GAN) to reconstruct clean-like images from adversarial inputs.
Demonstrate effectiveness across MNIST, CIFAR-10, and ImageNet against multiple attack methods.
Show that the defense operates without requiring knowledge of the target model architecture.

Proposed method

Introduce a generator G and discriminator D in a DCGAN-style setup to map adversarial inputs X_adv to reconstructed X_hat that resemble clean X.
Train G with a composite loss l_ape combining pixel-wise content loss and an adversarial loss to ensure outputs lie on the clean image manifold.
Train D to distinguish reconstructed images G(X_adv) from real clean images, formulating a minimax objective between G and D.
Frame the defense as adversarial perturbation elimination: G learns to remove the perturbation while preserving the image content.
Evaluate against five adversarial attacks (L-BFGS, FGSM, DeepFool, JSMA, and CW in its L0, L2, and L∞ variants) on three datasets (MNIST, CIFAR-10, ImageNet).
Provide architecture details for MNIST, CIFAR-10, and ImageNet variants (APE-GAN m, c, i) and training settings (learning rate, optimizer, batch sizes).
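
The training objectives in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are flattened lists of pixel values, `d_on_clean` and `d_on_restored` stand for the discriminator's probability outputs D(X) and D(G(X_adv)), the non-saturating form -log D(G(X_adv)) is one common choice for the adversarial term, and the weights `w_content` and `w_adv` are placeholder values picked for the example.

```python
import math


def content_loss(clean, restored):
    """Pixel-wise mean squared error between clean X and reconstructed G(X_adv)."""
    return sum((c - r) ** 2 for c, r in zip(clean, restored)) / len(clean)


def adversarial_loss(d_on_restored):
    """Generator-side term: small when D scores the reconstruction as clean."""
    return -math.log(d_on_restored)


def generator_loss(clean, restored, d_on_restored, w_content=0.7, w_adv=0.3):
    """Weighted sum of content and adversarial terms (weights are assumed)."""
    return (w_content * content_loss(clean, restored)
            + w_adv * adversarial_loss(d_on_restored))


def discriminator_loss(d_on_clean, d_on_restored):
    """Binary cross-entropy: clean images are 'real', reconstructions 'fake'."""
    return -(math.log(d_on_clean) + math.log(1.0 - d_on_restored))


# Hypothetical 4-pixel example.
clean = [0.0, 0.5, 1.0, 0.25]
restored = [0.1, 0.5, 0.9, 0.25]
g_loss = generator_loss(clean, restored, d_on_restored=0.8)
d_loss = discriminator_loss(d_on_clean=0.9, d_on_restored=0.2)
print(f"generator loss: {g_loss:.4f}, discriminator loss: {d_loss:.4f}")
```

The content term keeps the reconstruction close to the clean image pixel by pixel, while the adversarial term rewards reconstructions that the discriminator cannot tell apart from clean images; in a real training loop the two losses would be minimized alternately with respect to G and D.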

Experimental results

Research questions

RQ1: Can a GAN-based perturbation eliminator reconstruct clean-like images from adversarial inputs without access to the target model's parameters?
RQ2: How effective is APE-GAN across multiple datasets and adversarial attack methods in restoring correct classifications?
RQ3: Does preprocessing with APE-GAN compromise performance on benign (non-adversarial) inputs?
RQ4: Can APE-GAN be integrated with other defenses (e.g., adversarial training) for enhanced robustness?

Key findings

Error rates (%) on adversarial examples: target model alone vs. with APE-GAN preprocessing

Attack	MNIST Target Model	MNIST APE-GAN m	CIFAR-10 Target Model	CIFAR-10 APE-GAN c	ImageNet Top-1 Target Model	ImageNet Top-1 APE-GAN i
L-BFGS	93.4	2.2	92.7	19.9	93.3	42.9
FGSM	96.3	2.8	77.8	26.4	72.9	40.1
DeepFool	97.1	2.2	98.3	19.2	98.4	45.9
JSMA	97.8	38.6	94.1	38.3	98.7	45.0
CW-L0	100.0	27.0	100.0	46.9	100.0	29.4
CW-L2	100.0	1.5	100.0	30.5	99.7	26.1
CW-L∞	100.0	1.2	100.0	32.2	100.0	27.0
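
To make the table easier to compare across attacks, the short script below computes how much error APE-GAN removes on MNIST, both in percentage points and as a fraction of the undefended error. The numbers are copied from the MNIST columns of the table; the helper names `absolute_drop` and `relative_drop` are illustrative and not from the paper.

```python
# Error rates (%) from the table's MNIST columns: (target model, with APE-GAN m).
mnist = {
    "L-BFGS": (93.4, 2.2),
    "FGSM": (96.3, 2.8),
    "DeepFool": (97.1, 2.2),
    "JSMA": (97.8, 38.6),
    "CW-L0": (100.0, 27.0),
    "CW-L2": (100.0, 1.5),
    "CW-Linf": (100.0, 1.2),
}


def absolute_drop(before, after):
    """Percentage points of error removed by the defense."""
    return before - after


def relative_drop(before, after):
    """Fraction of the undefended error removed by the defense."""
    return (before - after) / before


for attack, (before, after) in mnist.items():
    print(f"{attack:9s} {absolute_drop(before, after):5.1f} pts "
          f"{relative_drop(before, after):6.1%}")

# Attack for which the defense removes the least error on MNIST.
hardest = min(mnist, key=lambda name: absolute_drop(*mnist[name]))
print("smallest reduction:", hardest)
```

The same two helpers can be applied to the CIFAR-10 and ImageNet columns by swapping in the corresponding pairs of values.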

On MNIST, CIFAR-10, and ImageNet, APE-GAN substantially lowers the error rate on adversarial examples across attacks. For example, the error rate on L-BFGS adversarial inputs drops from 93.4% to 2.2% on MNIST, from 92.7% to 19.9% on CIFAR-10, and from 93.3% to 42.9% on ImageNet (Top-1) after reconstruction.
For FGSM, the error rate drops from 96.3% to 2.8% on MNIST, from 77.8% to 26.4% on CIFAR-10, and from 72.9% to 40.1% on ImageNet after applying APE-GAN.
For DeepFool, the error rate drops from 97.1% to 2.2% on MNIST, from 98.3% to 19.2% on CIFAR-10, and from 98.4% to 45.9% on ImageNet after reconstruction.
For JSMA, the error rate drops from 97.8% to 38.6% on MNIST and from 94.1% to 38.3% on CIFAR-10 after APE-GAN.
CW attacks are highly effective against the undefended target models, but reconstruction reduces their impact (e.g., on MNIST the error rate drops from 100.0% to 27.0% for CW-L0 and from 100.0% to 1.5% for CW-L2).
Accuracy on benign inputs does not deteriorate markedly: APE-GAN does not significantly raise error rates on clean or randomly noisy images.
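
The "Target Model" and "APE-GAN" columns correspond to measuring the same fixed classifier with and without a preprocessing step in front of it. The sketch below shows that evaluation pattern on toy data. Everything in it is a hypothetical stand-in: `toy_classifier` is a deliberately brittle rule, and `toy_eliminator` is a simple median filter used in place of a trained generator, so the numbers it produces say nothing about APE-GAN itself.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[float]
Sample = Tuple[Image, int]


def error_rate(classifier: Callable[[Image], int],
               samples: Sequence[Sample],
               preprocess: Optional[Callable[[Image], Image]] = None) -> float:
    """Error rate (%) of a fixed classifier, optionally behind a preprocessor."""
    wrong = 0
    for image, label in samples:
        x = preprocess(image) if preprocess is not None else image
        if classifier(x) != label:
            wrong += 1
    return 100.0 * wrong / len(samples)


def toy_classifier(image: Image) -> int:
    """Brittle stand-in model: predicts 1 if any pixel is brighter than 0.5."""
    return int(max(image) > 0.5)


def toy_eliminator(image: Image) -> Image:
    """Stand-in for G: 3-tap median filter with replicated edges (not a GAN)."""
    padded = [image[0]] + list(image) + [image[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(image))]


# Toy data: label-0 images with a single spiked pixel play the adversarial role.
adversarial = [([0.1, 0.9, 0.1, 0.1], 0), ([0.0, 0.0, 0.8, 0.0], 0)]
benign = [([0.1, 0.1, 0.1, 0.1], 0), ([0.9, 0.9, 0.9, 0.9], 1)]

for name, data in (("adversarial", adversarial), ("benign", benign)):
    before = error_rate(toy_classifier, data)
    after = error_rate(toy_classifier, data, preprocess=toy_eliminator)
    print(f"{name}: {before:.1f}% -> {after:.1f}%")
```

Note that `error_rate` never inspects the classifier's parameters; the preprocessor is applied purely to the input, which is the sense in which such a defense is independent of the target model. Running the same function on benign data is the check behind RQ3.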

This review was created by AI and reviewed by human editors.