QUICK REVIEW

[Paper Review] Learning to Defend by Learning to Attack

Haoming Jiang, Zhehui Chen | arXiv (Cornell University) | Nov 3, 2018

Adversarial Robustness in Machine Learning | 47 references | 30 citations

TL;DR

This paper proposes a learning-to-learn (L2L) framework for adversarial training: a neural-network optimizer learns to generate adversarial examples, and a classifier trained against those examples becomes more robust. Because the attack is learned end to end by a differentiable optimizer network, the method reports higher accuracy and better computational efficiency than existing adversarial training baselines on CIFAR-10 and CIFAR-100.

ABSTRACT

Adversarial training provides a principled approach for training robust neural networks. From an optimization perspective, adversarial training is essentially solving a bilevel optimization problem. The leader problem is trying to learn a robust classifier, while the follower problem is trying to generate adversarial samples. Unfortunately, such a bilevel problem is difficult to solve due to its highly complicated structure. This work proposes a new adversarial training method based on a generic learning-to-learn (L2L) framework. Specifically, instead of applying existing hand-designed algorithms for the inner problem, we learn an optimizer, which is parametrized as a convolutional neural network. At the same time, a robust classifier is learned to defend against the adversarial attack generated by the learned optimizer. Experiments over CIFAR-10 and CIFAR-100 datasets demonstrate that L2L outperforms existing adversarial training methods in both classification accuracy and computational efficiency. Moreover, our L2L framework can be extended to generative adversarial imitation learning and stabilize the training.
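
The bilevel problem described in the abstract can be written out explicitly. The notation below is ours, not copied from the paper: f(·; θ) is the classifier, ℓ the loss, ε an L∞ perturbation budget, and g(·; φ) the learned attacker. The second formula paraphrases the L2L idea, in which one shared attacker that sees each input and its loss gradient replaces the per-sample inner maximization; see the paper for its exact objective and constraints.

```latex
% Standard adversarial training: leader over theta, follower over each delta_i
\min_{\theta} \; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i + \delta_i^{*};\theta),\, y_i\big)
\quad \text{s.t.} \quad
\delta_i^{*} \in \operatorname*{arg\,max}_{\|\delta\|_\infty \le \epsilon} \ell\big(f(x_i + \delta;\theta),\, y_i\big)

% L2L paraphrase: a shared attacker g(.;phi) replaces the per-sample inner solver
\min_{\theta} \max_{\phi} \; \frac{1}{n}\sum_{i=1}^{n}
\ell\Big(f\big(x_i + g(x_i, \nabla_{x}\ell(f(x_i;\theta), y_i);\phi);\theta\big),\, y_i\Big),
\qquad \|g(\cdot;\phi)\|_\infty \le \epsilon
```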

Motivation & Objective

To address the difficulty of solving the bilevel optimization problem at the core of adversarial training, whose nested structure makes it computationally expensive and hard to optimize.
To improve adversarial robustness by learning an end-to-end optimizer that generates strong, transferable adversarial perturbations.
To enhance training stability and efficiency compared to hand-designed attack methods like FGSM or PGD.
To unify adversarial training and generative adversarial imitation learning (GAIL) under a single L2L framework for improved stability.
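
For reference, the hand-designed baselines named above can be sketched on a toy model. This is a minimal pure-Python sketch, assuming a binary logistic-regression classifier with labels in {-1, +1} and an L∞ budget eps; the function names are hypothetical. PGD takes repeated signed-gradient ascent steps and projects back onto the ε-ball after each one; FGSM is the single-step case with step = eps.

```python
import math

def logistic_loss(w, b, x, y):
    """Binary logistic loss log(1 + exp(-y * (w.x + b))) for a label y in {-1, +1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log1p(math.exp(-y * z))

def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    s = 1.0 / (1.0 + math.exp(y * z))  # equals sigmoid(-y * z)
    return [-y * s * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, eps, step, n_steps):
    """L-infinity PGD: signed-gradient ascent steps, each followed by projection
    back onto the eps-ball around x. n_steps=1 with step=eps gives FGSM."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Clip each coordinate back into [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

w, b, x, y = [1.0, -2.0], 0.5, [0.3, 0.1], 1
adv = pgd_attack(w, b, x, y, eps=0.1, step=0.025, n_steps=10)
fgsm = pgd_attack(w, b, x, y, eps=0.1, step=0.1, n_steps=1)
print(adv, fgsm, logistic_loss(w, b, x, y), logistic_loss(w, b, adv, y))
```

On a linear model both attacks land on the same corner of the ε-ball, because the sign of the input gradient never changes; the extra PGD iterations only matter for nonlinear networks, and that per-sample iterative cost is what a learned attacker is meant to avoid.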

Proposed method

Proposes a differentiable, end-to-end L2L framework where the inner problem (adversarial attack generation) is solved by a neural network optimizer, parameterized as a convolutional network.
The attacker network takes each input image together with the gradient of the classifier's loss with respect to that image, so it can learn perturbation patterns that exploit the same first-order information that gradient-based attacks use.
The robust classifier is trained jointly with the attacker network in a bilevel (min-max) setup: the leader (the classifier) minimizes the training loss on adversarial examples produced by the follower (the attacker), while the follower is trained to maximize that loss.
Employs techniques from GAN training, such as the two-time-scale update rule, to stabilize the training of the end-to-end L2L system.
Extends the framework to GAIL by using the same L2L attacker to generate adversarial demonstrations, stabilizing policy training in imitation learning.
Uses skip connections in the attacker network's architecture to preserve gradient information and prevent training instability.
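
The joint training described above can be illustrated in a toy setting. This is a minimal pure-Python sketch, not the paper's implementation: the classifier is logistic regression instead of a CNN, the hypothetical attacker maps each input coordinate x_j and its loss gradient g_j to a perturbation eps·tanh(a_j·x_j + c_j·g_j), and the gradient input is treated as a constant (no gradient flows back through it). The attacker ascends the loss while the classifier descends it, with deliberately different step sizes lr_att and lr_clf, loosely in the spirit of the two-time-scale update rule; all values are arbitrary assumptions.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def attacker_perturbation(a, c, x, g, eps):
    # Hypothetical stand-in for the CNN attacker: each coordinate sees the input x_j
    # and the loss gradient g_j; tanh keeps |delta_j| <= eps (L-infinity ball).
    return [eps * math.tanh(aj * xj + cj * gj) for aj, cj, xj, gj in zip(a, c, x, g)]

def clean_accuracy(w, b, data):
    hits = sum(1 for x, y in data if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) > 0)
    return hits / len(data)

def train(data, eps=0.1, lr_clf=0.1, lr_att=0.01, epochs=50):
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0          # leader: logistic-regression classifier (theta)
    a, c = [0.0] * dim, [0.0] * dim  # follower: attacker parameters (phi)
    for _ in range(epochs):
        for x, y in data:
            # Input gradient of the clean loss, passed to the attacker as a constant.
            z_clean = sum(wj * xj for wj, xj in zip(w, x)) + b
            g = [-y * sigmoid(-y * z_clean) * wj for wj in w]
            delta = attacker_perturbation(a, c, x, g, eps)
            x_adv = [xj + dj for xj, dj in zip(x, delta)]
            z = sum(wj * xj for wj, xj in zip(w, x_adv)) + b
            r = -y * sigmoid(-y * z)  # dLoss/dz for loss log(1 + exp(-y * z))
            for j in range(dim):
                # Chain rule through delta_j = eps * tanh(u_j); tanh(u_j) = delta_j / eps.
                du = r * w[j] * eps * (1.0 - (delta[j] / eps) ** 2)
                a[j] += lr_att * du * x[j]   # follower: gradient ascent
                c[j] += lr_att * du * g[j]
                w[j] -= lr_clf * r * x_adv[j]  # leader: gradient descent
            b -= lr_clf * r
    return w, b, a, c

# Toy separable data, mirrored so that both (x, y) and (-x, -y) appear.
DATA = [([s * x0, s * x1], s) for x0 in (0.8, 1.0, 1.2) for x1 in (-0.5, 0.0, 0.5) for s in (1, -1)]
w, b, a, c = train(DATA)
print("classifier:", w, b, "clean accuracy:", clean_accuracy(w, b, DATA))
```

With a = 0 and very large c, this attacker outputs eps·sign(g), so FGSM is a limiting case of the parametrization; giving the attacker the gradient is what makes that family of attacks reachable.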

Experimental results

Research questions

RQ1: Can a learned optimizer outperform hand-designed attack methods such as FGSM and PGD at generating strong adversarial examples for adversarial training?
RQ2: Does end-to-end training of an L2L-based attacker improve the robustness and accuracy of neural networks on standard benchmarks?
RQ3: Can the L2L framework stabilize adversarial imitation learning, where standard GAIL suffers from mode collapse and sudden performance drops?
RQ4: How does feeding gradient information to the attacker network affect the quality and generalization of the generated adversarial examples?

Key findings

The proposed L2L framework achieves state-of-the-art test accuracy on CIFAR-10 and CIFAR-100, outperforming existing adversarial training methods under both FGSM and PGD attacks.
The method is more computationally efficient, since a single learned attack policy that generalizes across samples replaces per-sample iterative attack generation.
In GAIL experiments, the L2L-based approach stabilizes training and avoids the sudden performance drops seen in standard GAIL, which overfits to expert trajectories.
The inclusion of gradient information in the attacker input significantly improves training stability and robustness, as shown by the failure of naive and slim attacker variants without this component.
The L2L attacker learns shared structural patterns across samples, enabling it to generate strong, transferable adversarial examples that generalize well across different attack types.
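
The efficiency finding can be made concrete with a rough pass count. This is a back-of-the-envelope model of our own, not a measurement from the paper: costs are counted in classifier forward+backward passes per minibatch, the gradient attacker is assumed to need one pass for its gradient input plus one pass on the perturbed batch whose backward updates both networks, and attacker_cost (a hypothetical parameter) is the attacker network's own cost in units of one classifier pass.

```python
def pgd_passes(k_steps):
    """K-step PGD adversarial training: K passes to build the attack, 1 to update."""
    return k_steps + 1

def gradient_attacker_passes(attacker_cost):
    """Assumed gradient-attacker scheme: 1 pass for the input gradient, 1 pass on the
    perturbed batch (shared backward for classifier and attacker), plus attacker cost."""
    return 2 + attacker_cost

for k in (1, 7, 20):
    print(k, pgd_passes(k), gradient_attacker_passes(0.5))
```

Under this model the saving appears against multi-step PGD and grows with the number of steps; against single-step FGSM training there is no saving in pass count.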

This review was created by AI and reviewed by human editors.