[Paper Review] Fast is better than free: Revisiting adversarial training
The paper shows that FGSM adversarial training with random initialization can match PGD-based robustness at far lower cost, and that fast training techniques dramatically accelerate robust model learning, though a failure mode called catastrophic overfitting can occur.
Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $ε=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $ε=2/255$ in 12 hours, in comparison to past work based on "free" adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
Motivation & Objective
- Motivate a cheaper, faster route to empirically robust deep networks using adversarial training.
- Evaluate whether weak adversaries (FGSM) can achieve robustness comparable to strong PGD adversaries.
- Integrate DAWNBench-inspired techniques to accelerate adversarial training (cyclic learning rates, mixed precision).
- Identify failure modes that hinder FGSM-based robustness and propose remedies.
- Demonstrate practical robustness and training speed on CIFAR-10 and ImageNet benchmarks.
Proposed method
- Formulate adversarial training as a robust optimization problem under $\ell_\infty$-bounded perturbations of radius $ε$.
- Use FGSM with random initialization to generate adversarial examples for training.
- Incorporate random restarts and FGSM step-size adjustments (e.g., $α = 1.25ε$) to improve robustness.
- Apply DAWNBench-inspired training accelerations: cyclic learning rates and mixed-precision arithmetic.
- Evaluate robustness against strong PGD attacks on MNIST, CIFAR-10, and ImageNet at varying values of $ε$.
- Identify and analyze catastrophic overfitting as a failure mode and propose early-stopping based remedies.
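The FGSM step with random initialization listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a toy linear logistic model so the input gradient has a closed form, and the names `input_gradient` and `fgsm_random_init` are hypothetical. It also omits clipping the perturbed input to the valid pixel range, which image pipelines typically need.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x (toy linear model).

    With p = sigmoid(w.x + b) and label y in {0, 1}, d(loss)/dx = (p - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_random_init(w, b, x, y, epsilon, alpha=None, rng=random):
    """One FGSM step from a uniformly random start inside the l_inf ball."""
    if alpha is None:
        alpha = 1.25 * epsilon  # step size mentioned in the list above
    # Random initialization: each coordinate drawn from Uniform(-epsilon, epsilon).
    delta = [rng.uniform(-epsilon, epsilon) for _ in x]
    # Gradient of the loss evaluated at the perturbed point x + delta.
    x_pert = [xi + di for xi, di in zip(x, delta)]
    grad = input_gradient(w, b, x_pert, y)
    # Single signed-gradient ascent step, then projection back onto the ball.
    delta = [di + alpha * sign(gi) for di, gi in zip(delta, grad)]
    return [max(-epsilon, min(epsilon, di)) for di in delta]
```

In adversarial training, the model would then be updated on `x + delta` instead of `x`. Because the step size exceeds $ε$ here, the projection is what keeps the final perturbation inside the allowed ball.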
Experimental results
Research questions
- RQ1: Can FGSM adversarial training with random initialization achieve empirical robustness comparable to PGD-based adversarial training?
- RQ2: How do training accelerations from cyclic learning rates and mixed-precision affect adversarial training efficiency and robustness?
- RQ3: What is the impact of initialization and step-size choices on FGSM-based robustness, and what failure mode (“catastrophic overfitting”) can occur?
- RQ4: How do fast FGSM-based methods perform on CIFAR-10 and ImageNet against strong PGD evaluations?
- RQ5: What are practical guidelines for achieving robust models with minimal training time?
Key findings
- FGSM adversarial training with random initialization can achieve robustness comparable to PGD-based training on CIFAR-10 at a fraction of the cost.
- Using cyclic learning rates and mixed-precision training accelerates convergence, enabling CIFAR-10 robust models in minutes and ImageNet robust models in hours.
- For CIFAR-10 at $ε = 8/255$, robust accuracy against PGD attacks (about 45%) is comparable to prior work, at significantly reduced training time (about 6 minutes).
- ImageNet robust models at $ε = 2/255$ reach about 43% robust accuracy, similar to prior methods, in about 12 hours using FGSM with fast training techniques.
- A failure mode called catastrophic overfitting can occur when FGSM perturbations are pushed to the boundary of the $ε$-ball or when zero initialization is used; early stopping based on PGD accuracy can recover robustness.
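The early-stopping remedy in the last finding can be sketched as a small training-loop wrapper. This is an illustrative sketch under stated assumptions: `train_epoch` and `pgd_accuracy` are hypothetical callbacks supplied by the caller, and the `max_drop=0.2` threshold is an assumed value chosen for illustration, not a figure taken from this review.

```python
def train_with_early_stopping(train_epoch, pgd_accuracy, num_epochs, max_drop=0.2):
    """Run FGSM training, halting when PGD accuracy suddenly collapses.

    train_epoch(epoch)     -> model snapshot after one FGSM training epoch
                              (hypothetical callback).
    pgd_accuracy(snapshot) -> robust accuracy in [0, 1] under a PGD attack on
                              a small batch of data (hypothetical callback).
    max_drop               -> assumed threshold for "sudden collapse".
    """
    best_snapshot = None
    prev_acc = 0.0
    for epoch in range(num_epochs):
        snapshot = train_epoch(epoch)
        acc = pgd_accuracy(snapshot)
        if prev_acc - acc > max_drop:
            # PGD accuracy fell sharply: treat this as catastrophic
            # overfitting and keep the snapshot from before the drop.
            break
        best_snapshot, prev_acc = snapshot, acc
    return best_snapshot, prev_acc
```

For example, with per-epoch PGD accuracies of 0.30, 0.40, 0.44, 0.02, the loop stops at the fourth epoch and returns the third epoch's snapshot. The key design choice is to monitor accuracy against a PGD adversary rather than the FGSM adversary used for training, since the failure mode described above is one where the model stops being robust to the stronger attack.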
This review was created by AI and reviewed by human editors.