[Paper Review] Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
This paper proposes Perceptual Adversarial Training (PAT), a defense that trains models to be robust against all imperceptible adversarial attacks by using a neural perceptual distance (LPIPS) as a surrogate for human perception. PAT achieves state-of-the-art robustness—more than doubling accuracy—against five diverse unseen attacks (L₂, L∞, spatial, recoloring, JPEG) on CIFAR-10 and ImageNet-100 without training on any of them, demonstrating strong generalization to unforeseen threat models.
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.
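For concreteness, the threat model and training objective described above can be written as follows. This is a sketch in our own notation, not the paper's: $\phi$ (a map to normalized deep-network activations, in the spirit of LPIPS), $\epsilon$ (the perceptual budget), $f_\theta$ (the classifier), and $\mathcal{L}$ (the training loss) are assumed symbols that do not appear in this review.

```latex
\begin{aligned}
d(x_1, x_2) &= \lVert \phi(x_1) - \phi(x_2) \rVert_2
  && \text{(neural perceptual distance)} \\
\mathrm{NPTM}(x, \epsilon) &= \{\, \tilde{x} : d(x, \tilde{x}) \le \epsilon \,\}
  && \text{(allowed perturbations of a natural image } x\text{)} \\
\min_{\theta} \; & \mathbb{E}_{(x, y)} \Big[ \max_{\tilde{x} \in \mathrm{NPTM}(x, \epsilon)} \mathcal{L}\big(f_\theta(\tilde{x}), y\big) \Big]
  && \text{(perceptual adversarial training)}
\end{aligned}
```

The inner maximization is what a perceptual attack approximates; the outer minimization is ordinary training on the resulting adversarial examples.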
Motivation & Objective
- Address the lack of a precise mathematical characterization of human perception in adversarial robustness research.
- Overcome the limitations of restrictive threat models (e.g., L₂, L∞) that fail to generalize to unseen attack types.
- Develop a defense that generalizes robustness across diverse, unforeseen perturbation types by modeling the perceptual threat model.
- Validate that neural perceptual distance (LPIPS) correlates well with human perception to enable scalable adversarial training.
- Demonstrate that training against a broad perceptual threat model yields strong generalization to unseen adversarial attacks as well as to common corruptions.
Proposed method
- Define the perceptual adversarial threat model as all perturbations that are imperceptible to humans, formalized using a true perceptual distance d*.
- Approximate the intractable true perceptual distance d* using LPIPS, a learned perceptual similarity metric based on deep network activations.
- Propose the Neural Perceptual Threat Model (NPTM), which includes all adversarial examples within a bounded LPIPS distance from a natural image.
- Develop novel perceptual adversarial attacks using projected gradient descent (PGD) with LPIPS-based constraints to generate imperceptible adversarial examples.
- Train classifiers via adversarial training using these perceptual attacks, resulting in Perceptual Adversarial Training (PAT).
- Compute LPIPS, for both the attack and the defense, either with the classifier being trained (PAT-self) or with a pre-trained AlexNet (PAT-AlexNet).
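The pipeline above (an LPIPS-style distance, a projected perceptual attack, and adversarial training on its outputs) can be illustrated with a toy sketch. Everything here is an assumption for illustration: a two-layer random ReLU network stands in for AlexNet or the classifier's own features, the distance is the L2 distance between unit-normalized layer activations without LPIPS's learned per-channel weights, the target is a hypothetical linear classifier, and `project` is a heuristic bisection back toward the clean input rather than the authors' exact procedure.

```python
import math
import random

random.seed(0)
DIM, HIDDEN = 8, 16

# Hypothetical feature network: fixed random ReLU layers standing in for the
# pre-trained AlexNet (PAT-AlexNet) or the classifier's own layers (PAT-self).
W1 = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(HIDDEN)]
W2 = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]


def relu_layer(W, v):
    return [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W]


def unit(v, eps=1e-10):
    """Scale a layer's activations to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def features(x):
    """Concatenate the unit-normalized activations of every layer."""
    h1 = relu_layer(W1, x)
    h2 = relu_layer(W2, h1)
    return unit(h1) + unit(h2)


def perceptual_distance(x1, x2):
    """LPIPS-style surrogate: L2 distance between normalized deep features."""
    f1, f2 = features(x1), features(x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Hypothetical linear classifier sign(w . x + b) with labels y in {-1, +1}.
w = [random.gauss(0, 1) for _ in range(DIM)]
b = 0.0


def margin(x, y):
    """Positive when x is classified correctly; the attack tries to decrease it."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def project(x, x_adv, eps, iters=30):
    """Heuristic projection onto {x' : d(x, x') <= eps}: bisect along the segment
    from x to x_adv. t = 0 (the clean input) is always feasible, so the result is too."""
    if perceptual_distance(x, x_adv) <= eps:
        return x_adv
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        cand = [a + mid * (c - a) for a, c in zip(x, x_adv)]
        if perceptual_distance(x, cand) <= eps:
            lo = mid
        else:
            hi = mid
    return [a + lo * (c - a) for a, c in zip(x, x_adv)]


def perceptual_attack(x, y, eps, step=0.1, n_steps=20):
    """PGD-style loop: normalized step along the gradient of -margin, then project."""
    grad = [-y * wi for wi in w]  # d(-margin)/dx is constant for a linear model
    gnorm = math.sqrt(sum(g * g for g in grad))
    x_adv = list(x)
    for _ in range(n_steps):
        x_adv = [xa + step * g / gnorm for xa, g in zip(x_adv, grad)]
        x_adv = project(x, x_adv, eps)
    return x_adv


x = [random.uniform(0, 1) for _ in range(DIM)]
y = 1 if margin(x, 1) > 0 else -1  # use the model's own prediction as the label
x_adv = perceptual_attack(x, y, eps=0.3)
print(perceptual_distance(x, x_adv) <= 0.3, margin(x_adv, y) <= margin(x, y))  # True True
```

In adversarial training, `x_adv` (recomputed against the current model) would replace `x` in each training step. With a real network, the input gradient would come from automatic differentiation rather than the closed form used for the linear model here.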
Experimental results
Research questions
- RQ1: Can a defense trained against a broad perceptual threat model generalize to adversarial attack types not seen during training?
- RQ2: How well does the LPIPS distance correlate with human perception of image perturbations compared to traditional Lp norms?
- RQ3: Does adversarial training under the neural perceptual threat model (NPTM) yield better robustness than standard adversarial training under L₂ or L∞ constraints?
- RQ4: Can PAT generalize to natural corruptions (e.g., blur, noise, weather) that are not explicitly targeted during training?
- RQ5: Is there a trade-off between clean accuracy and robustness when using PAT compared to standard adversarial training methods?
Key findings
- PAT achieves state-of-the-art robustness on CIFAR-10, more than doubling the accuracy of the next best model against the union of five diverse attacks (L₂, L∞, spatial, recoloring, JPEG), without training on any of these attack types.
- On CIFAR-10-C, PAT achieves a relative mean corruption error (relative mCE, lower is better) of 0.50 (PAT-self) and 0.49 (PAT-AlexNet), lower than L₂ adversarial training (0.54) and L∞ adversarial training (0.57).
- On ImageNet-100-C, PAT achieves a relative mCE of 0.37 (PAT-self) and 0.39 (PAT-AlexNet), outperforming L₂ (0.41) and L∞ (0.42) adversarial training on every corruption type except noise, where L₂ adversarial training performs best, likely because of the noise's symmetric distribution.
- The perceptual distance measured by LPIPS correlates strongly with human perception, as validated through a perceptual study, supporting its use as a surrogate for the true perceptual distance.
- PAT generalizes robustness to natural corruptions, indicating that robustness against worst-case perceptual perturbations also confers robustness to random, real-world distortions.
- PAT maintains high clean accuracy (e.g., 93.4% on CIFAR-10) while achieving exceptional robustness, demonstrating a favorable trade-off between accuracy and robustness compared to prior methods.
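The findings above rest on two metrics. The sketch below shows how they are commonly computed, assuming (i) accuracy against a union of attacks counts an example as correct only if it is classified correctly under every attack, and (ii) relative mCE follows the ImageNet-C benchmark definition, which divides a model's corruption-induced error increase by that of a baseline model. The review does not state which baseline the paper normalizes by, and all numbers in the example are invented.

```python
from statistics import mean


def union_accuracy(correct_by_attack):
    """Fraction of examples classified correctly under *every* attack.

    correct_by_attack maps an attack name to a list of per-example booleans.
    """
    per_example = [all(flags) for flags in zip(*correct_by_attack.values())]
    return sum(per_example) / len(per_example)


def relative_ce(errs, clean_err, base_errs, base_clean_err):
    """Relative corruption error for one corruption type, summed over severities."""
    gap = sum(e - clean_err for e in errs)
    base_gap = sum(e - base_clean_err for e in base_errs)
    return gap / base_gap


def relative_mce(model, baseline):
    """Mean relative CE over corruption types; below 1 means the model's error
    grows less under corruption than the baseline's does."""
    return mean(
        relative_ce(model["corrupted"][c], model["clean"],
                    baseline["corrupted"][c], baseline["clean"])
        for c in model["corrupted"]
    )


# Made-up illustrative numbers, not results from the paper.
correct = {
    "L2":   [True, True, False, True],
    "Linf": [True, False, True, True],
    "JPEG": [True, True, True, False],
}
print(union_accuracy(correct))  # 0.25: only the first example survives all attacks

model = {"clean": 0.20, "corrupted": {"blur": [0.22, 0.24, 0.26, 0.28, 0.30],
                                      "noise": [0.25, 0.30, 0.35, 0.40, 0.45]}}
baseline = {"clean": 0.05, "corrupted": {"blur": [0.15, 0.25, 0.35, 0.45, 0.55],
                                         "noise": [0.15, 0.25, 0.35, 0.45, 0.55]}}
print(round(relative_mce(model, baseline), 3))  # 0.35 = mean(0.30 / 1.50, 0.75 / 1.50)
```

Union accuracy can never exceed the accuracy against any single attack, which is why doubling it is a strong result.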
This review was created by AI and reviewed by human editors.