QUICK REVIEW

[Paper Review] Robust Perception through Analysis by Synthesis.

Lukas Schott, Jonas Rauber | arXiv (Cornell University) | May 23, 2018

Adversarial Robustness in Machine Learning | 13 references | 19 citations

TL;DR

This paper proposes a robust classification model that performs analysis by synthesis with learned class-conditional data distributions and reports state-of-the-art adversarial robustness on MNIST. The model resists L0, L2, and L-infinity attacks, including a new decision-based attack that minimizes the number of perturbed pixels, and most adversarial examples against it are strongly perturbed toward the perceptual boundary between the original and the adversarial class.

ABSTRACT

Despite much effort, deep neural networks remain highly susceptible to tiny input perturbations and even for MNIST, one of the most common toy datasets in computer vision, no neural network model exists for which adversarial perturbations are large and make semantic sense to humans. We show that even the widely recognized and by far most successful defense by Madry et al. (1) overfits on the L-infinity metric (it's highly susceptible to L2 and L0 perturbations), (2) classifies unrecognizable images with high certainty, (3) performs not much better than simple input binarization and (4) features adversarial perturbations that make little sense to humans. These results suggest that MNIST is far from being solved in terms of adversarial robustness. We present a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions. We derive bounds on the robustness and go to great length to empirically evaluate our model using maximally effective adversarial attacks by (a) applying decision-based, score-based, gradient-based and transfer-based attacks for several different Lp norms, (b) by designing a new attack that exploits the structure of our defended model and (c) by devising a novel decision-based attack that seeks to minimize the number of perturbed pixels (L0). The results suggest that our approach yields state-of-the-art robustness on MNIST against L0, L2 and L-infinity perturbations and we demonstrate that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class.

Motivation & Objective

To address the persistent vulnerability of deep neural networks to small, imperceptible adversarial perturbations on MNIST.
To challenge the assumption that existing defenses, including Madry et al.'s L-infinity robust model, provide genuine robustness.
To develop a new defense mechanism based on generative modeling of class-conditional data distributions for improved robustness.
To empirically evaluate robustness using diverse adversarial attacks across multiple Lp norms, including a novel L0-minimizing decision-based attack.

Proposed method

The model performs analysis by synthesis, using learned class-conditional data distributions to decide which class best explains a given input.
It uses a variational autoencoder-like framework to model the data distribution per class, enabling reconstruction-based decision-making.
Robustness is derived analytically by bounding the likelihood of adversarial examples under the generative model.
Two new attacks are introduced: one designed to exploit the structure of the defended model, and a decision-based attack that seeks to minimize the number of perturbed pixels (L0).
The model is evaluated using a combination of gradient-based, score-based, transfer-based, and decision-based attacks across L0, L2, and L-infinity norms.
Adversarial examples are analyzed to show they consistently shift toward the perceptual boundary between the original and adversarial class.
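
The classification idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a diagonal Gaussian per class as a simple stand-in for the variational autoencoder-like models, equal class priors, and toy data instead of MNIST. The function names (fit_class_gaussians, class_log_likelihoods, classify) and the var_floor parameter are hypothetical.

```python
import numpy as np

def fit_class_gaussians(X, y, n_classes, var_floor=1e-2):
    """Fit one diagonal Gaussian per class (assumed stand-in for a per-class generative model)."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    # var_floor keeps variances away from zero; its value is an arbitrary assumption.
    variances = np.stack([X[y == c].var(axis=0) + var_floor for c in range(n_classes)])
    return means, variances

def class_log_likelihoods(x, means, variances):
    """Return log p(x | c) for every class c under the diagonal Gaussians."""
    diff = x[None, :] - means
    return -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=1)

def classify(x, means, variances):
    """Pick the class whose generative model explains x best (equal priors assumed)."""
    return int(np.argmax(class_log_likelihoods(x, means, variances)))

# Toy data: class 0 is "dark" (pixels near 0.2), class 1 is "bright" (pixels near 0.8).
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal(loc=0.2, scale=0.05, size=(100, 16)),
    rng.normal(loc=0.8, scale=0.05, size=(100, 16)),
])
y = np.array([0] * 100 + [1] * 100)

means, variances = fit_class_gaussians(X, y, n_classes=2)
pred_dark = classify(np.full(16, 0.25), means, variances)
pred_bright = classify(np.full(16, 0.75), means, variances)
print(pred_dark, pred_bright)
```

The design point this sketch tries to convey is that the decision comes from comparing how well each class model explains the input, rather than from a single discriminative network. A low likelihood under every class is then a natural signal that the input is unrecognizable.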

Experimental results

Research questions

RQ1: Can a generative model-based defense achieve superior robustness across multiple Lp norms on MNIST compared to existing defenses?
RQ2: Do adversarial examples generated against this model exhibit perceptual coherence and semantic meaning to humans?
RQ3: How effective is a novel decision-based attack that minimizes the number of perturbed pixels (L0) in evading the proposed defense?
RQ4: Does the model’s robustness stem from genuine distributional understanding or overfitting to specific attack types?
RQ5: To what extent do adversarial examples generated against the model move toward the perceptual boundary between classes?

Key findings

The proposed model achieves state-of-the-art robustness on MNIST against L0, L2, and L-infinity adversarial attacks.
Madry et al.'s defense, despite its reputation, overfits on the L-infinity metric and is highly susceptible to L2 and L0 perturbations.
The model classifies unrecognizable inputs with low confidence, indicating better calibration than standard defenses.
Adversarial examples generated against the model are strongly perturbed toward the perceptual boundary between the original and adversarial class.
The novel decision-based attack, which seeks to minimize the number of perturbed pixels (L0), is included in the evaluation, and the reported results suggest the model remains state-of-the-art against L0 perturbations.
Madry et al.'s defense performs not much better than simple input binarization, suggesting that current robustness claims may be overstated.
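
To make the L0, L2, and L-infinity terminology and the binarization baseline concrete, here is a small sketch. It assumes pixel intensities in [0, 1] and a binarization threshold of 0.5; the threshold value and the function names (binarize, perturbation_norms) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def binarize(x, threshold=0.5):
    """Map pixel intensities in [0, 1] to {0, 1}; the 0.5 threshold is an assumption."""
    return (x >= threshold).astype(np.float32)

def perturbation_norms(x, x_adv):
    """Measure the size of a perturbation under the three norms used in the review."""
    delta = (x_adv - x).ravel()
    return {
        "L0": int(np.count_nonzero(delta)),    # number of changed pixels
        "L2": float(np.linalg.norm(delta)),    # Euclidean size of the change
        "Linf": float(np.max(np.abs(delta))),  # largest change to any single pixel
    }

# A tiny 2x2 "image" and a perturbed copy that changes one pixel by 0.3.
x = np.array([[0.0, 1.0], [0.9, 0.1]], dtype=np.float32)
x_adv = x.copy()
x_adv[0, 0] += 0.3

norms = perturbation_norms(x, x_adv)
unchanged = np.array_equal(binarize(x), binarize(x_adv))
print(norms, unchanged)
```

In this toy case the perturbation does not push the changed pixel across the threshold, so binarization erases it entirely. That is one intuition for why such a simple preprocessing step can absorb small L-infinity perturbations while offering little against changes to a few pixels by large amounts.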

This review was created by AI and reviewed by human editors.