[Paper Review] Biologically inspired protection of deep networks from adversarial attacks
The paper presents a biologically inspired training scheme that pushes networks into a nonlinear, saturated regime to intrinsically resist gradient-based adversarial attacks, achieving state-of-the-art robustness on MNIST without adversarial training.
Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme to train deep neural networks to make them robust to adversarial attacks. Our scheme generates highly nonlinear, saturated neural networks that achieve state of the art performance on gradient based adversarial examples on MNIST, despite never being exposed to adversarially chosen examples during training. Moreover, these networks exhibit unprecedented robustness to targeted, iterative schemes for generating adversarial examples, including second-order methods. We further identify principles governing how these networks achieve their robustness, drawing on methods from information geometry. We find these networks progressively create highly flat and compressed internal representations that are sensitive to very few input dimensions, while still solving the task. Moreover, they employ highly kurtotic weight distributions, also found in the brain, and we demonstrate how such kurtosis can protect even linear classifiers from adversarial attack.
Motivation & Objective
- Motivate adversarial defenses that draw on nonlinear dendritic computation in biological neural circuits.
- Develop a practical training scheme that drives networks into saturated regimes.
- Analyze how saturation affects internal representations and geometry to confer robustness.
- Identify weight distribution properties (high kurtosis) linked to robustness, including linear mechanisms in simple classifiers.
Proposed method
- Design a saturating penalty that encourages activations to operate in the saturated regime of nonlinearities.
- Apply an annealed penalty during training across all layers, including readout, integrated with standard optimization (Adam).
- Use a cross-entropy based objective combined with the saturating regularizer to discourage linear regimes.
- Evaluate robustness against gradient-based adversaries (fast gradient sign method) and against targeted, iterative attacks, including second-order methods.
- Compare vanilla, adversarially trained, and saturated networks across sigmoid MLP, ReLU MLP, and CNN architectures on MNIST.
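The saturating regularizer and its annealing can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes sigmoid units and takes the penalty to be the summed sigmoid derivative, which approaches zero only when units sit in a saturated regime. The penalty form, the linear ramp, and the names `saturation_penalty`, `annealed_weight`, and `regularized_loss` are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def saturation_penalty(pre_activations):
    """Sum of sigmoid derivatives s * (1 - s) over one layer's units.

    Assumed penalty form: largest (0.25 per unit) at z = 0, where the
    sigmoid is most linear, and close to zero when |z| is large.
    """
    total = 0.0
    for z in pre_activations:
        s = sigmoid(z)
        total += s * (1.0 - s)
    return total


def annealed_weight(step, total_steps, lam_max):
    """Hypothetical schedule: ramp the penalty weight linearly from 0 to lam_max."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lam_max * frac


def regularized_loss(cross_entropy, layer_pre_activations, step, total_steps, lam_max):
    """Cross-entropy plus the annealed penalty summed over all layers."""
    penalty = sum(saturation_penalty(h) for h in layer_pre_activations)
    return cross_entropy + annealed_weight(step, total_steps, lam_max) * penalty
```

In a real training loop, the pre-activations would come from the forward pass of every layer (including the readout), and the combined loss would be handed to an optimizer such as Adam. Starting the weight at zero lets the network first fit the task before being pushed toward saturation.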
Experimental results
Research questions
- RQ1: Can a biologically inspired saturated regime improve the intrinsic robustness of deep networks to adversarial perturbations without adversarial training?
- RQ2: What internal representations and geometric properties emerge in saturated networks that underlie robustness?
- RQ3: Do highly kurtotic weight distributions, similar to those found in the brain, contribute to adversarial protection?
- RQ4: How do saturated networks fare against iterative and second-order adversaries compared to standard defenses?
- RQ5: Does the robustness observed in saturated networks carry over across architectures (MLP variants and CNNs)?
Key findings
- Saturated networks achieve 2-7% error on gradient-based MNIST adversarial examples with little loss of clean test accuracy.
- Saturated networks outperform adversarially trained counterparts on adversarial examples in the MNIST setting.
- Weights in saturated networks exhibit higher excess kurtosis, a brain-like property linked to robustness.
- Internal representations become highly clustered by class and increasingly separated across layers, with flat input-output mappings.
- Information-geometric analysis shows saturated networks have flat, low-dimensional input-output functions and singular-value patterns indicating constrained sensitivity directions.
- Weight kurtosis can independently confer robustness even in linear classifiers.
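One way to see why kurtosis could help a linear classifier is sketched below. For a linear score w·x, a fast gradient sign step of size eps shifts the score by eps times the L1 norm of w, and at a fixed L2 norm a heavy-tailed (sparse) weight vector has a smaller L1 norm than a dense one. This toy comparison illustrates that arithmetic and is not the paper's own derivation. The weight vectors, the step size, and all function names are hypothetical.

```python
import math


def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def l2_normalize(weights):
    """Rescale a weight vector to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_step(x, grad, eps):
    """Fast gradient sign step: move each input by eps along sign(grad)."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def score(weights, x):
    """Linear classifier score w . x (its input gradient is just w)."""
    return sum(w * xi for w, xi in zip(weights, x))


# Two hypothetical weight vectors with the same L2 norm.
dense = l2_normalize([1.0, -1.0] * 8)      # evenly spread, low kurtosis
sparse = l2_normalize([4.0] + [0.0] * 15)  # one dominant weight, high kurtosis

x = [0.0] * 16
eps = 0.1
shift_dense = score(dense, fgsm_step(x, dense, eps)) - score(dense, x)
shift_sparse = score(sparse, fgsm_step(x, sparse, eps)) - score(sparse, x)

print("excess kurtosis:", excess_kurtosis(dense), excess_kurtosis(sparse))
print("score shift under FGSM:", shift_dense, shift_sparse)
```

Under these assumptions the sparse vector has the higher excess kurtosis and the smaller score shift for the same perturbation budget, which is the qualitative pattern the finding describes.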
This review was created by AI and reviewed by human editors.