[Paper Review] Attacking the Madry Defense Model with $L_1$-based Adversarial Examples
The paper shows that L1-based elastic-net adversarial examples (EAD) transfer to the Madry Defense Model and can outperform the L∞-based PGD and I-FGM attacks in targeted transfer, while often causing less visual distortion than comparable L∞-based attacks.
The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by a scaled maximal $L_\infty$ distortion $\epsilon = 0.3$. This discourages the use of attacks which are not optimized on the $L_\infty$ distortion metric. Our experimental results demonstrate that by relaxing the $L_\infty$ constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion. These results call into question the use of $L_\infty$ as a sole measure for visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
Motivation & Objective
- Assess transferability of adversarial attacks beyond the L∞ constraint used in the Madry MNIST challenge.
- Compare EAD (L1+L2 regularization) against PGD and I-FGM (L∞-based) under relaxed distortion budgets.
- Evaluate targeted and non-targeted transferability from undefended models and ensembles.
- Analyze visual distortion vs. distortion metrics (L1, L2, L∞) in transferred adversarial examples.
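The three distortion metrics named above can behave very differently on the same perturbation. The following is a minimal sketch, not code from the paper: the helper name `distortion_norms` and the toy images are assumptions made for illustration. It shows that a sparse perturbation can have a large L∞ norm but small L1 and L2 norms, while a dense perturbation inside the competition budget of 0.3 has the opposite profile.

```python
import numpy as np

def distortion_norms(x_orig, x_adv):
    """Return the L1, L2 and L-infinity norms of the perturbation x_adv - x_orig."""
    delta = (np.asarray(x_adv, dtype=np.float64)
             - np.asarray(x_orig, dtype=np.float64)).ravel()
    return {
        "l1": float(np.abs(delta).sum()),
        "l2": float(np.sqrt((delta ** 2).sum())),
        "linf": float(np.abs(delta).max()),
    }

# Toy 28x28 "image" with pixel values in [0, 1] (illustrative data only).
x = np.zeros((28, 28))

sparse = x.copy()
sparse[0, :4] = 1.0   # four pixels changed by the full range
dense = x + 0.3       # every pixel changed by 0.3

sparse_norms = distortion_norms(x, sparse)  # l1 = 4.0,   l2 = 2.0, linf = 1.0
dense_norms = distortion_norms(x, dense)    # l1 ~ 235.2, l2 ~ 8.4, linf = 0.3
```

The sparse perturbation violates an L∞ budget of 0.3 even though it touches only four pixels, which is the kind of mismatch between L∞ and overall distortion that the review's objectives refer to.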
Proposed method
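- Use the Madry Defense Model, adversarially trained on MNIST with PGD at ε = 0.3, as the target model.
- Generate adversarial examples with EAD (elastic-net regularization: L1 + L2) and tune the L1 weight β to vary the L1/L2 emphasis.
- Compare against PGD and I-FGM across a range of ε settings (the L∞ budget), and sweep the confidence parameter κ for EAD.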
- Evaluate targeted and non-targeted transferability from undefended models and a 3-model ensemble.
- Analyze visual distortion using L1/L2/L∞ norms and provide qualitative visuals.
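To make the contrast between the elastic-net attack and the L∞-based attacks concrete, here is a hedged sketch of the building blocks involved. It follows the standard EAD formulation as I understand it (hinge loss with confidence κ, plus β times the L1 norm, plus the squared L2 norm, minimized with a projected shrinkage-thresholding step), together with the L∞ projection used by PGD/I-FGM-style attacks. All function names are hypothetical, gradients of the model are omitted, and this is not the authors' implementation.

```python
import numpy as np

def hinge_loss(logits, target, kappa):
    """Targeted hinge loss: max(max_{j != t} Z_j - Z_t, -kappa)."""
    logits = np.asarray(logits, dtype=np.float64)
    best_other = np.max(np.delete(logits, target))
    return max(best_other - logits[target], -kappa)

def elastic_net_objective(x, x0, logits, target, c, beta, kappa):
    """c * hinge + beta * ||x - x0||_1 + ||x - x0||_2^2 (assumed EAD objective)."""
    delta = (np.asarray(x) - np.asarray(x0)).ravel()
    return (c * hinge_loss(logits, target, kappa)
            + beta * np.abs(delta).sum()
            + (delta ** 2).sum())

def shrink_project(z, x0, beta):
    """Projected shrinkage-thresholding: zero out changes smaller than beta,
    shrink larger ones by beta, and keep pixels inside [0, 1]."""
    diff = z - x0
    upper = np.minimum(z - beta, 1.0)
    lower = np.maximum(z + beta, 0.0)
    return np.where(diff > beta, upper, np.where(diff < -beta, lower, x0))

def linf_project(x, x0, eps):
    """PGD/I-FGM-style projection onto the L-infinity ball of radius eps."""
    return np.clip(np.clip(x, x0 - eps, x0 + eps), 0.0, 1.0)

# Illustrative values only.
x0 = np.array([0.5, 0.5, 0.5, 0.5])
z = np.array([0.505, 0.9, 0.1, 1.4])    # candidate after a gradient step

ead_step = shrink_project(z, x0, beta=0.01)   # -> [0.5, 0.89, 0.11, 1.0]
pgd_step = linf_project(z, x0, eps=0.3)       # -> [0.505, 0.8, 0.2, 0.8]
loss = hinge_loss([1.0, 5.0, 2.0], target=1, kappa=50)  # -> -3.0
```

The shrinkage step resets the first pixel to its original value, which is what encourages sparse perturbations, whereas the L∞ projection keeps every small change and only caps the large ones at ε.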
Experimental results
Research questions
- RQ1: Do L1-based EAD adversarial examples transfer to the Madry Defense Model as effectively as or better than L∞-based PGD/I-FGM?
- RQ2: How do L1 and L2 distortions impact transferability and perceived visual distortion compared with L∞ constraints?
- RQ3: Does using an ensemble of undefended models enhance transferability of EAD attacks?
- RQ4: What are the trade-offs between attack success rate and distortion types (L1/L2/L∞) for targeted vs. non-targeted attacks?
Key findings
- EAD outperforms the Carlini-Wagner (C&W) attack across κ settings in both targeted and non-targeted cases.
- In targeted attacks, at an optimal κ (e.g., 50), EAD achieves lower L1/L2 distortion and higher transferability than PGD/I-FGM.
- EAD with β = 0.01 often yields the highest attack success rate (ASR) while minimizing L1 distortion, especially at lower κ.
- PGD/I-FGM can reach high ASR but incur large L1/L2 distortions, leading to visually perceptible perturbations.
- Visual comparisons show EAD can preserve visual quality even with similar average L∞ distortion to PGD.
- Results suggest L∞ alone is insufficient to characterize visual distortion and adversarial subspaces.
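The findings above are reported in terms of attack success rate for targeted and non-targeted transfer. As a minimal sketch of how that metric is typically computed (the function name and the label arrays are made up for illustration and are not results from the paper): a non-targeted attack succeeds when the defended model's prediction differs from the true label, and a targeted attack succeeds only when the prediction equals the chosen target label.

```python
import numpy as np

def attack_success_rate(preds, true_labels, target_labels=None):
    """Fraction of adversarial examples that fool the model.

    Non-targeted (target_labels is None): prediction != true label.
    Targeted: prediction == target label.
    """
    preds = np.asarray(preds)
    if target_labels is None:
        return float(np.mean(preds != np.asarray(true_labels)))
    return float(np.mean(preds == np.asarray(target_labels)))

# Hypothetical predictions of a defended model on five transferred examples.
preds = np.array([3, 7, 1, 1, 9])
true = np.array([3, 2, 1, 4, 8])
targets = np.array([5, 7, 0, 1, 0])

untargeted_asr = attack_success_rate(preds, true)          # 3 of 5 -> 0.6
targeted_asr = attack_success_rate(preds, true, targets)   # 2 of 5 -> 0.4
```

On the same examples the targeted rate can never exceed the non-targeted rate when targets differ from the true labels, which is why targeted transfer is the harder setting.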
This review was created by AI and reviewed by human editors.