QUICK REVIEW

[Paper Review] Structured Adversarial Attack: Towards General Implementation and Better Interpretability

Kaidi Xu, Sijia Liu | arXiv (Cornell University) | Aug 5, 2018

Adversarial Robustness in Machine Learning | 104 citations

TL;DR

Introduces the Structured Adversarial Attack (StrAttack), which enforces group sparsity in perturbations via a sliding mask and an ADMM solver, keeping distortion comparable to existing attacks while producing interpretable, structured perturbations.

ABSTRACT

When generating adversarial examples to attack deep neural networks (DNNs), Lp norm of the added perturbation is usually used to measure the similarity between original image and adversarial example. However, such adversarial attacks perturbing the raw input spaces may fail to capture structural information hidden in the input. This work develops a more general attack model, i.e., the structured attack (StrAttack), which explores group sparsity in adversarial perturbations by sliding a mask through images aiming for extracting key spatial structures. An ADMM (alternating direction method of multipliers)-based framework is proposed that can split the original problem into a sequence of analytically solvable subproblems and can be generalized to implement other attacking methods. Strong group sparsity is achieved in adversarial perturbations even with the same level of Lp norm distortion as the state-of-the-art attacks. We demonstrate the effectiveness of StrAttack by extensive experimental results on MNIST, CIFAR-10, and ImageNet. We also show that StrAttack provides better interpretability (i.e., better correspondence with discriminative image regions) through adversarial saliency map (Papernot et al., 2016b) and class activation map (Zhou et al., 2016).

Motivation & Objective

Explore group-sparsity in adversarial perturbations to capture spatial structures within images.
Develop a general and efficient optimization framework for structured attacks.
Show that StrAttack preserves traditional distortion measures while producing sparsely structured perturbations.
Demonstrate interpretability of perturbations via saliency maps and class activation maps.
Evaluate robustness of StrAttack across datasets and against defenses.

Proposed method

Define a sliding mask to partition the perturbation into groups and impose group-sparsity via a group Lasso-like regularizer g(Δ).
Formulate a general attack objective that includes a loss term, a distortion term, and the group-sparsity term; connect it to C&W and EAD as special cases.
Solve the resulting nonconvex problem efficiently with ADMM, introducing auxiliary variables to enable closed-form updates (e.g., Δ-step, z-step, y-step, and w-step).
Use a linearized ADMM variant with a Bregman divergence to handle the nonconvex loss f(x0+z) and to obtain a closed-form z-update.
Extend to overlapping group structures with multiple y-variables and modify the ADMM steps accordingly.
Provide a refinement mechanism that fixes a sparse perturbation pattern and fine-tunes values under the original objective.
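The group-sparsity term and its closed-form shrinkage step can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single-channel perturbation, a square mask of side r whose stride equals r (so groups do not overlap), and the standard group Lasso penalty (sum of per-group L2 norms) with its usual proximal operator, block soft-thresholding. The function names and the threshold tau are hypothetical, and the paper's exact update may differ in scaling.

```python
import numpy as np

def group_norms(delta, r):
    """L2 norm of each non-overlapping r x r group of a 2-D perturbation."""
    h, w = delta.shape
    assert h % r == 0 and w % r == 0, "mask size must divide the image size"
    # Reshape to (h//r, r, w//r, r): axes 1 and 3 index pixels inside a group.
    blocks = delta.reshape(h // r, r, w // r, r)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))

def group_lasso_penalty(delta, r):
    """Assumed form of g(delta): the sum of per-group L2 norms."""
    return group_norms(delta, r).sum()

def block_soft_threshold(a, r, tau):
    """Proximal operator of tau * g: shrink every group toward zero."""
    norms = group_norms(a, r)
    # Per-group factor max(0, 1 - tau / ||a_g||); groups with norm <= tau vanish.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    # Expand the per-group factor back to pixel resolution.
    return a * np.kron(scale, np.ones((r, r)))

# Toy input: one strong 2 x 2 group and one weak 2 x 2 group.
a = np.zeros((4, 4))
a[0:2, 0:2] = 3.0   # group norm 6.0
a[2:4, 2:4] = 0.1   # group norm 0.2
shrunk = block_soft_threshold(a, r=2, tau=1.0)
```

In this toy input the weak group is zeroed out entirely while the strong group is only scaled down, which is the mechanism that yields whole-region sparsity rather than scattered zero pixels.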

Experimental results

Research questions

RQ1: Can structured (group-sparse) perturbations identify minimally sufficient regions that mislead DNNs without increasing pixel-wise distortion?
RQ2: Does StrAttack generalize existing norm-ball attacks (e.g., C&W, EAD) and improve interpretability of perturbations?
RQ3: How can ADMM be leveraged to efficiently generate structured adversarial perturbations, including overlapping groups?
RQ4: Can StrAttack reveal clearer correspondences between perturbed regions and discriminative image regions via ASM and CAM?
RQ5: Is StrAttack effective against defenses, across datasets of different scale (MNIST, CIFAR-10, ImageNet), and across models?

Key findings

StrAttack yields strong group sparsity in perturbations while maintaining comparable ℓp distortion to state-of-the-art attacks.
StrAttack perturbations highlight minimally sufficient regions, often aligning with semantic structures of the target object.
Overlapping group structures are feasible and can yield even sparser perturbations under the same distortion constraints.
The ADMM-based solver provides closed-form updates and parallelizable steps, improving efficiency and generality over prior methods.
StrAttack demonstrates interpretability improvements via adversarial saliency maps and class activation maps compared with non-structured attacks.
StrAttack remains effective against defenses (defensive distillation and adversarial training) and shows strong transferability across multiple network architectures.
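To make the first finding concrete, the sketch below compares two toy perturbations with the same L2 norm but very different group sparsity. The measure used here, the fraction of groups whose norm is numerically zero, is an assumption chosen for illustration and is not necessarily the metric reported in the paper. The function name, the tolerance, and the 4 x 4 arrays are hypothetical.

```python
import numpy as np

def zero_group_fraction(delta, r, tol=1e-8):
    """Fraction of non-overlapping r x r groups whose L2 norm is at most tol."""
    h, w = delta.shape
    blocks = delta.reshape(h // r, r, w // r, r)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    return float((norms <= tol).mean())

# Dense perturbation: every pixel changed slightly (L2 norm = 2.0).
dense = np.full((4, 4), 0.5)

# Structured perturbation: one 2 x 2 region changed (L2 norm = 2.0).
structured = np.zeros((4, 4))
structured[0:2, 0:2] = 1.0

dense_l2 = np.linalg.norm(dense)
structured_l2 = np.linalg.norm(structured)
dense_sparsity = zero_group_fraction(dense, r=2)            # 0.0
structured_sparsity = zero_group_fraction(structured, r=2)  # 0.75
```

Both perturbations have identical L2 distortion, yet only the structured one leaves three of the four groups untouched.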

This review was created by AI and reviewed by human editors.