QUICK REVIEW

[Paper Review] Cascade Adversarial Machine Learning Regularized with a Unified Embedding

Taesik Na, Jong Hwan Ko|arXiv (Cornell University)|Aug 8, 2017

Adversarial Robustness in Machine Learning|14 references|62 citations

TL;DR

This paper introduces cascade adversarial training, which injects iteratively generated adversarial images crafted from already defended networks, and combines it with low-level embedding similarity regularization. Together they improve robustness to unknown iterative attacks and in black-box scenarios, at the expense of reduced robustness to one-step attacks and some clean accuracy.

ABSTRACT

Injecting adversarial examples during training, known as adversarial training, can improve robustness against one-step attacks, but not for unknown iterative attacks. To address this challenge, we first show iteratively generated adversarial images easily transfer between networks trained with the same strategy. Inspired by this observation, we propose cascade adversarial training, which transfers the knowledge of the end results of adversarial training. We train a network from scratch by injecting iteratively generated adversarial images crafted from already defended networks in addition to one-step adversarial images from the network being trained. We also propose to utilize embedding space for both classification and low-level (pixel-level) similarity learning to ignore unknown pixel level perturbation. During training, we inject adversarial images without replacing their corresponding clean images and penalize the distance between the two embeddings (clean and adversarial). Experimental results show that cascade adversarial training together with our proposed low-level similarity learning efficiently enhances the robustness against iterative attacks, but at the expense of decreased robustness against one-step attacks. We show that combining those two techniques can also improve robustness under the worst case black box attack scenario.

Motivation & Objective

Address the robustness gap that adversarial training leaves against unknown iterative attacks, beyond the one-step attacks it already handles.
Propose cascade adversarial training, which transfers the end results of adversarial training from already defended networks.
Introduce low-level embedding regularization so that the network learns to ignore pixel-level perturbations.
Evaluate the approach on MNIST and CIFAR-10 with ResNet architectures.
Analyze transferability, embedding space, and robustness under white-box and black-box attacks.

Proposed method

Demonstrate transferability of iteratively generated adversarial images between networks trained with the same strategy.
Develop cascade adversarial training: inject iter_FGSM images crafted from an already defended network alongside one-step adversarial images from the network being trained.
Introduce low-level similarity learning by including clean images in mini-batches and penalizing the distance between clean and adversarial embeddings (L_dist).
Explore two embedding regularization variants: bidirectional loss and pivot loss.
Define the total loss as a combination of standard classification loss on clean/adversarial images and the embedding distance loss with hyperparameters lambda and lambda2.
Visualize embedding spaces to show reduced divergence between clean and adversarial embeddings, and study the impact of lambda2 on performance.
Evaluate on MNIST and CIFAR-10 using ResNet backbones, with analysis of white-box and black-box attack scenarios.
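
To make the loss structure described above more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the squared L2 distance, the normalization of the classification term, and the function names (`dist_loss_and_grads`, `total_loss`) are all hypothetical choices, and the paper's exact form may differ. The sketch treats the difference between the two variants as a matter of gradient flow: with the bidirectional loss both embeddings receive a gradient, while with the pivot loss the clean embedding is assumed to act as a fixed target.

```python
def dist_loss_and_grads(e_adv, e_clean, pivot):
    """Squared L2 distance between an adversarial and a clean embedding,
    plus the gradient with respect to each embedding.

    Assumption: pivot=True treats the clean embedding as a constant,
    so it receives a zero gradient; pivot=False (bidirectional) pulls
    both embeddings toward each other.
    """
    diff = [a - c for a, c in zip(e_adv, e_clean)]
    loss = sum(d * d for d in diff)
    grad_adv = [2.0 * d for d in diff]
    if pivot:
        grad_clean = [0.0] * len(diff)
    else:
        grad_clean = [-2.0 * d for d in diff]
    return loss, grad_adv, grad_clean


def total_loss(clean_ce, adv_ce, dist_terms, lam, lam2):
    """Combine per-example losses into one scalar.

    clean_ce / adv_ce: classification losses for clean and adversarial
    images; dist_terms: embedding distances for clean/adversarial pairs.
    lam weights adversarial examples in the classification term and
    lam2 weights the distance term. The weighted-average normalization
    is an assumption made for this sketch.
    """
    n_clean, k = len(clean_ce), len(adv_ce)
    cls = (sum(clean_ce) + lam * sum(adv_ce)) / (n_clean + lam * k)
    return cls + lam2 * sum(dist_terms)


if __name__ == "__main__":
    loss, g_adv, g_clean = dist_loss_and_grads([1.0, 2.0], [0.0, 0.0], pivot=True)
    print(loss, g_adv, g_clean)
    print(total_loss([1.0, 1.0], [2.0], [loss], lam=0.5, lam2=0.1))
```

In an autodiff framework the pivot variant would typically be expressed with a stop-gradient on the clean embedding rather than with hand-written gradients; the explicit gradients here are only meant to show which side of the pair gets updated.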

Experimental results

Research questions

RQ1: How transferable are iteratively generated adversarial examples between networks trained with the same strategy?
RQ2: Can cascade adversarial training improve robustness to iterative adversarial attacks without excessively sacrificing clean accuracy?
RQ3: Does embedding-based regularization (low-level similarity) enhance robustness to perturbations at the pixel level?
RQ4: How does the proposed approach perform under white-box versus black-box attack settings on MNIST and CIFAR-10?
RQ5: What is the trade-off between robustness to iterative attacks and accuracy on clean data when combining cascade training with embedding regularization?

Key findings

Cascade adversarial training using iter_FGSM from defended networks improves robustness against unknown iterative attacks, with a tendency to reduce robustness against one-step attacks.
Low-level similarity learning regularizes embeddings so that small input perturbations produce closer high-level representations, enhancing robustness on simple datasets like MNIST.
Both the pivot and bidirectional embedding losses regularize the embeddings against adversarial perturbations, with pivot loss particularly helping to reduce the divergence between clean and adversarial embeddings.
When combined with cascade/ensemble training, the approach yields better worst-case robustness under black-box attacks compared to single-method adversarial training.
There is a trade-off: increasing robustness to iterative attacks can come at the expense of clean image accuracy, and the effect is dataset- and architecture-dependent; using the same initialization for cascade/source networks is recommended to maximize transfer benefits.
Ensemble and cascade strategies together with low-level similarity learning improve robustness against iterative white-box and black-box attacks on CIFAR-10, though challenges remain for fully preserving clean accuracy.
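
For readers less familiar with the two attack families these findings contrast, the following sketch shows a one-step FGSM attack, an iterative FGSM attack, and one possible way to mix them when building adversarial inputs for cascade training. Everything here is an assumption made for illustration: the toy logistic-regression model stands in for a deep network, and the helper names (`step_fgsm`, `iter_fgsm`, `cascade_adversarial_inputs`) and the alternating 50/50 split are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input of a logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def step_fgsm(x, y, w, b, eps):
    """One-step attack: a single signed-gradient step of size eps."""
    g = input_gradient(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]


def iter_fgsm(x, y, w, b, eps, alpha, n_iter):
    """Iterative attack: repeated steps of size alpha, each projected back
    into the eps-ball around the clean input and the valid range [0, 1]."""
    x_adv = list(x)
    for _ in range(n_iter):
        g = input_gradient(x_adv, y, w, b)
        x_adv = [
            clip(clip(xa + alpha * sign(gi), xi - eps, xi + eps), 0.0, 1.0)
            for xa, gi, xi in zip(x_adv, g, x)
        ]
    return x_adv


def cascade_adversarial_inputs(xs, ys, current, defended, eps, alpha, n_iter):
    """Alternate between one-step examples from the model being trained and
    iterative examples from an already defended model (assumed 50/50 split)."""
    out = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i % 2 == 0:
            w, b = current
            out.append(step_fgsm(x, y, w, b, eps))
        else:
            w, b = defended
            out.append(iter_fgsm(x, y, w, b, eps, alpha, n_iter))
    return out


if __name__ == "__main__":
    model = ([1.0, -2.0], 0.0)  # hypothetical (weights, bias)
    x, y = [0.5, 0.5], 1
    print(step_fgsm(x, y, *model, eps=0.1))
    print(iter_fgsm(x, y, *model, eps=0.1, alpha=0.05, n_iter=4))
```

On a linear model like this one the two attacks end up at the same point, because the gradient sign never changes; the distinction matters for deep networks, where the loss surface is curved and the iterative variant can follow it.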


This review was created by AI and reviewed by human editors.