QUICK REVIEW

[Paper Review] Improving Adversarial Robustness of Ensembles with Diversity Training

Sanjay Kariyappa, Moinuddin K. Qureshi|arXiv (Cornell University)|Jan 28, 2019

Adversarial Robustness in Machine Learning|28 references|99 citations

TL;DR

The paper introduces Diversity Training (DivTrain), which trains ensemble members to have uncorrelated loss gradients by adding a Gradient Alignment Loss (GAL) regularizer. This shrinks the adversarial subspace the members share and improves robustness to transfer-based attacks. DivTrain can optionally be combined with Ensemble Adversarial Training.

ABSTRACT

Deep Neural Networks are vulnerable to adversarial attacks even in settings where the attacker has no direct access to the model being attacked. Such attacks usually rely on the principle of transferability, whereby an attack crafted on a surrogate model tends to transfer to the target model. We show that an ensemble of models with misaligned loss gradients can provide an effective defense against transfer-based attacks. Our key insight is that an adversarial example is less likely to fool multiple models in the ensemble if their loss functions do not increase in a correlated fashion. To this end, we propose Diversity Training, a novel method to train an ensemble of models with uncorrelated loss functions. We show that our method significantly improves the adversarial robustness of ensembles and can also be combined with existing methods to create a stronger defense.

Motivation & Objective

Make deployed deep networks robust to transfer-based (black-box) attacks.
Propose a differentiable measure to quantify overlap in adversarial subspaces across an ensemble.
Introduce Gradient Alignment Loss (GAL) as a regularizer to train diverse ensembles.
Demonstrate that DivTrain shrinks the adversarial subspace shared by ensemble members and improves robustness.
Show that combining DivTrain with other defenses yields stronger protection.

Proposed method

Define adversarial subspace and the transferability threat model for ensembles.
Propose Gradient Alignment Loss (GAL) to quantify gradient alignment across ensemble members via a smooth approximation of coherence.
Train ensembles with GAL as a regularizer: Loss = average cross-entropy + lambda * GAL.
Use Leaky-ReLU to mitigate sparse gradient issues in GAL computation.
Evaluate DivTrain on MNIST and CIFAR-10 against multiple black-box attacks (FGSM, R-FGSM, I-FGSM, MI-FGSM, PGD-CW).
Demonstrate that DivTrain lowers gradient coherence and reduces adversarial subspace overlap, and can improve robustness when combined with Ensemble Adversarial Training.
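
The regularized objective listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes coherence is the largest pairwise cosine similarity between the members' input-gradients, and that the "smooth approximation" is a log-sum-exp over those pairs. This review spells out neither form. All function names and the example gradients are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine similarity between two gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coherence(grads):
    """Assumed definition: the largest pairwise cosine similarity."""
    return max(cosine_similarity(u, v) for u, v in combinations(grads, 2))

def gradient_alignment_loss(grads):
    """Assumed smooth surrogate for coherence: log-sum-exp over all pairs."""
    sims = [cosine_similarity(u, v) for u, v in combinations(grads, 2)]
    m = max(sims)  # subtract the max before exponentiating, for stability
    return m + math.log(sum(math.exp(s - m) for s in sims))

def divtrain_loss(ce_losses, grads, lam):
    """Average cross-entropy plus lambda * GAL, as in the summary above."""
    mean_ce = sum(ce_losses) / len(ce_losses)
    return mean_ce + lam * gradient_alignment_loss(grads)

# Hypothetical input-gradients of three ensemble members at one input.
aligned = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

print(gradient_alignment_loss(aligned) > gradient_alignment_loss(diverse))
```

Under these assumptions GAL always lies between the coherence and the coherence plus the log of the number of pairs, so driving GAL down also pushes the worst-case alignment down. The weight `lam` sets how strongly diversity is traded against the cross-entropy term.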

Experimental results

Research questions

RQ1: Does reducing overlap in the adversarial subspace of ensemble members improve robustness to transfer-based attacks?
RQ2: Can gradient alignment (GAL) be used as a differentiable regularizer to train diverse ensembles?
RQ3: How does DivTrain interact with existing defenses like Ensemble Adversarial Training?
RQ4: What is the impact of gradient sparsity on GAL and how can activation choices mitigate it?
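
The intuition behind RQ1 can be illustrated with a first-order toy calculation. The sketch below assumes each model's loss is locally linear and that the attacker takes an L2-normalized step along the surrogate's gradient. This is a simplification of the attacks evaluated in the paper. The function name and the numbers are hypothetical.

```python
import math

def first_order_loss_increase(grad_target, grad_surrogate, eps):
    """Linearized change in the target's loss after a step of size eps
    along the surrogate's normalized gradient (illustrative L2 step)."""
    norm_s = math.sqrt(sum(g * g for g in grad_surrogate))
    delta = [eps * g / norm_s for g in grad_surrogate]
    return sum(gt * d for gt, d in zip(grad_target, delta))

surrogate = [3.0, 4.0]
aligned_target = [6.0, 8.0]      # same direction as the surrogate gradient
orthogonal_target = [4.0, -3.0]  # perpendicular to the surrogate gradient

gain_aligned = first_order_loss_increase(aligned_target, surrogate, eps=0.1)
gain_orthogonal = first_order_loss_increase(orthogonal_target, surrogate, eps=0.1)

print(gain_aligned, gain_orthogonal)
```

In this linearized picture the transferred loss increase equals `eps` times the target's gradient norm times the cosine of the angle between the two gradients. A perturbation crafted on one model therefore does little to a model whose gradient is nearly orthogonal to it.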

Key findings

Diverse ensembles trained with GAL show higher accuracy on adversarial examples than baseline ensembles across all evaluated attacks.
Combining DivTrain with Ensemble Adversarial Training yields even greater robustness than either method alone.
DivTrain and DivTrain+EnsAdvTrain produce coherence (gradient alignment) distributions that sit lower than those of the baseline and EnsAdvTrain ensembles, indicating reduced overlap in their adversarial subspaces.
GAAS analysis shows DivTrain reduces the dimensionality of the ensemble's adversarial subspace, decreasing the likelihood of finding multiple orthogonal adversarial directions.
Using Leaky-ReLU mitigates gradient sparsity issues that hamper GAL backpropagation.
DivTrain maintains competitive clean accuracy with a tunable trade-off controlled by lambda.
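
The gradient-sparsity point can be made concrete with the activation derivatives alone. The sketch below compares how many units pass back an exactly zero derivative under ReLU and under Leaky-ReLU. The negative slope of 0.01 and the pre-activation values are assumptions for illustration, not settings taken from the paper.

```python
def relu_grad(z):
    """Derivative of ReLU: zero for every non-positive pre-activation."""
    return 1.0 if z > 0 else 0.0

def leaky_relu_grad(z, negative_slope=0.01):
    """Derivative of Leaky-ReLU: a small nonzero slope on the negative side."""
    return 1.0 if z > 0 else negative_slope

def zero_fraction(grads):
    """Fraction of entries that are exactly zero."""
    return sum(1 for g in grads if g == 0.0) / len(grads)

# Hypothetical pre-activations of one hidden layer for one input.
pre_activations = [-2.0, -0.5, 0.3, 1.7, -1.1, 0.9]

relu_sparsity = zero_fraction([relu_grad(z) for z in pre_activations])
leaky_sparsity = zero_fraction([leaky_relu_grad(z) for z in pre_activations])

print(relu_sparsity, leaky_sparsity)
```

With ReLU, every inactive unit contributes nothing to the input-gradient that GAL compares across members. Leaky-ReLU keeps a small signal flowing through those units.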

This review was created by AI and reviewed by human editors.