QUICK REVIEW

[Paper Review] Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

Jonathan Uesato, Brendan O’Donoghue | arXiv (Cornell University) | Feb 15, 2018

Adversarial Robustness in Machine Learning | References: 40 | Citations: 304

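One-Line Summary

Summary: This paper argues that reported adversarial robustness often depends on the surrogate attack used for evaluation. It formalizes adversarial risk, introduces the notion of obscurity to an adversary, and shows that many defenses are vulnerable to stronger attacks.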

ABSTRACT

This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as 'obscurity to an adversary,' and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.

Motivation and Objectives

Motivate adversarial risk as a worst-case performance measure.
Show that common evaluation metrics are surrogates for true adversarial risk.
Introduce obscurity as a way to diagnose defenses relying on weak attacks.
Demonstrate through experiments that many defenses fail under stronger attacks.

Proposed Method

Formalize adversarial risk as a worst-case risk over inputs.
Define local adversarial risk L via a neighborhood Nε(x) and surrogate risk Ĺ with a chosen adversary f.
Define obscurity(θ, f) = L(θ) − Ĺ(θ, f) and discuss transparency.
Describe gradient-based (PGD) and gradient-free (SPSA) attack strategies to evaluate robustness.
Analyze transfer-based attacks and non-differentiable defenses for obscurity effects.
Compare defenses by evaluating against stronger adversaries to reveal true robustness.
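
The three definitions listed above can be written out roughly as follows. This is a sketch, not the paper's exact notation: the per-example loss \ell and the data distribution D are assumed symbols, and \hat{L} stands for the surrogate risk written Ĺ in the list.

```latex
% \ell (per-example loss) and D (data distribution) are assumed notation.
\begin{align}
L(\theta) &= \mathbb{E}_{(x,y)\sim D}\left[\max_{x' \in N_\epsilon(x)} \ell(x', y; \theta)\right] \\
\hat{L}(\theta, f) &= \mathbb{E}_{(x,y)\sim D}\left[\ell(f(x), y; \theta)\right],
  \qquad f(x) \in N_\epsilon(x) \\
\mathrm{obscurity}(\theta, f) &= L(\theta) - \hat{L}(\theta, f) \;\ge\; 0
\end{align}
```

Under these assumptions the gap is non-negative because f(x) always lies in N_\epsilon(x), so the loss at f(x) cannot exceed the maximum over the neighborhood. A large gap means the chosen adversary f understates the true worst-case risk.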

Experimental Results

Research Questions

RQ1: How well do surrogate adversarial evaluation metrics reflect true adversarial risk?
RQ2: To what extent do defenses rely on obscurity rather than genuine robustness?
RQ3: Can stronger or gradient-free attacks reveal weaknesses in defenses that pass standard evaluations?
RQ4: How do non-differentiable transformations, generative-model defenses, and adversarial training fare under stronger attacks?

Key Findings

Many defenses that show strong performance against standard attacks remain vulnerable to stronger or gradient-free attacks.
Obscurity is a significant factor; higher surrogate performance does not guarantee low true adversarial risk.
Gradient-based attacks may fail on non-differentiable defenses, yet gradient-free methods can uncover adversarial examples.
PixelDefend, autoencoder purifications, and stochasticity-based defenses can be defeated by stronger adversaries.
Adversarial training reduces obscurity and improves true robustness, unlike several obscurity-prone defenses.
The paper demonstrates that stronger attacks can reduce accuracies of several defenses to near zero.
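
The contrast between gradient-based and gradient-free attacks on a non-differentiable defense can be sketched on a toy problem. This is an illustration only, not the paper's experimental setup: the quantized linear classifier, every name (`margin`, `local_gradient`, `spsa_attack`, `LEVELS`), and every hyperparameter are assumptions, and the signed-step update is a simplification chosen for brevity.

```python
import numpy as np

LEVELS = 20  # hypothetical quantization "defense": inputs are snapped to a 0.05 grid


def margin(x, w, b, y):
    """Signed margin y * (w . quantize(x) + b); a negative value means misclassified."""
    x_q = np.round(x * LEVELS) / LEVELS  # piecewise constant, so its derivative is 0 almost everywhere
    return y * (np.dot(w, x_q) + b)


def local_gradient(f, x, h=1e-6):
    """Central differences with a tiny step, used as a stand-in for the exact gradient."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


def spsa_gradient(f, x, delta, n_samples, rng):
    """SPSA estimate: average two-sided differences along random +/-1 directions."""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=x.shape)
        g += (f(x + delta * v) - f(x - delta * v)) / (2 * delta) * v
    return g / n_samples


def spsa_attack(f, x0, eps, step, n_iter, delta, n_samples, rng):
    """Minimize f inside the L-infinity ball of radius eps around x0."""
    x = x0.copy()
    for _ in range(n_iter):
        g = spsa_gradient(f, x, delta, n_samples, rng)
        x = x - step * np.sign(g)           # signed descent step (a simplification)
        x = np.clip(x, x0 - eps, x0 + eps)  # project back onto the eps-ball
    return x


rng = np.random.default_rng(0)
d = 20
w = np.where(np.arange(d) % 2 == 0, 1.0, -1.0)  # toy weights, alternating +1 / -1
b, y = 1.0, 1.0
x0 = np.full(d, 0.5)
eps = 0.1


def f(x):
    return margin(x, w, b, y)


margin_clean = f(x0)

# One-step signed-gradient attack: the local gradient is zero, so the input does not move.
x_grad = np.clip(x0 - eps * np.sign(local_gradient(f, x0)), x0 - eps, x0 + eps)
margin_grad = f(x_grad)

# SPSA probes with a finite step (delta) that crosses quantization boundaries.
x_spsa = spsa_attack(f, x0, eps, step=0.02, n_iter=10, delta=0.05, n_samples=256, rng=rng)
margin_spsa = f(x_spsa)
linf = np.max(np.abs(x_spsa - x0))

print("clean margin:", margin_clean)
print("margin after gradient attack:", margin_grad)
print("margin after SPSA attack:", margin_spsa, "L-inf distance:", linf)
```

In this toy setting the local gradient through the rounding step is zero, so the signed-gradient attack leaves the input unchanged and the model looks robust. The SPSA estimate uses a perturbation size comparable to the quantization step and therefore recovers a usable descent direction. The gap between the two results is an instance of the obscurity effect described in the findings above.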


This review was generated by AI and checked by a human editor.