QUICK REVIEW

[Paper Review] Adversarial and Clean Data Are Not Twins

Zhitao Gong, Wenlu Wang|arXiv (Cornell University)|Apr 17, 2017

Adversarial Robustness in Machine Learning | Cited by 103

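One-Sentence Summary

The authors train a binary classifier that separates adversarial examples from clean images with accuracy over 99% and remains robust under a second-round attack, but its generalization is limited across epsilon values and across attack methods.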

ABSTRACT

Adversarial attack has cast a shadow on the massive success of deep neural networks. Despite being almost visually identical to the clean data, the adversarial images can fool deep neural networks into wrong predictions with very high confidence. In this paper, however, we show that we can build a simple binary classifier separating the adversarial apart from the clean data with accuracy over 99%. We also empirically show that the binary classifier is robust to a second-round adversarial attack. In other words, it is difficult to disguise adversarial samples to bypass the binary classifier. Furthermore, we empirically investigate the generalization limitation which lingers on all current defensive methods, including the binary classifier approach. And we hypothesize that this is the result of intrinsic property of adversarial crafting algorithms.

Research Motivation and Objectives

Motivate robust detection of adversarial examples as a preprocessing step independent of the target model.
Demonstrate that a simple binary classifier can separate adversarial from clean data with high accuracy.
Investigate robustness of the detector to secondary adversarial attempts and its generalization limits.
Analyze how adversarial crafting methods affect detection and discuss intrinsic properties of adversarial spaces.

Proposed Method

Train a neural classifier f1 on clean data, then attack f1 to generate adversarial data X_adv(f1) from both X_train and X_test.
Train a binary detector f2 on a mixed dataset of clean and adversarial samples labeled 0 and 1, respectively.
Evaluate f2 on X_test and X_adv(f1)_test to measure separability.
Craft second-round adversarial data against f2, starting from both X_test and X_adv(f1)_test, and check whether these samples bypass detection.
Compare detector performance across adversarial methods (FGSM, TGSM, JSMA) and across datasets (MNIST, CIFAR10, SVHN).
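
The steps above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: a logistic-regression model stands in for the neural classifier f1 and for the detector f2, the weights `w`, `b`, `w2`, `b2` and the toy inputs are hypothetical, and only FGSM is shown. It is not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step against a logistic-regression stand-in model.

    For cross-entropy loss, d(loss)/dx = (p - y) * w, so the attack moves
    every feature by eps in the direction of the gradient's sign.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Clip to [0, 1], the usual pixel range.
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, sign)]

def build_detector_dataset(X, Y, w, b, eps):
    """Mix clean samples (label 0) with their FGSM counterparts (label 1)."""
    data = [(x, 0) for x in X]
    data += [(fgsm(x, y, w, b, eps), 1) for x, y in zip(X, Y)]
    return data

random.seed(0)
w, b = [1.5, -2.0, 0.5], 0.1        # hypothetical parameters of f1
X = [[random.random() for _ in w] for _ in range(4)]
Y = [1, 0, 1, 0]                    # hypothetical class labels for f1

# Training data for the detector f2: clean -> 0, adversarial -> 1.
mixed = build_detector_dataset(X, Y, w, b, eps=0.1)

# Second round: attack the detector itself, starting from both clean and
# first-round adversarial samples, using each sample's detector label.
w2, b2 = [0.3, 0.8, -1.1], 0.0      # hypothetical parameters of f2
second_round = [(fgsm(x, label, w2, b2, eps=0.1), label) for x, label in mixed]
```

In a real experiment f1 and f2 would be trained networks and the gradient would come from automatic differentiation; the dataset construction and the reuse of the same attack against f2 are the parts that carry over.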

Experimental Results

Research Questions

RQ1: Can a simple binary classifier reliably distinguish adversarial from clean images across common datasets?
RQ2: Is the adversarial detector robust to second-round attacks crafted to bypass it?
RQ3: What generalization limitations affect the detector when faced with different epsilon values and adversarial crafting algorithms?

Key Findings

The binary classifier achieves accuracy over 99% in separating adversarial from clean data across MNIST, CIFAR10, and SVHN.
The binary detector is empirically robust to second-round adversarial attacks: adversaries aware of the detector find it difficult to disguise adversarial samples so that they bypass it.
Detector performance is sensitive to the epsilon hyper-parameter used to generate adversarial data and to the adversarial crafting algorithm.
Adversarial datasets generated by FGSM/TGSM and JSMA can be incompatible, though mixing adversaries (e.g., FGSM and JSMA) improves generalization to both.
Defensive methods like adversarial training and distillation exhibit similar generalization limitations, suggesting an intrinsic property of adversarial spaces.


This review was generated by AI and reviewed by human editors.