QUICK REVIEW

[Paper Review] MagNet: a Two-Pronged Defense against Adversarial Examples

Dongyu Meng, Hao Chen|arXiv (Cornell University)|May 25, 2017

Adversarial Robustness in Machine Learning | References: 23 | Cited by: 218

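One-Sentence Summary

MagNet defends a classifier without modifying it: detectors reject adversarial inputs, a reformer maps adversarial examples toward the data manifold, and diversity among the defensive components hardens the system against graybox attacks.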

ABSTRACT

Deep learning has shown promising results on hard perceptual problems in recent years. However, deep learning systems are found to be vulnerable to small adversarial perturbations that are nearly imperceptible to human. Such specially crafted perturbations cause deep learning systems to output incorrect decisions, with potentially disastrous consequences. These vulnerabilities hinder the deployment of deep learning systems where safety or security is important. Attempts to secure deep learning systems either target specific attacks or have been shown to be ineffective. In this paper, we propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet does not modify the protected classifier or know the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. Different from previous work, MagNet learns to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since it does not rely on any process for generating adversarial examples, it has substantial generalization power. Moreover, MagNet reconstructs adversarial examples by moving them towards the manifold, which is effective for helping classify adversarial examples with small perturbation correctly. We discuss the intrinsic difficulty in defending against whitebox attack and propose a mechanism to defend against graybox attack. Inspired by the use of randomness in cryptography, we propose to use diversity to strengthen MagNet. We show empirically that MagNet is effective against most advanced state-of-the-art attacks in blackbox and graybox scenarios while keeping false positive rate on normal examples very low.

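Motivation and Objectives

Define adversarial examples and the metrics used to evaluate defenses.
Propose a defense framework that does not modify the target classifier and is independent of the attack generation process.
Use autoencoders that approximate the data manifold to reject or reform adversarial inputs.
Use diversity to mitigate graybox attacks and improve robustness against adaptive attackers.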

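Proposed Method

Deploy one or more detectors that use autoencoder reconstruction error to estimate an input's distance from the data manifold.
Add a second detector based on the Jensen-Shannon divergence between the classifier's outputs on the original input and on its autoencoder reconstruction.
Train a reformer (an autoencoder) that maps adversarial examples toward the manifold so that they are classified correctly.
Under the graybox threat model, defend by randomly selecting among multiple diverse autoencoders at run time.
Training requires no adversarial examples, so the defense does not depend on any particular generation process.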
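
The two detectors described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the mean p-th power error, the thresholds, and the toy `toy_autoencoder` and `toy_classifier` stand-ins are all assumptions made here so that the example runs on its own. In practice the autoencoder and classifier would be trained networks, and the thresholds would be tuned on normal validation data.

```python
import math

def reconstruction_error(x, x_rec, p=2):
    """Mean p-th power difference between an input and its reconstruction."""
    return sum(abs(a - b) ** p for a, b in zip(x, x_rec)) / len(x)

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 add 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def is_rejected(x, autoencoder, classifier, err_threshold, jsd_threshold):
    """Reject x if either detector fires (both thresholds are assumptions)."""
    x_rec = autoencoder(x)
    # Detector 1: a large reconstruction error suggests x is far from the manifold.
    if reconstruction_error(x, x_rec) > err_threshold:
        return True
    # Detector 2: the classifier disagrees with itself on x versus its reconstruction.
    return js_divergence(classifier(x), classifier(x_rec)) > jsd_threshold

# Hypothetical stand-ins so the sketch is self-contained.
def toy_autoencoder(x):
    """Pretend the 'manifold' is the unit box: clip every value into [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in x]

def toy_classifier(x):
    """Two-class 'softmax' output driven by the clipped mean of the input."""
    s = min(1.0, max(0.0, sum(x) / len(x)))
    return [1.0 - s, s]

normal_input = [0.2, 0.4]
perturbed_input = [1.8, 0.4]
print(is_rejected(normal_input, toy_autoencoder, toy_classifier, 0.01, 0.01))
print(is_rejected(perturbed_input, toy_autoencoder, toy_classifier, 0.01, 0.01))
```

The first detector is meant to catch inputs far from the manifold, while the divergence check targets inputs whose reconstruction looks close in input space but still shifts the classifier's output distribution.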

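Experimental Results

Research Questions

RQ1: Can adversarial inputs from a variety of attack methods be robustly detected and reformed without modifying the target classifier?
RQ2: How effective are manifold-based detection (reconstruction error) and classifier-output divergence as complementary detectors?
RQ3: Does diversity in the autoencoder-based defense improve robustness against graybox attacks without increasing false positives on normal inputs?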

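Key Findings

The detect-and-reform framework improves robustness against a range of known attacks without changing the protected classifier.
The reformer (an autoencoder) moves adversarial examples toward the manifold of normal data, which helps the classifier label them correctly.
The two detectors (reconstruction-error-based and probability-divergence-based) complement each other across attack types.
Diversity in the defense (randomly choosing among multiple autoencoders at run time) strengthens robustness against graybox threats.
The defense remains effective in the graybox setting, making it harder for an attacker to craft adversarial inputs that work against every configuration.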
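
The run-time diversity idea can be illustrated with a short sketch of the overall detect-then-reform flow. Everything named here (`defended_predict`, the toy detector, the two toy reformers, and the toy classifier) is hypothetical and chosen only to make the example runnable. The point it shows is that the reformer is drawn at random per query, so an attacker who knows the pool of candidates still does not know which one will process a given input.

```python
import random

def defended_predict(x, detectors, reformers, classifier, rng=random):
    """Return None if any detector rejects x; otherwise classify a reformed x."""
    if any(detect(x) for detect in detectors):
        return None  # input rejected as adversarial
    reformer = rng.choice(reformers)  # drawn at run time for each query
    return classifier(reformer(x))

# Hypothetical components; real ones would be trained autoencoders and a network.
def out_of_range_detector(x):
    """Flag inputs with any value outside the expected [0, 1] range."""
    return any(v < 0.0 or v > 1.0 for v in x)

def rounding_reformer(x):
    """Stand-in reformer: snap each value to one decimal place."""
    return [round(v, 1) for v in x]

def clipping_reformer(x):
    """Stand-in reformer: squeeze each value into [0.05, 0.95]."""
    return [min(0.95, max(0.05, v)) for v in x]

def toy_classifier(x):
    """Label 1 if the mean value exceeds 0.5, else label 0."""
    return 1 if sum(x) / len(x) > 0.5 else 0

rng = random.Random(0)  # seeded only to make the demo repeatable
reformers = [rounding_reformer, clipping_reformer]
print(defended_predict([0.9, 0.8], [out_of_range_detector], reformers, toy_classifier, rng))
print(defended_predict([1.5, 0.2], [out_of_range_detector], reformers, toy_classifier, rng))
```

Passing the random source as a parameter keeps the sketch testable; a deployed system would presumably use an unpredictable source rather than a fixed seed.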


This review was generated by AI and reviewed by human editors.