QUICK REVIEW

[Paper Review] Spatially Transformed Adversarial Examples

Chaowei Xiao, Jun-Yan Zhu|arXiv (Cornell University)|Jan 8, 2018

Advanced Malware Detection Techniques | Cited by 240

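ONE-SENTENCE SUMMARY

The paper proposes generating adversarial examples through spatial transformation (a flow field) rather than by changing pixel values, and shows that the resulting examples are perceptually realistic and harder for standard defenses to stop.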

ABSTRACT

Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the $\mathcal{L}_p$ distance for penalizing perturbations. Researchers have explored different defense methods to defend against such adversarial attacks. While the effectiveness of $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, in this paper we will instead focus on a different type of perturbation, namely spatial transformation, as opposed to manipulating the pixel values directly as in prior works. Perturbations generated through spatial transformation could result in large $\mathcal{L}_p$ distance measures, but our extensive experiments show that such spatially transformed adversarial examples are perceptually realistic and more difficult to defend against with existing defense systems. This potentially provides a new direction in adversarial example generation and the design of corresponding defenses. We visualize the spatial transformation based perturbation for different examples and show that our technique can produce realistic adversarial examples with smooth image deformation. Finally, we visualize the attention of deep networks with different types of adversarial examples to better understand how these examples are interpreted.

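MOTIVATION AND OBJECTIVES

Motivate the exploration of perturbations beyond $\mathcal{L}_p$-bounded distortions in pixel space.
Introduce a geometric attack that causes misclassification while minimizing local spatial distortion.
Demonstrate the perceptual realism of spatially transformed adversarial examples on MNIST, CIFAR-10, and ImageNet.
Analyze defense robustness and network attention under stAdv (the paper's spatial-transformation attack) perturbations.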

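PROPOSED METHOD

Represent the adversarial perturbation as a per-pixel flow field f that maps each pixel of the adversarial image back to a location in the original image; the pixel value is obtained there by bilinear interpolation.
Define an attack objective that combines a misclassification loss with a flow regularization term that encourages smooth, local deformations.
For targeted attacks, use a Carlini-Wagner-style loss that drives g(x_adv) toward the specified target class.
Regularize the flow field with a total-variation-like loss (L_flow) to enforce locally smooth transformations.
Optimize with L-BFGS with backtracking to obtain the flow field that produces the adversarial image.
Visualize the flow field to show that the distortion is local and concentrated near edges.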
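
The method above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`bilinear_sample`, `warp`, `flow_loss`, `targeted_margin_loss`, `stadv_objective`), the border clamping, the 4-neighbour smoothness term, and the weight `tau` are all assumptions made for the example. The classifier is left abstract (the caller supplies logits), and the L-BFGS optimization step is omitted.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2-D grayscale image (list of rows) at real-valued (y, x).
    Out-of-range coordinates are clamped to the border (an assumption)."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
    bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
    return (1 - wy) * top + wy * bottom

def warp(img, flow):
    """Build the adversarial image: pixel (i, j) is read from the original
    image at (i + dy, j + dx), where flow[i][j] = (dy, dx)."""
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, i + flow[i][j][0], j + flow[i][j][1])
             for j in range(w)] for i in range(h)]

def flow_loss(flow):
    """Total-variation-like smoothness term: sum, over every pixel p and each
    of its 4-neighbours q, of the Euclidean distance between f(p) and f(q)."""
    h, w = len(flow), len(flow[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dy = flow[i][j][0] - flow[ni][nj][0]
                    dx = flow[i][j][1] - flow[ni][nj][1]
                    total += math.sqrt(dy * dy + dx * dx)
    return total

def targeted_margin_loss(logits, target, kappa=0.0):
    """Carlini-Wagner-style targeted loss: the gap between the best
    non-target logit and the target logit, floored at kappa."""
    best_other = max(v for i, v in enumerate(logits) if i != target)
    return max(best_other - logits[target], kappa)

def stadv_objective(logits, flow, target, tau=0.05):
    """Combined objective: misclassification loss plus tau times the flow
    regularizer. The default tau is illustrative only."""
    return targeted_margin_loss(logits, target) + tau * flow_loss(flow)
```

In a full attack, `logits` would come from running the classifier on `warp(img, flow)`, and an optimizer would adjust `flow` to minimize `stadv_objective`. A constant flow (every pixel shifted by the same amount) has zero `flow_loss`, which is why the regularizer favors smooth deformations over scattered per-pixel displacements.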

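EXPERIMENTAL RESULTS

Research Questions

RQ1: Can spatial-transformation perturbations reduce classifier accuracy while remaining perceptually realistic?
RQ2: How does stAdv perform against standard defenses such as adversarial training, compared with the FGSM and C&W attacks?
RQ3: Do stAdv perturbations shift a network's attention, and how do robust models respond?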

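Key Findings

stAdv produces perceptually realistic adversarial examples on MNIST, CIFAR-10, and ImageNet by smoothly deforming the image rather than changing pixel values.
The optimized flow fields are locally smooth and often concentrate on object edges or regions critical for recognition.
stAdv achieves high attack success rates and remains challenging for several defense strategies, including variants of adversarial training.
Class activation mapping (CAM) visualizations show that stAdv can redirect a network's attention, and robust models remain vulnerable to stAdv.
A mean-blur defense offers less protection against stAdv than against other attacks, and adaptive attacks can re-expose the vulnerability.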
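
For readers unfamiliar with the mean-blur defense mentioned in the last finding, the sketch below shows what such a preprocessing step might look like: each pixel is replaced by the average of its neighbourhood before the image reaches the classifier. The function name `mean_blur`, the default kernel size of 3, and the border handling (averaging only in-bounds pixels) are assumptions for illustration, not details taken from this review.

```python
def mean_blur(img, k=3):
    """Apply a k x k mean filter to a 2-D grayscale image (list of rows).
    Border pixels average only the neighbours that fall inside the image."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Collect the in-bounds neighbourhood centred on (i, j).
            vals = [img[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

A defended pipeline would classify `mean_blur(x_adv)` instead of `x_adv`. An adaptive attacker who knows about the filter can include it in the forward pass while optimizing the perturbation.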

This review was generated by AI and checked by a human editor.