QUICK REVIEW

[Paper Review] Adversarial Patch

T. B. Brown, Dandelion Mané | arXiv (Cornell University) | Dec 27, 2017

Adversarial Robustness in Machine Learning | Cited by 78

One-Sentence Summary

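The paper proposes a method for generating universal, robust, targeted adversarial image patches that can be printed and placed in any scene to force a classifier to output a chosen target class, without requiring any knowledge of the scene's background. The patch is trained within the Expectation over Transformations (EOT) framework so that it remains effective across a range of transformations and positions, which makes real-world attacks possible.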

ABSTRACT

We present a method to create universal, robust, targeted adversarial image patches in the real world. The patches are universal because they can be used to attack any scene, robust because they work under a wide variety of transformations, and targeted because they can cause a classifier to output any target class. These adversarial patches can be printed, added to any scene, photographed, and presented to image classifiers; even when the patches are small, they cause the classifiers to ignore the other items in the scene and report a chosen target class. To reproduce the results from the paper, our code is available at https://github.com/tensorflow/cleverhans/tree/master/examples/adversarial_patch

Research Motivation and Goals

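Motivate the study of large, not necessarily imperceptible perturbations as adversarial attacks in the physical world.
Propose a patch-based attack that generalizes across backgrounds and transformations.
Develop an optimization framework for training patches that induce a target class under a wide range of conditions.
Demonstrate that patches can be printed and deployed in real-world settings to fool multiple models.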

Proposed Method

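Define an operator A(p, x, l, t) that applies an image-independent patch p to an image x at location l, with patch transformation t.
Train the patch by maximizing the expected log-probability of the target class ŷ over random images, patch transformations, and locations: p̂ = arg max_p E_{x∼X, t∼T, l∼L}[log Pr(ŷ | A(p, x, l, t))].
Use the Expectation over Transformations (EOT) framework so that the patch stays effective regardless of the background.
Allow the patch to be camouflaged by constraining p to stay close to an original patch p_orig in the L∞ norm, i.e. ‖p − p_orig‖∞ < ε.
Evaluate the patches on several ImageNet models in both white-box and black-box settings.
Demonstrate physical-world transfer by printing the patches and testing them in real scenes.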
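The training procedure above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' cleverhans code: a random linear softmax classifier stands in for an ImageNet model, images are tiny 8x8 grayscale grids, the transformation set T is assumed to contain only integer upscaling and 90° rotations, and the camouflage constraint is assumed to be enforced by clamping each patch pixel into [p_orig − ε, p_orig + ε] after a plain projected gradient-ascent step. All names (patch_pixel_map, apply_patch, draw_samples, eot_objective, eot_step) are illustrative. Each step averages the gradient of log Pr(ŷ | A(p, x, l, t)) over randomly drawn (x, t, l) samples, which is the Monte Carlo form of the EOT expectation.

```python
import math
import random

# Toy, hypothetical setup (not the paper's code): 8x8 grayscale images in [0, 1],
# a 2x2 patch, and a random linear softmax classifier standing in for a real model.
H = W = 8
N_PATCH = 2
N_CLASSES = 3


def patch_pixel_map(n, scale, k, loc):
    """Return ((img_row, img_col), (patch_row, patch_col)) pairs describing where
    each pixel of an n x n patch lands under t = (integer scale, k quarter-turn
    rotations) placed at location l = loc (top-left corner)."""
    pairs = []
    for u in range(n * scale):
        for v in range(n * scale):
            i, j = u // scale, v // scale  # undo nearest-neighbour upscaling
            for _ in range(k):             # undo k quarter-turn rotations
                i, j = j, n - 1 - i
            pairs.append(((loc[0] + u, loc[1] + v), (i, j)))
    return pairs


def apply_patch(p, x, pairs):
    """A(p, x, l, t): copy x and overwrite the covered pixels with patch values."""
    out = [row[:] for row in x]
    for (r, c), (i, j) in pairs:
        out[r][c] = p[i][j]
    return out


def log_probs(weights, img):
    """Log-softmax output of the toy linear classifier."""
    flat = [v for row in img for v in row]
    logits = [sum(w * v for w, v in zip(wk, flat)) for wk in weights]
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def draw_samples(images, rng, count):
    """Draw samples x ~ X, random scale/rotation t ~ T, and a location l ~ L that fits."""
    samples = []
    for _ in range(count):
        scale, k = rng.choice([1, 2]), rng.randrange(4)
        size = N_PATCH * scale
        loc = (rng.randrange(H - size + 1), rng.randrange(W - size + 1))
        samples.append((rng.choice(images), patch_pixel_map(N_PATCH, scale, k, loc)))
    return samples


def eot_objective(p, samples, weights, target):
    """Monte Carlo estimate of E[log Pr(target | A(p, x, l, t))]."""
    return sum(log_probs(weights, apply_patch(p, x, pairs))[target]
               for x, pairs in samples) / len(samples)


def eot_step(p, p_orig, samples, weights, target, lr=0.05, eps=0.3):
    """One projected gradient-ascent step on the EOT objective."""
    grad = [[0.0] * N_PATCH for _ in range(N_PATCH)]
    for x, pairs in samples:
        probs = [math.exp(v) for v in log_probs(weights, apply_patch(p, x, pairs))]
        # Linear softmax model: d log Pr(target) / d pixel_d
        #   = weights[target][d] - sum_k Pr(k) * weights[k][d]
        g = [weights[target][d] - sum(probs[k] * weights[k][d] for k in range(N_CLASSES))
             for d in range(H * W)]
        for (r, c), (i, j) in pairs:  # chain rule back through A
            grad[i][j] += g[r * W + c] / len(samples)
    new_p = []
    for i in range(N_PATCH):
        row = []
        for j in range(N_PATCH):
            v = p[i][j] + lr * grad[i][j]
            v = min(max(v, p_orig[i][j] - eps), p_orig[i][j] + eps)  # L-inf camouflage ball
            row.append(min(max(v, 0.0), 1.0))                         # valid pixel range
        new_p.append(row)
    return new_p


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[rng.gauss(0, 0.5) for _ in range(H * W)] for _ in range(N_CLASSES)]
    images = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(4)]
    p_orig = [[0.5] * N_PATCH for _ in range(N_PATCH)]
    held_out = draw_samples(images, rng, 64)
    p = p_orig
    print("before:", eot_objective(p, held_out, weights, target=2))
    for _ in range(200):
        p = eot_step(p, p_orig, draw_samples(images, rng, 16), weights, target=2)
    print("after:", eot_objective(p, held_out, weights, target=2))
```

Because A only copies patch pixels into the image, its gradient simply routes each image-pixel gradient back to the patch pixel it came from. With a real network and continuous rotations or scalings, an automatic-differentiation framework would handle this step instead of the hand-written chain rule.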

Experimental Results

Research Questions

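RQ1: Can a single universal patch fool multiple classifiers across different backgrounds, locations, and transformations?
RQ2: In a general physical-world setting, how large must the patch be to reliably induce the target class?
RQ3: Does the patch remain effective after camouflage or transformation, and when transferred to unseen models or used in the real world?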

Key Findings

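A single patch trained with the proposed method can make classifiers output the chosen target class across different models and scenes.
The patch remains effective under random translation, rotation, and scaling, even when placed on diverse backgrounds.
Camouflaged patches (e.g., a tie-dye pattern) retain considerable attack strength against the targeted classifiers.
Physical-world experiments show that printed patches can fool classifiers in real scenes even when other objects are present.
Black-box transferability was observed, although its effectiveness may depend on the patch's size and visibility.
The study suggests that defenses designed against small perturbations may be insufficient against large, localized patches.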

This review was generated by AI and checked by human editors.