QUICK REVIEW

[Paper Summary] Robust Physical-World Attacks on Deep Learning Models

Kevin Eykholt, Ivan Evtimov, et al. | arXiv (Cornell University) | Jul 27, 2017

Adversarial Robustness in Machine Learning | References: 39 | Cited by: 506

One-Sentence Summary

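This paper proposes Robust Physical Perturbations (RP2), a method for generating perturbations on physical objects that cause targeted DNN misclassification across a range of viewing distances and angles, and evaluates these perturbations in lab and field tests on road signs and other objects.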

ABSTRACT

Recent studies show that the state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, resulting from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings, and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.

Motivation and Goals

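Show that physical perturbations can reliably mislead DNN classifiers under the dynamic conditions of the real world.
Develop RP2 so that perturbations remain robust to changes in distance, viewing angle, and lighting.
Propose a two-stage (lab and field) evaluation methodology for physical adversarial examples.
Evaluate the perturbations on standard road sign classifiers and demonstrate that the attack generalizes to other objects.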

Proposed Method

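Model a distribution of physical conditions (distance, angle, lighting) and sample both real photographs and synthetic variants of the object when optimizing the perturbation.
Use a mask $M_x$ to confine the perturbation to the surface of the target object, and apply an alignment function $T_i$ that maps the perturbation onto each transformed instance $x_i$ of the object.
Add a Non-Printability Score (NPS) to the objective to account for colors that a printer cannot reproduce accurately.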
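Optimize the perturbation by solving a relaxed objective with an $L_p$-norm penalty, taking the expectation of the loss over transformed instances: $\arg\min_{\delta}\ \lambda \lVert M_x \cdot \delta \rVert_p + NPS + \mathbb{E}_{x_i \sim X^V}\, J\big(f_\theta(x_i + T_i(M_x \cdot \delta)),\ y^*\big)$, where $J$ is the loss function, $f_\theta$ the classifier, $y^*$ the target label, and $X^V$ the distribution of real and synthetic transformed images of the object.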
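Optimize with Adam, and realize the perturbation either as a printed poster overlaid on a stop sign or as black-and-white stickers (graffiti-style or abstract patterns) attached to a real stop sign.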
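The objective above can be illustrated with a deliberately tiny, pure-Python sketch. Everything concrete in it is a hypothetical stand-in, not the paper's setup: an 8-pixel grayscale "image", a random linear softmax classifier in place of $f_\theta$, brightness scaling in place of the real distance/angle/lighting transformations and the alignment $T_i$, an $L_2$ norm ($p = 2$), black and white as the only printable colors, and an NPS summed over every masked pixel rather than over unique colors. It shows how the mask, the norm penalty, the NPS term, and the averaged loss over sampled transformations combine into one loss that Adam minimizes.

```python
import math
import random

# Hypothetical toy setup (not the paper's): N grayscale pixels in [0, 1] and a
# fixed random linear softmax classifier with K classes standing in for f_theta.
N, K = 8, 3
random.seed(0)
W = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(K)]
x0 = [0.5] * N                                              # clean object
mask = [1.0 if j in (2, 3, 4) else 0.0 for j in range(N)]   # M_x: sticker region
brightness = [0.8, 1.0, 1.2]                                # sampled conditions (assumed)
palette = [0.0, 1.0]                                        # printable colors: black, white
lam, target = 0.05, 2                                       # lambda and target class y*


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def objective_and_grad(delta):
    """lam*||M.delta||_2 + NPS + mean_i CE(f(x_i + T_i(M.delta)), y*) and its gradient."""
    md = [m * d for m, d in zip(mask, delta)]
    norm = math.sqrt(sum(v * v for v in md))
    safe = norm if norm > 0 else 1.0          # at delta = 0 every md is 0, so grad is 0
    loss = lam * norm
    grad = [lam * m * v / safe for m, v in zip(mask, md)]

    # NPS: product of distances from each printed pixel value to every printable color.
    for j in range(N):
        if mask[j]:
            p = x0[j] + delta[j]
            dists = [abs(p - c) for c in palette]
            loss += math.prod(dists)
            for k, c in enumerate(palette):
                rest = math.prod(d for i, d in enumerate(dists) if i != k)
                grad[j] += math.copysign(1.0, p - c) * rest

    # Expectation over transformations, approximated by the mean over sampled c:
    # x_i = c * x0 and T_i(v) = c * v (a brightness-only stand-in for T_i).
    for c in brightness:
        x = [c * (a + v) for a, v in zip(x0, md)]
        probs = softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])
        loss -= math.log(probs[target]) / len(brightness)
        err = [probs[k] - (1.0 if k == target else 0.0) for k in range(K)]
        for j in range(N):
            dce_dx = sum(err[k] * W[k][j] for k in range(K))  # (W^T (p - onehot))_j
            grad[j] += c * mask[j] * dce_dx / len(brightness)
    return loss, grad


def adam(steps=300, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize the objective with Adam, clamping so printed values stay in [0, 1]."""
    delta, m, v = [0.0] * N, [0.0] * N, [0.0] * N
    for t in range(1, steps + 1):
        _, g = objective_and_grad(delta)
        for j in range(N):
            m[j] = b1 * m[j] + (1 - b1) * g[j]
            v[j] = b2 * v[j] + (1 - b2) * g[j] ** 2
            m_hat = m[j] / (1 - b1 ** t)
            v_hat = v[j] / (1 - b2 ** t)
            delta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
            delta[j] = min(max(delta[j], -x0[j]), 1.0 - x0[j])
    return delta


if __name__ == "__main__":
    d = adam()
    print("perturbation:", [round(v, 3) for v in d])
    print("loss before/after:", objective_and_grad([0.0] * N)[0], objective_and_grad(d)[0])
```

Because every gradient term is multiplied by the mask, the returned perturbation stays zero outside the sticker region, and the NPS term pushes the printed pixels toward the black or white ends of the palette.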

Experimental Results

Research Questions

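RQ1: Can perturbations on real physical objects achieve targeted misclassification across a range of distances and viewing angles?
RQ2: Do robust, surface-constrained perturbations remain effective under environmental variation and fabrication constraints?
RQ3: How do lab (stationary) and field (drive-by) tests differ in how they evaluate physical adversarial perturbations?
RQ4: Do RP2 perturbations transfer to other classifiers and to objects beyond road signs?
RQ5: How does the perturbation type (poster vs. stickers) affect attack success and visibility?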

Key Findings

With the poster attack, RP2 perturbations achieved 100% targeted success against LISA-CNN on stationary stop signs.
In drive-by tests, the black-and-white sticker perturbation achieved 84.8% targeted success on LISA-CNN video frames, and 87.5% on GTSRB-CNN drive-by frames.
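In lab tests, both poster and sticker attacks maintained high targeted success rates at distances of up to 40 feet and angles of up to 60 degrees.
On Inception-v3, sticker attacks caused a microwave to be misclassified as a phone with 90% targeted success, and a coffee mug to be misclassified as a cash machine with 71.4% targeted success.
On GTSRB-CNN, targeted misclassification of a Stop sign as Speed Limit 80 reached 80% in stationary tests and 87.5% in drive-by tests.
The method generalizes to objects beyond road signs, suggesting that image classifiers are broadly susceptible to robust physical perturbations.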
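The success rates above are fractions of images or video frames that the classifier assigns to the attacker's target label. A minimal sketch of such a metric follows; it assumes paired clean/perturbed captures and counts only pairs whose clean capture is correctly classified. That is one common way to define the rate and may not match the paper's exact protocol, particularly for drive-by tests. The function name and the labels are hypothetical.

```python
def targeted_success_rate(clean_preds, adv_preds, true_label, target_label):
    """Share of paired captures where the clean object is classified correctly
    and the perturbed object is classified as the target label."""
    # Assumed protocol: only pairs whose clean capture is correctly classified count.
    valid = [adv for clean, adv in zip(clean_preds, adv_preds) if clean == true_label]
    if not valid:
        raise ValueError("no correctly classified clean captures")
    hits = sum(1 for adv in valid if adv == target_label)
    return hits / len(valid)


# Hypothetical predictions for five paired captures (illustrative only).
clean = ["stop", "stop", "stop", "stop", "yield"]
adv = ["speed_limit_80", "speed_limit_80", "stop", "speed_limit_80", "speed_limit_80"]
rate = targeted_success_rate(clean, adv, "stop", "speed_limit_80")
print(f"{rate:.1%}")  # 75.0%
```

Excluding pairs whose clean capture is already misclassified keeps ordinary classifier errors, which have nothing to do with the perturbation, from inflating the rate.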

This summary was generated by AI and reviewed by a human editor.