QUICK REVIEW

[Paper Review] Task-generalizable Adversarial Attack based on Perceptual Metric

Muzammal Naseer, Salman Khan|arXiv (Cornell University)|Nov 22, 2018

Adversarial Robustness in Machine Learning|References: 28|Cited by: 30

One-Sentence Summary

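This paper proposes a task-generalizable adversarial attack that generates highly transferable perturbations by maximizing perceptual distortion in deep feature space, specifically using the internal representations of VGG-16. Unlike task-specific attacks, it achieves strong transferability across classification, object detection, and semantic segmentation without relying on task-specific loss functions or labels.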

ABSTRACT

Deep neural networks (DNNs) can be easily fooled by adding human-imperceptible perturbations to images. These perturbed images are known as 'adversarial examples' and pose a serious threat to security- and safety-critical systems. A litmus test for the strength of adversarial examples is their transferability across different DNN models in a black-box setting (i.e., when the target model's architecture and parameters are not known to the attacker). Current attack algorithms that seek to enhance adversarial transferability work at the decision level, i.e., they generate perturbations that alter the network decisions. This leads to two key limitations: (a) An attack is dependent on the task-specific loss function (e.g. softmax cross-entropy for object recognition) and therefore does not generalize beyond its original task. (b) The adversarial examples are specific to the network architecture and demonstrate poor transferability to other network architectures. We propose a novel approach to create adversarial examples that can broadly fool different networks on multiple tasks. Our approach is based on the following intuition: "Perceptual metrics based on neural network features are highly generalizable and show excellent performance in measuring and stabilizing input distortions. Therefore an ideal attack that creates maximum distortions in the network feature space should realize highly transferable examples". We report extensive experiments to show how adversarial examples generalize across multiple networks for classification, object detection and segmentation tasks.

Research Motivation and Goals

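Address the limited transferability of existing adversarial attacks across different deep learning architectures and vision tasks.
Overcome current attacks' reliance on task-specific loss functions (e.g., cross-entropy), which limits their generalization beyond classification.
Develop an unsupervised adversarial attack based solely on feature-space distortion, making it broadly applicable.
Show that perceptual distortion in the features of a pretrained network yields adversarial examples that transfer well across tasks and architectures.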

Proposed Method

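The attack maximizes the Neural Representation Distortion (NRD) between the original and adversarial feature maps at a specific VGG-16 layer (conv3.3).
NRD is computed as the mean squared difference between the original and perturbed features, which keeps the objective differentiable and stable.
The perturbation $\delta$ is optimized under an $\ell_\infty$-norm constraint ($\lVert\delta\rVert_\infty \leq \epsilon$) to keep it imperceptible.
The attack is run in a white-box setting on the source model (VGG-16), and the resulting examples are transferred to target models without any fine-tuning.
Because it uses no task-specific loss or labels, the objective is unsupervised and architecture-agnostic.
The method exploits the fact that VGG-based perceptual metrics correlate well with human perception and generalize well across tasks.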
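To make the optimization concrete, here is a minimal NumPy sketch of the objective described above, $\max_{\delta} \frac{1}{N}\lVert F(x) - F(x+\delta)\rVert_2^2$ subject to $\lVert\delta\rVert_\infty \le \epsilon$, solved by projected sign-gradient ascent. It is not the authors' implementation. The real attack uses VGG-16 conv3.3 activations and automatic differentiation, whereas here `features` is a hypothetical one-layer linear+ReLU stand-in so the gradient can be written by hand. The names `features`, `nrd`, `nrd_grad`, and `nrdm_attack`, the step size, the iteration count, and the random start are illustrative assumptions, and `eps=16/255` assumes the budget of 16 is on the 0-255 pixel scale.

```python
import numpy as np


def features(x, W):
    """Hypothetical stand-in for VGG-16 conv3.3: one linear layer followed by ReLU."""
    return np.maximum(W @ x, 0.0)


def nrd(x, x_adv, W):
    """Mean squared difference between clean and perturbed feature maps."""
    diff = features(x_adv, W) - features(x, W)
    return float(np.mean(diff ** 2))


def nrd_grad(x, x_adv, W):
    """Hand-derived gradient of nrd(x, x_adv, W) with respect to x_adv."""
    pre = W @ x_adv
    diff = np.maximum(pre, 0.0) - features(x, W)
    relu_mask = (pre > 0).astype(x_adv.dtype)
    # d/dx_adv mean((relu(W x_adv) - f)^2) = W^T [ (2/N) * diff * 1[pre > 0] ]
    return W.T @ (2.0 * diff * relu_mask / diff.size)


def nrdm_attack(x, W, eps=16 / 255, step=2 / 255, iters=10, seed=0):
    """Projected sign-gradient ascent on nrd under ||x_adv - x||_inf <= eps (assumed settings)."""
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball (an assumption): at x_adv == x the feature
    # difference is zero, so the gradient vanishes and sign steps would not move.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        g = nrd_grad(x, x_adv, W)
        x_adv = x_adv + step * np.sign(g)         # increase feature distortion
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv


# Toy usage: random weights and a flattened 16-"pixel" image in [0, 1].
demo_rng = np.random.default_rng(1)
W_demo = demo_rng.normal(size=(32, 16))
x_demo = demo_rng.uniform(0.0, 1.0, size=16)
x_adv_demo = nrdm_attack(x_demo, W_demo)
print("NRD:", nrd(x_demo, x_adv_demo, W_demo))
print("max |delta|:", np.abs(x_adv_demo - x_demo).max())
```

Swapping `features` for a real pretrained network and `nrd_grad` for autodiff gives the shape of the full attack. No labels or task loss appear anywhere in the loop, which is what makes the objective task-agnostic.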

Experimental Results

Research Questions

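RQ1: Do adversarial examples generated by maximizing perceptual distortion in deep feature space transfer well across different vision tasks?
RQ2: How does the attack's transferability to unseen models and tasks compare with state-of-the-art methods (e.g., FGSM, MI-FGSM, DIM)?
RQ3: Does the attack remain effective against models trained for object detection and semantic segmentation, not just classification?
RQ4: To what extent do input transformations (e.g., TVM, JPEG) mitigate the attack in a black-box setting?
RQ5: Why does the attack perform better with VGG-16 features, even though VGG has relatively low accuracy on ImageNet?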

Key Findings

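The NRDM attack reduces the top-1 accuracy of IncRes-v2 on ImageNet from 100.0% to 12.7%, showing stronger transferability than the other attack methods.
On MS-COCO, under $\ell_\infty \leq 16$, the attack reduces RetinaNet's mAP from 53.78% to 5.16%, demonstrating strong transferability to object detection.
On CAMVID semantic segmentation, under the same perturbation budget, the attack lowers the pixel-wise accuracy of Segnet-Basic by 47.11%.
The attack remains highly transferable even when the target model belongs to a different architecture family (e.g., from VGG-16 to Inception-ResNet-v2).
Input transformations such as TVM and median filtering provide partial mitigation, but at the cost of reduced accuracy on clean samples.
The attack remains effective against naturally trained models but fails against the Madry adversarially trained models on MNIST and CIFAR-10, highlighting the need for improved defense strategies.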
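The headline numbers mix absolute and relative changes, so it can help to compute both. The short Python snippet below uses only the ImageNet and MS-COCO figures reported in the findings; the conversion of $\ell_\infty \leq 16$ to the [0, 1] scale assumes the budget is expressed on the 0-255 pixel scale, which the review does not state explicitly. The `drop` helper is purely illustrative.

```python
def drop(before, after):
    """Return (absolute drop in percentage points, relative drop in %) for a metric given in %."""
    absolute = before - after
    return absolute, 100.0 * absolute / before


# Figures as reported in the findings
incres_abs, incres_rel = drop(100.0, 12.7)  # IncRes-v2 top-1 accuracy on ImageNet
retina_abs, retina_rel = drop(53.78, 5.16)  # RetinaNet mAP on MS-COCO

print(f"IncRes-v2: -{incres_abs:.1f} points ({incres_rel:.1f}% relative)")  # -87.3 points (87.3% relative)
print(f"RetinaNet: -{retina_abs:.2f} points ({retina_rel:.1f}% relative)")  # -48.62 points (90.4% relative)

# Assumption: the budget l_inf <= 16 is on the 0-255 pixel scale
eps_unit = 16 / 255
print(f"eps on [0, 1] scale: {eps_unit:.4f}")  # 0.0627
```

Read this way, the detection result corresponds to roughly a 90% relative loss of mAP, a larger relative drop than the classification result.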

This review was generated by AI and reviewed by human editors.