QUICK REVIEW

[Paper Review] Unrestricted Adversarial Examples via Semantic Manipulation

Anand Bhattad, Min Jin Chong | arXiv (Cornell University) | Apr 12, 2019
Adversarial Robustness in Machine Learning | References: 46 | Cited by: 72
One-Sentence Summary

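This paper proposes unrestricted, semantically grounded adversarial attacks that manipulate color (cAdv) and texture (tAdv) to produce photorealistic adversarial examples on ImageNet and MSCOCO. The attacks remain effective against defenses and transfer across models and tasks, including image captioning.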

ABSTRACT

Machine learning models, especially deep neural networks (DNNs), have been shown to be vulnerable to adversarial examples: carefully crafted samples with small-magnitude perturbations. Such adversarial perturbations are usually restricted by bounding their $\mathcal{L}_p$ norm so that they are imperceptible, and thus many current defenses can exploit this property to reduce their adversarial impact. In this paper, we instead introduce "unrestricted" perturbations that manipulate semantically meaningful image-based visual descriptors (color and texture) in order to generate effective and photorealistic adversarial examples. We show that these semantically aware perturbations are effective against JPEG compression, feature squeezing, and adversarially trained models. We also show that the proposed methods can be applied effectively to both image classification and image captioning on complex datasets such as ImageNet and MSCOCO. In addition, we conduct comprehensive user studies showing that our semantic adversarial examples are photorealistic to humans despite having larger perturbations than other attacks.
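
Restricted attacks keep the perturbation inside a small $\mathcal{L}_p$ ball, and that is the property the abstract contrasts with its unrestricted edits. The numpy sketch below is illustrative only. It assumes images scaled to [0, 1], an assumed budget eps = 8/255, and a uniform red-channel shift as a crude, hypothetical stand-in for a semantic color change (it is not cAdv). It then measures both perturbations under the L2 and L-infinity norms.

```python
import numpy as np


def project_linf(delta, eps):
    """Clip a perturbation into the L-infinity ball of radius eps."""
    return np.clip(delta, -eps, eps)


def lp_norms(delta):
    """Return the (L2, L-infinity) norms of a perturbation."""
    flat = delta.ravel()
    return float(np.linalg.norm(flat)), float(np.abs(flat).max())


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32, 3))  # toy RGB image in [0, 1]

# Restricted: noise-like perturbation kept inside an assumed budget of 8/255.
eps = 8 / 255
noise = project_linf(rng.normal(0.0, 0.1, size=image.shape), eps)
restricted = np.clip(image + noise, 0.0, 1.0)

# Hypothetical "semantic" stand-in: a smooth, global red-channel shift.
shift = np.zeros_like(image)
shift[..., 0] = 0.3
unrestricted = np.clip(image + shift, 0.0, 1.0)

print("restricted   L2=%.3f  Linf=%.3f" % lp_norms(restricted - image))
print("unrestricted L2=%.3f  Linf=%.3f" % lp_norms(unrestricted - image))
```

The clipped noise stays within eps by construction. The channel shift has an L-infinity norm near 0.3, far outside such a budget, even though it is a smooth, structured change rather than noise. This gap explains why an $\mathcal{L}_p$ bound, and defenses tuned to small perturbations, need not constrain semantic edits.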

Motivation and Goals

  • Motivate and develop unrestricted adversarial perturbations that are semantically meaningful and photorealistic.
  • Demonstrate the effectiveness of color-based and texture-based semantic attacks against strong defenses and on large-scale datasets.
  • Show transferability of semantic attacks between models and tasks, including image captioning.
  • Provide user studies validating human perceptual realism of the attacks.
  • Offer insights into which semantic features most influence model predictions to guide robustness research.

Proposed Methods

  • Develop a colorization-based adversarial attack (cAdv) that adversarially alters a colorization model's output through its network weights, color hints, and hint masks to produce targeted misclassifications.
  • Control where the attack acts by clustering the color space and using entropy to focus perturbations on ambiguous regions.
  • Build a texture-transfer attack (tAdv) that matches cross-layer Gram matrices of VGG19 features to infuse texture from a target image while keeping the result perceptually realistic.
  • Combine texture loss with a cross-entropy adversarial objective to drive misclassification without artistic distortion.
  • Use nearest-neighbor texture source selection to enhance realism and transferability across models.
  • Evaluate the attacks on ImageNet and MSCOCO in white-box and transfer settings, and against JPEG compression, feature squeezing, and adversarial training.
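
The tAdv objective described above pairs a Gram-matrix texture term with a targeted cross-entropy term. Below is a minimal numpy sketch of how such a loss could be evaluated, under several assumptions. Feature maps are plain arrays shaped (C, H, W) that stand in for VGG19 activations. "Cross-layer" is taken to mean pairing consecutive layers that share the same spatial size (in practice one map would need resizing). The H·W normalization and the weights `alpha` and `beta` are hypothetical choices, not the paper's settings, and all function names are illustrative.

```python
import numpy as np


def gram(feat_a, feat_b=None):
    """Gram matrix of feature maps shaped (C, H, W).

    With one argument this is the usual texture Gram matrix; with two it
    correlates the channels of two layers, assuming both share H x W.
    """
    if feat_b is None:
        feat_b = feat_a
    c_a, h, w = feat_a.shape
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(feat_b.shape[0], h * w)
    return a @ b.T / (h * w)


def texture_loss(adv_feats, src_feats):
    """Sum of squared cross-layer Gram differences over consecutive layers."""
    loss = 0.0
    for i in range(len(adv_feats) - 1):
        g_adv = gram(adv_feats[i], adv_feats[i + 1])
        g_src = gram(src_feats[i], src_feats[i + 1])
        loss += float(np.sum((g_adv - g_src) ** 2))
    return loss


def cross_entropy(logits, target):
    """Cross-entropy of the target class under softmax(logits)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])


def tadv_objective(adv_feats, src_feats, logits, target, alpha, beta):
    """Weighted texture term plus targeted adversarial term (to minimize)."""
    return alpha * texture_loss(adv_feats, src_feats) + beta * cross_entropy(logits, target)


rng = np.random.default_rng(0)
# Stand-ins for activations of two consecutive layers (same spatial size).
src = [rng.normal(size=(4, 8, 8)), rng.normal(size=(6, 8, 8))]
adv = [f + 0.1 * rng.normal(size=f.shape) for f in src]
logits = rng.normal(size=10)  # stand-in for the victim classifier's output

print(texture_loss(src, src))  # 0.0: identical features give zero texture loss
print(tadv_objective(adv, src, logits, target=3, alpha=1.0, beta=1.0))
```

This only evaluates the objective. An actual attack would backpropagate it to the input image, through VGG19 for the texture term and through the victim classifier for the adversarial term, and update the image iteratively.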

Experimental Results

Research Questions

  • RQ1: Can unrestricted, semantically grounded perturbations (color and texture) reliably mislead large-scale classifiers and captioning models?
  • RQ2: How do cAdv and tAdv compare in realism, attack success, and robustness to defenses on ImageNet and MSCOCO?
  • RQ3: Which factors (hints, clusters, texture weight) affect the realism, effectiveness, and transferability of semantic attacks?
  • RQ4: Do these semantic attacks transfer across architectures and tasks (classification and captioning)?
  • RQ5: Are the generated adversarial examples photorealistic to humans, according to user studies?

Key Findings

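Table: Attack success rates (%) by method. Res50 is the attacked model; JPEG75 and Feature Squeezing are defenses; Res152 and Adv Res152 (an adversarially trained ResNet152) measure transfer; User Pref. is the preference rate (%) from the user study.

Method | Res50 | JPEG75 | Feature Squeezing | Res152 | Adv Res152 | User Pref.
Kurakin et al. (2016) | 100 | 12.73 | 28.62 | 86.66 | 34.28 | 21.56
Carlini & Wagner (2017) | 99.85 | 11.50 | 12.00 | 30.50 | 22.00 | 14.50
Xiao et al. (2018b) | 100 | 17.61 | 22.51 | 29.26 | 28.71 | 23.51
cAdv 1 | 100 | 52.33 | 47.78 | 76.17 | 36.28 | 50.50
cAdv 2 | 99.89 | 46.61 | 42.78 | 72.56 | 34.28 | 46.45
cAdv 4 | 99.83 | 42.61 | 38.39 | 69.67 | 34.34 | 40.78
cAdv 8 | 99.81 | 38.22 | 36.62 | 67.06 | 31.67 | 37.67
tAdv 250 1 | 99.00 | 32.89 | 62.79 | 89.74 | 54.94 | 38.92
tAdv 250 3 | 100 | 36.33 | 67.68 | 94.11 | 58.92 | 42.82
tAdv 1000 1 | 99.88 | 31.49 | 52.69 | 90.52 | 51.24 | 34.85
tAdv 1000 3 | 100 | 35.23 | 61.40 | 93.18 | 56.31 | 39.66
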
  • cAdv achieves high targeted attack success across models (e.g., ResNet50, DenseNet121, VGG19) with large, smooth color changes that remain photorealistic.
  • tAdv achieves high white-box attack success and strong transferability by cross-layer texture transfer, while maintaining realism under controlled texture weight and iteration settings.
  • Both attacks remain effective against defenses (JPEG compression, feature squeezing, adversarial training) and transfer across models.
  • Human perceptual studies indicate photorealism of cAdv and tAdv adversarial images comparable to benign images, despite large perturbations.
  • The attacks extend to image captioning, where they can alter specific words in the generated caption without changing its overall semantic content.
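
Feature squeezing, as originally proposed by Xu et al. (2018), flags an input as adversarial when the model's prediction on it differs too much (in L1 distance) from its prediction on a "squeezed" copy, for example one with reduced color bit depth. Below is a minimal numpy sketch of the bit-depth variant. It assumes inputs in [0, 1], a hypothetical random linear classifier in place of a real network, and a hypothetical detection threshold of 0.5. This review does not give the exact squeezers and thresholds used in the paper's evaluation.

```python
import numpy as np


def reduce_bit_depth(x, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def squeeze_score(predict, x, bits=4):
    """L1 distance between predictions on the raw and the squeezed input."""
    return float(np.abs(predict(x) - predict(reduce_bit_depth(x, bits))).sum())


rng = np.random.default_rng(0)
# Hypothetical stand-in classifier: 10 classes, random linear weights.
weights = rng.normal(size=(10, 16 * 16 * 3))


def predict(x):
    return softmax(weights @ x.ravel())


x = rng.uniform(0.0, 1.0, size=(16, 16, 3))
threshold = 0.5  # hypothetical; normally calibrated on benign images
score = squeeze_score(predict, x)
print(f"score={score:.4f}", "flagged" if score > threshold else "passed")
```

One plausible intuition, which is not a claim made in the summary above: squeezers target fine, low-amplitude noise, whereas color and texture edits are large and spatially coherent, so quantization removes comparatively little of them.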

This review was generated by AI and reviewed by human editors.