[Paper Review] Big but Imperceptible Adversarial Perturbations via Semantic Manipulation.
This paper proposes a novel class of adversarial perturbations that manipulate semantic image attributes (color and texture) without restricting perturbation magnitude, enabling photorealistic, large-magnitude adversarial examples. Although they are not bounded in $\mathcal{L}_p$ norm like traditional attacks, these semantic perturbations still look natural to humans, and they evade defenses such as JPEG compression, feature squeezing, and adversarial training on ImageNet and MSCOCO.
Machine learning models, especially deep neural networks (DNNs), have been shown to be vulnerable to adversarial examples, which are carefully crafted samples with a small perturbation magnitude. Such adversarial perturbations are usually restricted by bounding their $\mathcal{L}_p$ norm such that they are imperceptible, and thus many current defenses can exploit this property to reduce their adversarial impact. In this paper, we instead introduce unrestricted perturbations that manipulate semantically meaningful image-based visual descriptors - color and texture - in order to generate effective and photorealistic adversarial examples. We show that these semantically aware perturbations are effective against JPEG compression, feature squeezing and adversarially trained models. We also show that the proposed methods can effectively be applied to both image classification and image captioning tasks on complex datasets such as ImageNet and MSCOCO. In addition, we conduct comprehensive user studies to show that our generated semantic adversarial examples are photorealistic to humans despite large magnitude perturbations when compared to other attacks.
Motivation & Objective
- To address the limitation of current adversarial attacks that rely on small, $\mathcal{L}_p$-bounded perturbations, which are vulnerable to defenses exploiting their small magnitude.
- To explore whether semantically meaningful image descriptors—color and texture—can be manipulated to create adversarial examples that are large in magnitude yet imperceptible to humans.
- To develop a method that generates photorealistic adversarial examples effective against robust defenses such as JPEG compression, feature squeezing, and adversarial training.
- To evaluate the effectiveness of semantic perturbations across diverse tasks, including image classification and image captioning, on complex datasets like ImageNet and MSCOCO.
- To validate human perceptual similarity through user studies, demonstrating that large-magnitude perturbations remain visually natural and realistic.
Proposed method
- The method formulates adversarial attacks by optimizing perturbations in the space of semantic image descriptors—specifically color and texture—rather than raw pixel space.
- It uses a differentiable image transformation pipeline to manipulate color histograms and texture patterns in a way that preserves photorealism while maximizing model misclassification.
- The attack framework is designed to maximize the cross-entropy loss on the target model while constraining the perturbation to be semantically plausible using perceptual similarity metrics.
- The approach is applied end-to-end to both image classification and image captioning models, enabling transferability across tasks and datasets.
- A user study is conducted to evaluate perceptual similarity, comparing human judgments of original and perturbed images to assess realism and imperceptibility.
- The method is evaluated against defenses including JPEG compression (with varying quality), feature squeezing (via spatial and color preprocessing), and adversarially trained models.
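The optimization described in the bullets above can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's implementation: it assumes a random linear softmax classifier in place of a DNN, and a per-channel gain and bias in place of the paper's color manipulation. The names `color_attack`, `max_shift`, `gain`, and `bias` are hypothetical. It only shows the general pattern of ascending the cross-entropy loss over a few semantic parameters rather than over individual pixels.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, x, label):
    """Cross-entropy of a linear softmax classifier W on image x."""
    p = softmax(W @ x.ravel())
    return -np.log(p[label] + 1e-12)

def color_attack(W, image, label, steps=100, lr=0.05, max_shift=0.3):
    """Signed gradient ascent on per-channel gain/bias (assumed stand-in
    for a color perturbation). Returns the best candidate seen."""
    gain, bias = np.ones(3), np.zeros(3)
    best = (image.copy(), gain.copy(), bias.copy())
    best_loss = cross_entropy(W, image, label)
    for _ in range(steps):
        adv = gain * image + bias        # image has shape (H, W, 3)
        p = softmax(W @ adv.ravel())
        p[label] -= 1.0                  # dLoss/dlogits = p - onehot(label)
        grad_adv = (W.T @ p).reshape(image.shape)
        # Chain rule through adv = gain * image + bias (clipping is ignored here).
        grad_gain = (grad_adv * image).sum(axis=(0, 1))
        grad_bias = grad_adv.sum(axis=(0, 1))
        gain = np.clip(gain + lr * np.sign(grad_gain), 1 - max_shift, 1 + max_shift)
        bias = np.clip(bias + lr * np.sign(grad_bias), -max_shift, max_shift)
        candidate = np.clip(gain * image + bias, 0.0, 1.0)
        loss = cross_entropy(W, candidate, label)
        if loss > best_loss:             # keep the strongest valid image so far
            best_loss = loss
            best = (candidate, gain.copy(), bias.copy())
    return best

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                     # toy RGB image in [0, 1]
W = rng.normal(size=(10, image.size))             # toy 10-class linear model
label = int(np.argmax(W @ image.ravel()))         # attack the clean prediction
adv, gain, bias = color_attack(W, image, label)
print("clean loss:", cross_entropy(W, image, label))
print("adversarial loss:", cross_entropy(W, adv, label))
print("max pixel change:", np.abs(adv - image).max())
```

Only six parameters are optimized here, so every pixel in a channel moves together. That is the sense in which the change is "semantic" in this toy setting, even though the per-pixel change can be far larger than a typical $\mathcal{L}_p$ budget.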
Experimental results
Research questions
- RQ1: Can large-magnitude adversarial perturbations that manipulate semantic attributes like color and texture remain imperceptible to humans?
- RQ2: How effective are semantic adversarial perturbations against robust defenses such as JPEG compression and feature squeezing?
- RQ3: To what extent can semantic perturbations transfer across different models and tasks, including image classification and image captioning?
- RQ4: How do semantic adversarial examples compare to standard $\mathcal{L}_p$-bounded attacks in terms of human perceptual realism and model evasion capability?
- RQ5: Can semantic manipulation generate adversarial examples that maintain high photorealism even when perturbation magnitude exceeds typical adversarial bounds?
Key findings
- The proposed semantic adversarial perturbations achieve high attack success rates on both standard and robust models, including those trained with adversarial defense techniques.
- The attacks remain effective against JPEG compression and feature squeezing, demonstrating resilience to common preprocessing defenses.
- User studies confirm that the generated adversarial examples are perceived as photorealistic and indistinguishable from original images by human observers, despite large perturbation magnitudes.
- The method achieves strong transferability across models and tasks, showing effectiveness on both ImageNet and MSCOCO datasets for image classification and image captioning.
- The attacks are significantly more robust than standard $\mathcal{L}_p$-bounded attacks when evaluated under common defense mechanisms.
- The semantic manipulation of color and texture enables adversarial examples that are both highly effective and visually natural, challenging the assumption that imperceptibility requires small perturbations.
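One intuition behind the robustness findings above is that preprocessing defenses target small perturbations. The sketch below illustrates this with bit-depth reduction, one common form of feature squeezing. It is a toy demonstration under stated assumptions: the "image" is random data already on a 4-bit grid, the small perturbation is a fixed ±2/255 noise pattern, and a global red-channel shift of 0.2 is a crude stand-in for a color attack. The function name `squeeze_bit_depth` is illustrative. Surviving the squeeze does not by itself mean a perturbation fools a model; it only shows that the defense does not erase it.

```python
import numpy as np

def squeeze_bit_depth(image, bits):
    """Quantize an image in [0, 1] to 2**bits levels per channel."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

rng = np.random.default_rng(0)
# Toy "clean" image whose values already lie on the 4-bit grid.
clean = rng.integers(0, 16, size=(8, 8, 3)) / 15.0

# Small bounded perturbation: below half a quantization step (1/30).
noise = rng.choice([-2 / 255, 2 / 255], size=clean.shape)
small = np.clip(clean + noise, 0.0, 1.0)

# Large color shift on the red channel (assumed stand-in for a color attack).
shifted = clean.copy()
shifted[..., 0] = np.clip(shifted[..., 0] + 0.2, 0.0, 1.0)

residual_small = np.abs(squeeze_bit_depth(small, 4) - clean).mean()
residual_shift = np.abs(squeeze_bit_depth(shifted, 4) - clean)[..., 0].mean()
print("residual after squeezing, small noise:", residual_small)
print("residual after squeezing, red shift:  ", residual_shift)
```

In this setup the small noise is rounded back to the clean grid values, while most of the red-channel shift passes through the squeeze unchanged.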
This review was created by AI and reviewed by human editors.