[Paper Explained] AEPecker: L0 Adversarial Examples are not Strong Enough
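This paper proposes AEPECKER, a new defense system that detects and corrects L0 adversarial examples by exploiting their inherent limitation: they apply large-amplitude perturbations to only a few pixels. The method uses a Siamese network to compare an input image with a pre-processed version of itself. This enables high-accuracy detection as well as inpainting-based correction, and the system performs well in both detection accuracy and classification recovery.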
Despite the great achievements made by neural networks on tasks such as image classification, they are brittle and vulnerable to adversarial example (AE) attacks, which are crafted by adding human-imperceptible perturbations to inputs so that a neural-network-based classifier incorrectly labels them. In particular, L0 AEs are a category of widely discussed threats where adversaries are restricted in the number of pixels that they can corrupt. However, our observation is that, while L0 attacks modify as few pixels as possible, they tend to cause large-amplitude perturbations to the modified pixels. We consider this as an inherent limitation of L0 AEs, and thwart such attacks by both detecting and rectifying them. The main novelty of the proposed detector is that we convert the AE detection problem into a comparison problem by exploiting the inherent limitation of L0 attacks. More concretely, given an image I, it is pre-processed to obtain another image I'. A Siamese network, which is known to be effective in comparison, takes I and I' as the input pair to determine whether I is an AE. A trained Siamese network automatically and precisely captures the discrepancies between I and I' to detect L0 perturbations. In addition, we show that the pre-processing technique, inpainting, used for detection can also work as an effective defense, which has a high probability of removing the adversarial influence of L0 perturbations. Thus, our system, called AEPECKER, demonstrates not only high AE detection accuracies, but also a notable capability to correct the classification results.
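The comparison idea in the abstract can be pictured as a Siamese network: one shared encoder embeds both I and I', and a small head scores how far apart the two embeddings are. The NumPy sketch below is a toy under stated assumptions, not the paper's architecture: `TinySiamese` is a hypothetical name, a single random ReLU layer stands in for a CNN backbone, and the weights are untrained; a working detector would learn them from pairs of clean and L0 adversarial images.

```python
import numpy as np


class TinySiamese:
    """Hypothetical toy Siamese detector: one shared encoder for both inputs,
    then a logistic head on |f(I) - f(I')|. Weights are random, not trained."""

    def __init__(self, n_pixels: int, n_features: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Shared encoder: the SAME matrix embeds both branches (weight sharing).
        self.w_enc = rng.normal(0.0, 1.0 / np.sqrt(n_pixels), size=(n_pixels, n_features))
        # Non-negative head weights, so any feature discrepancy can only raise the score.
        self.w_head = np.abs(rng.normal(0.0, 1.0, size=n_features))
        self.b_head = -2.0

    def encode(self, img: np.ndarray) -> np.ndarray:
        # Flatten and apply one ReLU layer (stand-in for a real backbone).
        return np.maximum(img.reshape(-1).astype(np.float64) @ self.w_enc, 0.0)

    def score(self, img: np.ndarray, img_pre: np.ndarray) -> float:
        # Pseudo-probability that `img` is an AE, given its pre-processed twin.
        diff = np.abs(self.encode(img) - self.encode(img_pre))
        logit = diff @ self.w_head + self.b_head
        return float(1.0 / (1.0 + np.exp(-logit)))


model = TinySiamese(n_pixels=64)
clean = np.full((8, 8), 0.5)
adv = clean.copy()
adv[2, 3] = 1.0                      # one large-amplitude "L0" pixel
print(model.score(clean, clean))     # identical pair: sigmoid(b_head)
print(model.score(adv, clean))       # pretend pre-processing restored the pixel
```

The design point is that the two branches share weights, so the head sees only the difference between the embeddings; a clean image and its pre-processed version should embed closely, while an image whose few large-amplitude pixels were changed by pre-processing should not.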
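Research Motivation and Goals
- Address the vulnerability of neural networks to L0 adversarial examples, which attack by applying large perturbations to a small number of pixels.
- Identify a fundamental limitation of L0 attacks, namely large amplitude changes on the modified pixels, as a detectable signal.
- Design a detection mechanism that exploits this limitation through image comparison.
- Design a defense that not only detects adversarial inputs but also corrects them through image inpainting.
- Achieve high-accuracy detection and reliable classification recovery on adversarial examples.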
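The limitation named above (few modified pixels, but a large change on each) can be made concrete by measuring a perturbation's L0 count next to its per-pixel amplitude. This sketch is illustrative only: `perturbation_profile` is a hypothetical helper, images are assumed to be grayscale float arrays in [0, 1], and both perturbations are synthetic rather than produced by a real attack.

```python
import numpy as np


def perturbation_profile(clean: np.ndarray, adv: np.ndarray, tol: float = 1e-6):
    """Summarise delta = adv - clean as (L0 count, mean |delta| on changed entries, L_inf).

    Counts array entries; for H x W x C images, use changed.any(axis=-1)
    to count pixels in which any channel changed.
    """
    delta = adv.astype(np.float64) - clean.astype(np.float64)
    changed = np.abs(delta) > tol                 # entries actually modified
    l0 = int(changed.sum())                       # L0 "norm": how many entries changed
    mean_amp = float(np.abs(delta[changed]).mean()) if l0 else 0.0
    linf = float(np.abs(delta).max())             # largest single change
    return l0, mean_amp, linf


clean = np.full((32, 32), 0.5)                    # synthetic flat grayscale image

# L0-style: three pixels pushed to maximum intensity.
sparse = clean.copy()
sparse[[1, 5, 9], [2, 6, 10]] = 1.0

# Dense L_inf-style: every pixel nudged by +/- 8/255.
rng = np.random.default_rng(0)
dense = clean + (8 / 255) * rng.choice([-1.0, 1.0], size=clean.shape)

print(perturbation_profile(clean, sparse))        # (3, 0.5, 0.5)
print(perturbation_profile(clean, dense))         # 1024 entries, each changed by about 0.031
```

The contrast is the signal the summary refers to: the sparse perturbation touches almost nothing but moves each touched pixel by a large amount, which is exactly what makes those pixels stand out from their neighbours.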
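Proposed Method
- Pre-process the input image I with an inpainting technique that preserves natural image structure, producing I'.
- Feed I and I' as a pair into a Siamese neural network, which compares them to detect adversarial perturbations.
- Leverage the Siamese network's ability to learn discriminative features for image comparison to identify the discrepancies introduced by L0 attacks.
- Reuse the same inpainting pre-processing step as a defense, removing adversarial perturbations by restoring the corrupted pixels.
- Train the Siamese network end to end to distinguish clean images from L0 adversarial examples based on the comparison between I and I'.
- Integrate detection and correction into a unified system, AEPECKER, which achieves high detection accuracy together with classification recovery.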
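The pre-processing step (I to I') can be sketched as "mask suspicious pixels, then inpaint them". The summary does not say how AEPECKER selects pixels or which inpainting algorithm it uses, so both halves below are assumptions: `suspicious_mask` is a hypothetical heuristic that flags pixels differing from their 3x3 median by more than a chosen threshold, and `inpaint_median` is a simple neighbour-median fill standing in for a real inpainting method such as OpenCV's `cv2.inpaint` (Telea or Navier-Stokes). Grayscale float images in [0, 1] are assumed.

```python
import numpy as np


def suspicious_mask(img: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """Hypothetical heuristic: flag pixels far from their 3x3 neighbourhood median."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the 9 shifted views of the image and take the per-pixel median.
    windows = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    med = np.median(windows, axis=0)
    return np.abs(img - med) > thresh


def inpaint_median(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in inpainting (not the paper's algorithm): fill each masked pixel
    with the median of its unmasked 3x3 neighbours."""
    out = img.astype(np.float64).copy()
    h, w = img.shape
    for y, x in zip(*np.nonzero(mask)):
        ys, ye = max(y - 1, 0), min(y + 2, h)
        xs, xe = max(x - 1, 0), min(x + 2, w)
        patch = img[ys:ye, xs:xe]
        keep = ~mask[ys:ye, xs:xe]               # ignore other masked pixels
        if keep.any():
            out[y, x] = np.median(patch[keep])
    return out


def preprocess(img: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """I -> I': mask suspicious pixels, then inpaint them."""
    return inpaint_median(img, suspicious_mask(img, thresh))


clean = np.full((16, 16), 0.5)
adv = clean.copy()
adv[10, 10] = 1.0                                # two isolated, large-amplitude edits
adv[3, 4] = 0.0
restored = preprocess(adv)
print(np.abs(restored - clean).max())            # 0.0 for this toy case
```

Because an L0 pixel is, by construction, far from its neighbours, a median-based mask tends to isolate it, while smooth regions of a clean image pass through unchanged. The same function therefore serves both roles listed above: it produces the I' that the detector compares against, and it yields a candidate corrected image.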
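Experimental Results
Research Questions
- RQ1: Can the large-amplitude perturbations inherent in L0 adversarial examples serve as a detectable signal?
- RQ2: Can a Siamese network effectively detect L0 adversarial examples by comparing the original image with its pre-processed version?
- RQ3: Can the inpainting-based pre-processing step used for detection also serve as an effective defense?
- RQ4: Can the system reliably correct misclassifications caused by L0 adversarial examples?
- RQ5: How does AEPECKER compare with existing defenses in detection and correction performance?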
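Key Findings
- The Siamese-network-based detector achieves high-accuracy identification by exploiting the large-perturbation property of L0 adversarial examples.
- The pre-processing step used for detection, inpainting, effectively removes adversarial perturbations and acts as a strong defense.
- AEPECKER shows a notable correction capability, restoring the model's prediction to the correct class in many cases.
- The method uses the structural limitation of L0 attacks (few but large-amplitude perturbations) as a detection cue.
- The system achieves high detection accuracy and robustness against L0 attacks without retraining the target classifier.
- Combining detection and correction in a single framework substantially improves the practicality of the defense in real-world use.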
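The integrated flow in these findings (score the pair (I, I'), and when the input is flagged, classify the inpainted I' instead of I) might be wired together as below. `DetectAndCorrect` and all three callables are hypothetical placeholders; the policy of switching to I' only when the detector fires is an assumption of this sketch. The target classifier is called as-is, in line with the no-retraining point above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np


@dataclass
class DetectAndCorrect:
    """Hypothetical detect-then-correct wrapper; all callables are placeholders."""
    preprocess: Callable[[np.ndarray], np.ndarray]        # I -> I', e.g. inpainting
    detector: Callable[[np.ndarray, np.ndarray], float]   # score that I is an AE, given (I, I')
    classifier: Callable[[np.ndarray], int]               # target model, used unchanged
    threshold: float = 0.5

    def __call__(self, img: np.ndarray) -> Tuple[int, bool]:
        img_pre = self.preprocess(img)
        is_ae = bool(self.detector(img, img_pre) >= self.threshold)
        # Assumed policy: classify the inpainted copy only when the input is flagged.
        label = self.classifier(img_pre if is_ae else img)
        return label, is_ae


# Toy stand-ins (hypothetical), only to exercise the control flow.
pipe = DetectAndCorrect(
    preprocess=lambda x: np.clip(x, 0.0, 0.6),
    detector=lambda a, b: float(np.abs(a - b).max() > 0.1),
    classifier=lambda x: int(x.max() > 0.8),
)
clean = np.full((4, 4), 0.5)
adv = clean.copy()
adv[1, 2] = 1.0
print(pipe(clean))   # (0, False)
print(pipe(adv))     # (0, True); the toy classifier alone would return 1 on adv
```

Keeping detection and correction behind one call means the caller receives both a label and a flag, so it can log or reject flagged inputs while still getting a best-effort corrected prediction.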
This explainer was generated by AI and reviewed by human editors.