QUICK REVIEW

[Paper Review] Guided Diffusion Model for Adversarial Purification

Jinyi Wang, Zhaoyang Lyu|arXiv (Cornell University)|May 30, 2022

Adversarial Robustness in Machine Learning | Cited by 27

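One-Sentence Summary

The paper proposes GDMP, a diffusion-model-based adversarial purification method: the attacked image is diffused and then denoised under guidance from the adversarial input, yielding significant gains in robust accuracy under strong attacks on CIFAR-10 and ImageNet.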

ABSTRACT

With the wider application of deep neural networks (DNNs) in various algorithms and frameworks, security threats have become a key concern. Adversarial attacks disturb DNN-based image classifiers: attackers can intentionally add imperceptible adversarial perturbations to input images to fool the classifiers. In this paper, we propose a novel purification approach, referred to as guided diffusion model for purification (GDMP), to help protect classifiers from adversarial attacks. The core of our approach is to embed purification into the diffusion denoising process of a Denoising Diffusion Probabilistic Model (DDPM), so that its diffusion process can submerge the adversarial perturbations with gradually added Gaussian noise, and both kinds of noise can be removed simultaneously by a guided denoising process. In comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range, thereby significantly improving classification accuracy. GDMP improves the robust accuracy by 5%, obtaining 90.1% under PGD attack on the CIFAR10 dataset. Moreover, GDMP achieves 70.94% robustness on the challenging ImageNet dataset.
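The idea of "submerging" the perturbation can be made concrete with the standard DDPM forward process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). The sketch below is illustrative only. It assumes the linear β schedule from the original DDPM paper (T = 1000, β from 1e-4 to 0.02) and pixels rescaled to [−1, 1], so an 8/255 L∞ budget becomes 16/255. Neither value is taken from GDMP. The sketch compares the scaled perturbation sqrt(ᾱ_t)·ε with the standard deviation of the injected noise, sqrt(1 − ᾱ_t).

```python
import math
import random
from typing import List

def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear beta schedule (ASSUMPTION: the original DDPM default, not a GDMP setting)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s), with 0-based t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def diffuse(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) I) in one shot, elementwise."""
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

def perturbation_to_noise_ratio(eps_budget: float, t: int, abar: List[float]) -> float:
    """Scaled perturbation size sqrt(abar_t) * eps divided by the noise std sqrt(1 - abar_t)."""
    return math.sqrt(abar[t]) * eps_budget / math.sqrt(1.0 - abar[t])

if __name__ == "__main__":
    abar = alpha_bars(linear_beta_schedule())
    eps = 16 / 255  # ASSUMPTION: 8/255 L-inf budget with pixels rescaled to [-1, 1]
    for t in (0, 10, 50, 100, 200):
        print(t, round(perturbation_to_noise_ratio(eps, t, abar), 3))
    x_adv = [0.1, -0.4, 0.7]  # toy three-pixel "image"
    print(diffuse(x_adv, 100, abar, random.Random(0)))
```

Under this schedule the ratio starts well above 1 and falls below 1 within the first hundred steps. At that point the perturbation is smaller than the noise added on top of it. The ratio alone does not tell you how large Tc should be in practice.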

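Research Motivation and Goals

Improve robust image classification by mitigating adversarial perturbations, without retraining the target classifier.
Introduce a DDPM-based purification method that submerges perturbations through diffusion and recovers clean content through guided denoising.
Use guidance that ties the purified output to the adversarial input, so that image semantics are preserved while adversarial noise is removed.
Scale to large datasets, including ImageNet, and evaluate robustness under strong adaptive attacks.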

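Proposed Method

Embed adversarial purification into the diffusion-denoising process of a pretrained DDPM, using added Gaussian noise to submerge the perturbations.
Diffuse the attacked image for a chosen number of steps Tc to degrade the adversarial perturbation, then run the reverse process to recover a near-clean image.
Introduce guidance into the reverse process that pulls the output toward the adversarial image, so that content is preserved while the perturbation is removed.
Formalize the guidance with a distance metric D (MSE or SSIM) that measures the gap between the intermediate denoised state and the diffused adversarial state, scaled by a time-dependent factor s_t.
Set s_t proportional to the Gaussian noise magnitude and inversely proportional to the perturbation magnitude, balancing purification strength against content preservation.
Optionally, use a step-skipping technique to accelerate the DDPM without retraining the diffusion model.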
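The guided reverse process above can be sketched as ancestral DDPM sampling whose mean is shifted by the gradient of the distance, in the style of classifier guidance: mean ← μ_θ(x_t, t) − s_t Σ_t ∇_{x_t} D(x_t, x^a_t). Here x^a_t is the adversarial image diffused to step t. This is a minimal sketch under several assumptions, and it is not the authors' implementation:

- D is the sum of squared differences (the MSE variant; SSIM would need automatic differentiation).
- Σ_t = β_t I.
- The hypothetical form s_t = a·sqrt(1 − ᾱ_t)/ε only mirrors the stated proportionalities.
- x^a_t is resampled at every step.
- `eps_model` is a hypothetical stand-in for a pretrained noise-prediction network.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]

def make_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> Tuple[Vec, Vec]:
    """Linear betas and cumulative products alpha_bar (ASSUMPTION: DDPM defaults)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return betas, abar

def mse_grad(x: Vec, ref: Vec) -> Vec:
    """Gradient of D(x, ref) = sum_i (x_i - ref_i)^2 with respect to x."""
    return [2.0 * (xi - ri) for xi, ri in zip(x, ref)]

def guidance_scale(t: int, abar: Vec, eps_budget: float, a: float = 1.0) -> float:
    """HYPOTHETICAL s_t: grows with the noise std sqrt(1 - abar_t), shrinks as the budget grows."""
    return a * math.sqrt(1.0 - abar[t]) / eps_budget

def guided_purify(x_adv: Vec, eps_model: Callable[[Vec, int], Vec], Tc: int,
                  betas: Vec, abar: Vec, eps_budget: float,
                  rng: random.Random, a: float = 1.0) -> Vec:
    def diffuse(x0: Vec, t: int) -> Vec:
        s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
        return [s * v + n * rng.gauss(0.0, 1.0) for v in x0]

    # Forward: submerge the perturbation by diffusing x_adv to step Tc in one shot.
    x = diffuse(x_adv, Tc)
    # Reverse: guided ancestral sampling from step Tc down to the clean image.
    for t in range(Tc, -1, -1):
        beta = betas[t]
        eps_hat = eps_model(x, t)
        coef = beta / math.sqrt(1.0 - abar[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps_hat)]
        # Guidance: pull the mean toward the adversarial image diffused to the same step.
        grad = mse_grad(x, diffuse(x_adv, t))
        s_t = guidance_scale(t, abar, eps_budget, a)
        mean = [m - s_t * beta * g for m, g in zip(mean, grad)]  # Sigma_t = beta_t * I
        # Add fresh noise on every step except the last one.
        x = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean] if t > 0 else mean
    return x

if __name__ == "__main__":
    betas, abar = make_schedule()

    def dummy_eps_model(x: Vec, t: int) -> Vec:
        return [0.0] * len(x)  # HYPOTHETICAL placeholder, not a trained network

    x_adv = [0.1, -0.4, 0.7]
    print(guided_purify(x_adv, dummy_eps_model, Tc=100, betas=betas, abar=abar,
                        eps_budget=16 / 255, rng=random.Random(0)))
```

The guidance term works like a gentle spring toward x^a_t. A larger `a` keeps more of the input's content but also keeps more of its perturbation, which is the trade-off s_t is meant to balance.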

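Experimental Results

Research Questions

RQ1: Can a pretrained DDPM be used effectively as a purification preprocessor to defend against adversarial attacks without retraining the classifier?
RQ2: Does guiding the DDPM reverse process with the adversarial image improve purification quality and semantic preservation compared with unguided purification?
RQ3: How does the method perform on large-scale datasets such as ImageNet under strong adversarial attacks, including adaptive attacks?
RQ4: What practical strategies balance purification strength against content fidelity (e.g., multiple purification iterations, the diffusion length Tc)?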

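Key Findings

GDMP improves robust accuracy on CIFAR-10 by 5%, reaching 90.1% under PGD attack.
GDMP achieves 70.94% robustness on ImageNet under a PGD-like evaluation.
Guided diffusion with SSIM or MSE guidance outperforms unguided purification, especially with a larger number of diffusion steps or multiple iterations.
Multiple purification iterations with a moderate Tc remove perturbations while preserving content better than a single pass with a large Tc.
DDPM acceleration techniques can substantially reduce computation time (e.g., up to 4× faster on ImageNet by respacing the steps).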
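Step skipping can be sketched with the respacing rule from Improved DDPM (Nichol & Dhariwal, 2021). You keep a subset of timesteps and recompute each new β so that every kept step retains its original ᾱ value. This lets a pretrained model sample with fewer steps and no retraining. It is an assumption here that GDMP uses exactly this rule, and the stride and Tc values below are hypothetical. The helper `denoiser_calls` also counts network evaluations for multi-round purification, since each round over Tc original steps costs about Tc/stride calls.

```python
import math
from typing import List, Tuple

def linear_abar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t for a linear beta schedule (ASSUMPTION: DDPM defaults)."""
    abar, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        abar.append(prod)
    return abar

def respace(abar: List[float], stride: int) -> Tuple[List[int], List[float]]:
    """Keep every `stride`-th step; recompute betas so each kept step keeps its alpha_bar."""
    kept = list(range(0, len(abar), stride))
    new_betas, last = [], 1.0
    for k in kept:
        new_betas.append(1.0 - abar[k] / last)  # product of (1 - beta) telescopes to abar[k]
        last = abar[k]
    return kept, new_betas

def denoiser_calls(Tc: int, rounds: int, stride: int) -> int:
    """Network evaluations for `rounds` purification passes, each reversing Tc original steps."""
    return rounds * math.ceil(Tc / stride)

if __name__ == "__main__":
    abar = linear_abar()
    kept, new_betas = respace(abar, stride=4)
    print(len(kept), "kept steps")
    # HYPOTHETICAL settings: Tc = 200, two purification rounds
    print(denoiser_calls(Tc=200, rounds=2, stride=1), "calls without skipping")
    print(denoiser_calls(Tc=200, rounds=2, stride=4), "calls with stride 4")
```

Reducing denoiser calls by the stride is an upper bound on the wall-clock gain. The reported figure of up to 4× on ImageNet comes from the paper, not from this sketch.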

This summary was generated by AI and reviewed by human editors.