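[Paper Explained] Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
This paper proposes Perceptual Adversarial Training (PAT), which uses a neural perceptual distance (LPIPS) as a proxy for human perception and trains models to withstand all imperceptible adversarial attacks. On CIFAR-10 and ImageNet-100, PAT achieves state-of-the-art (SOTA) robustness against the union of five unseen attacks (L₂, L∞, spatial transformation, recoloring, JPEG), more than doubling the accuracy of the next best model without seeing any of these attacks during training. This demonstrates strong generalization to unforeseen threat models.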
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.
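Research Motivation and Goals
- Address the lack of a precise mathematical characterization of human perception in adversarial robustness research.
- Overcome the failure of restrictive threat models (such as L₂ and L∞) to generalize to unseen attack types.
- Develop a defense whose robustness generalizes across many unforeseen perturbation types by modeling a perceptual threat model.
- Validate that the neural perceptual distance (LPIPS) correlates with human perception, supporting its use as a proxy for the true perceptual distance in scalable adversarial training.
- Show that adversarial training under the neural perceptual threat model (NPTM) generalizes strongly to both targeted and untargeted attacks, including common natural corruptions.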
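Proposed Method
- Define the perceptual adversarial threat model as the set of all perturbations imperceptible to humans, formalized with the true perceptual distance d*.
- Approximate the intractable true perceptual distance d* with LPIPS, a learned perceptual similarity metric based on deep network activations.
- Propose the neural perceptual threat model (NPTM), which contains all adversarial examples within a bounded LPIPS distance of a natural image.
- Develop new perceptual adversarial attacks that use LPIPS-constrained projected gradient descent (PGD) to generate imperceptible adversarial examples.
- Perform adversarial training against these perceptual attacks, which yields Perceptual Adversarial Training (PAT).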
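- Compute LPIPS for both attack and defense using either the classifier being trained itself (self-bounded, PAT-self) or an external pretrained network such as AlexNet (PAT-AlexNet), so that the resulting robustness transfers across attack types.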
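The core of the method list above, an LPIPS-style distance and a constraint that keeps adversarial examples inside a perceptual ball, can be illustrated with a small sketch. This is not the authors' implementation: the two random linear layers stand in for a real network, the whole-layer normalization is a simplification of LPIPS, and the names `features`, `perceptual_distance`, and `project_to_ball` are hypothetical. The bisection-based projection is likewise an assumed, simple way to enforce the bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a network: two fixed random linear layers with ReLU.
W1 = rng.normal(size=(48, 64))
W2 = rng.normal(size=(64, 32))

def features(x):
    """Concatenate normalized activations from each toy layer (simplified LPIPS features)."""
    h1 = np.maximum(x @ W1, 0.0)
    h2 = np.maximum(h1 @ W2, 0.0)
    parts = []
    for h in (h1, h2):
        # Simplification: normalize the whole layer and scale by its size.
        parts.append(h / (np.linalg.norm(h) + 1e-10) / np.sqrt(h.size))
    return np.concatenate(parts)

def perceptual_distance(x1, x2):
    """L2 distance between normalized feature vectors of the two inputs."""
    return float(np.linalg.norm(features(x1) - features(x2)))

def project_to_ball(x, x_adv, eps, steps=20):
    """Shrink the perturbation along the segment x -> x_adv until the distance is <= eps."""
    if perceptual_distance(x, x_adv) <= eps:
        return x_adv
    lo, hi = 0.0, 1.0  # lo always satisfies the bound (scale 0 gives distance 0)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if perceptual_distance(x, x + mid * (x_adv - x)) <= eps:
            lo = mid
        else:
            hi = mid
    return x + lo * (x_adv - x)

# Toy usage: a random "image" vector and a large random perturbation.
x = rng.uniform(size=48)
x_adv = x + 0.5 * rng.normal(size=48)
eps = 0.05
x_proj = project_to_ball(x, x_adv, eps)
print(perceptual_distance(x, x_adv), perceptual_distance(x, x_proj))
```

In a PGD-style attack, a projection like this would be applied after each gradient step on the classification loss, so that every iterate stays within the LPIPS bound of the natural image.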
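Experimental Results
Research Questions
- RQ1: Does a defense trained against a broad perceptual threat model generalize to adversarial attack types it never saw during training?
- RQ2: How well does LPIPS distance correlate with human perception of image perturbations compared with traditional Lp norms?
- RQ3: Does adversarial training under the neural perceptual threat model (NPTM) yield better robustness than standard adversarial training under L₂ or L∞ constraints?
- RQ4: Does PAT generalize to natural corruptions (such as blur, noise, and weather changes) that were not explicitly targeted during training?
- RQ5: Compared with standard adversarial training methods, does PAT show a clear trade-off between clean accuracy and robustness?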
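Key Findings
- PAT achieves state-of-the-art (SOTA) robustness on CIFAR-10: against the union of five diverse attacks (L₂, L∞, spatial transformation, recoloring, JPEG), it more than doubles the accuracy of the next best model without seeing any of these attacks during training.
- On CIFAR-10-C, PAT reaches a relative mean corruption error (mCE) of 0.50 (PAT-self) and 0.49 (PAT-AlexNet), lower than L₂ adversarial training (0.54) and L∞ adversarial training (0.57).
- On ImageNet-100-C, PAT reaches a relative mCE of 0.37 (PAT-self) and 0.39 (PAT-AlexNet), better than L₂ (0.41) and L∞ (0.42) adversarial training. PAT performs best on every corruption type except 'noise', where L₂ adversarial training is best because of the noise's symmetric distribution.
- The perceptual study confirms that the perceptual distance measured by LPIPS correlates strongly with human perception, supporting its use as a proxy for the true perceptual distance.
- PAT's robustness extends to natural corruptions, indicating that robustness to worst-case perceptual perturbations also confers robustness to random real-world distortions.
- PAT maintains high clean accuracy (for example, 93.4% on CIFAR-10) while achieving strong robustness, giving a better accuracy-robustness trade-off than prior methods.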
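The "union of attacks" figure in the first finding can be made concrete: an example counts as correct under the union only if the model still classifies it correctly under every attack, so union accuracy can never exceed the lowest per-attack accuracy. The sketch below uses made-up correctness masks purely for illustration; the helper name `union_accuracy` and the toy values are assumptions, not results from the paper.

```python
import numpy as np

def union_accuracy(correct_per_attack):
    """Fraction of examples classified correctly under every attack.

    correct_per_attack maps an attack name to a boolean array, one entry per
    test example, True where the prediction stays correct under that attack.
    """
    stacked = np.stack(list(correct_per_attack.values()))
    return float(stacked.all(axis=0).mean())

# Made-up outcomes for five test examples under three attacks.
correct = {
    "L2":   np.array([True, True,  False, True, True]),
    "Linf": np.array([True, False, False, True, True]),
    "JPEG": np.array([True, True,  True,  True, False]),
}

per_attack = {name: float(mask.mean()) for name, mask in correct.items()}
union = union_accuracy(correct)
print(per_attack, union)
```

Here only the first and fourth examples survive all three attacks, so the union accuracy is lower than any single per-attack accuracy, which is why the union is the stricter metric.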
This explanation was generated by AI and reviewed by a human editor.