QUICK REVIEW

[Paper Review] Divide, Denoise, and Defend against Adversarial Attacks

Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava|arXiv (Cornell University)|Feb 19, 2018

Adversarial Robustness in Machine Learning | References: 51 | Cited by: 27

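One-Sentence Summary

This paper proposes D3, a non-differentiable, attack-agnostic defense. It divides the input image into overlapping patches, denoises each patch with a non-differentiable matching pursuit algorithm over a learned dictionary of clean image patches, and reconstructs the image. Under the white-box setting, D3 reaches 34.4% robust accuracy on ImageNet, far above the 0% reported for defenses in prior work, showing that state-of-the-art robustness is achievable without adversarial fine-tuning.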
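The divide, denoise, and reconstruct pipeline can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain matching pursuit rather than the paper's MP variant, it reconstructs overlapping regions by simple averaging, and the helper names (`extract_patches`, `matching_pursuit`, `d3_like_denoise`) and the defaults for patch size `p`, `stride`, and sparsity `k` are hypothetical. The demo uses a random unit-norm dictionary purely as a placeholder for the dictionary of clean patches.

```python
import numpy as np


def extract_patches(img, p, stride):
    """Return overlapping p x p patches (flattened) and their top-left corners."""
    h, w = img.shape
    patches, corners = [], []
    for i in range(0, h - p + 1, stride):
        for j in range(0, w - p + 1, stride):
            patches.append(img[i:i + p, j:j + p].ravel())
            corners.append((i, j))
    return np.array(patches), corners


def matching_pursuit(x, D, k):
    """Plain MP: greedy k-term approximation of x with unit-norm atoms (columns of D)."""
    residual = x.astype(float)
    approx = np.zeros_like(residual)
    for _ in range(k):
        scores = D.T @ residual                 # correlation with every atom
        idx = int(np.argmax(np.abs(scores)))    # best-matching atom (sign-invariant)
        approx += scores[idx] * D[:, idx]
        residual -= scores[idx] * D[:, idx]
    return approx


def d3_like_denoise(img, D, p=8, stride=4, k=4):
    """Hypothetical sketch: divide into patches, denoise each with MP, average overlaps."""
    patches, corners = extract_patches(img, p, stride)
    out = np.zeros(img.shape, dtype=float)
    counts = np.zeros(img.shape, dtype=float)
    for vec, (i, j) in zip(patches, corners):
        out[i:i + p, j:j + p] += matching_pursuit(vec, D, k).reshape(p, p)
        counts[i:i + p, j:j + p] += 1
    # Pixels not covered by any patch (stride does not tile the image) keep their input value.
    return np.where(counts > 0, out / np.maximum(counts, 1), img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 256))
    D /= np.linalg.norm(D, axis=0)              # placeholder unit-norm atoms, not clean patches
    noisy = rng.random((32, 32))
    print(d3_like_denoise(noisy, D, p=8, stride=4, k=4).shape)  # (32, 32)
```

A smaller `k` projects each patch onto fewer atoms and discards more of the input (including any perturbation), which is one way the clean-accuracy versus robustness trade-off appears in this sketch.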

ABSTRACT

Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch independently, and reconstructing the image, without losing significant image content. We call our method D3. This proposed defense mechanism is non-differentiable which makes it non-trivial for an adversary to apply gradient-based attacks. Moreover, we do not fine-tune the network with adversarial examples, making it more robust against unknown attacks. We present an analysis of the tradeoff between accuracy and robustness against adversarial attacks. We evaluate our method under black-box, grey-box, and white-box settings. On the ImageNet dataset, our method outperforms the state-of-the-art by 19.7% under grey-box setting, and performs comparably under black-box setting. For the white-box setting, the proposed method achieves 34.4% accuracy compared to the 0% reported in the recent works.

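Motivation and Goals

Address the vulnerability of deep neural networks to adversarial perturbations in security-critical applications.
Develop a defense that is robust to unseen gradient-based attacks without requiring adversarial fine-tuning.
Maintain high clean-image accuracy while improving robustness through dimensionality reduction and non-differentiable denoising.
Make the transformation non-differentiable so that gradient-based attacks cannot be applied to the defense directly.
Analyze the trade-off between clean accuracy and robustness under black-box, grey-box, and white-box attack settings.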

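Proposed Method

Divide the input image into overlapping patches to reduce the effective dimensionality and restrict the attacker's search space.
Denoise each patch independently with a variant of matching pursuit (MP), using a dictionary of clean patches chosen by a new patch-selection algorithm.
Build the dictionary from diverse, salient patches whose atoms are separated by a large minimum angular distance, which improves robustness.
The denoising step is non-differentiable, which makes directly applying gradient-based attacks such as FGSM non-trivial; attacks such as BPDA, which approximate the gradient of the non-differentiable step, are needed to target it.
Introduce randomization into dictionary selection to further improve robustness in the white-box setting.
Combine the denoised patches into the final image in a reconstruction step that preserves semantic content while removing adversarial noise.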
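The dictionary-selection idea (diverse atoms separated by a large minimum angle, with randomization) can be sketched as a greedy filter. The procedure below is an assumption for illustration, not the paper's patch-selection algorithm: the function `select_dictionary`, the threshold `min_angle_deg`, and the sign-invariant angle (using |cos|, since matching pursuit may use an atom with a negative coefficient) are hypothetical choices, and randomization is modeled simply as visiting candidates in a seeded random order.

```python
import numpy as np


def select_dictionary(candidates, n_atoms, min_angle_deg, rng=None):
    """Hypothetical greedy selection: keep a candidate patch only if its
    (sign-invariant) angle to every already-selected atom is >= min_angle_deg.

    candidates: (N, d) array of flattened clean patches.
    rng: optional np.random.Generator; if given, candidates are visited in a
         random order, so different seeds can yield different dictionaries.
    """
    norms = np.linalg.norm(candidates, axis=1)
    keep = norms > 1e-8                         # drop all-zero patches
    X = candidates[keep] / norms[keep, None]    # unit-norm rows
    order = rng.permutation(len(X)) if rng is not None else np.arange(len(X))
    cos_max = np.cos(np.deg2rad(min_angle_deg))
    atoms = []
    for idx in order:
        x = X[idx]
        # |cos| <= cos_max  <=>  angle (ignoring sign) >= min_angle_deg
        if all(abs(float(x @ a)) <= cos_max for a in atoms):
            atoms.append(x)
            if len(atoms) == n_atoms:
                break
    if not atoms:
        return np.empty((candidates.shape[1], 0))
    return np.stack(atoms, axis=1)              # shape (d, n_selected); columns are atoms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean_patches = rng.standard_normal((500, 64))  # stand-in for flattened 8x8 clean patches
    D = select_dictionary(clean_patches, n_atoms=32, min_angle_deg=30, rng=rng)
    print(D.shape)
```

Raising `min_angle_deg` yields fewer, more mutually distinct atoms; changing the seed changes which candidates are kept, so an attacker who knows the procedure still cannot predict the exact dictionary.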

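Experimental Results

Research Questions

RQ1: Can a non-differentiable, patch-based denoising defense achieve state-of-the-art robustness on large-scale datasets such as ImageNet?
RQ2: How does the trade-off between clean accuracy and robustness change as patch size, sparsity, and dictionary properties vary?
RQ3: How effective does the defense remain under white-box attacks, where the attacker knows both the network architecture and the defense mechanism?
RQ4: Does the defense remain effective in black-box and grey-box settings, particularly when the attacker has no access to the defense mechanism?
RQ5: Can randomization further improve robustness without reducing clean accuracy?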

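Key Findings

Under the white-box setting, D3 reaches 34.4% top-1 accuracy on ImageNet, far above the 0% reported in prior work.
In the grey-box setting, D3's robust accuracy on ImageNet exceeds that of the current state-of-the-art defense by 19.7%.
With randomization, D3's robust accuracy under the BPDA attack rises from 13.0% to 34.4%, indicating stronger resistance to gradient-based attacks.
Larger patch sizes (up to 32×32) improve robustness by shrinking the attacker's effective search space, at the cost of slightly lower reconstruction quality.
On an easier task (a 50-class subset of ImageNet), D3 keeps a high clean accuracy of 91.7% and reaches 70.9% robust accuracy under white-box attack.
On CIFAR-10, D3 achieves 87% clean accuracy and 80% robust accuracy under FGSM, outperforming most existing defenses and trailing only slightly behind a defense designed specifically for FGSM.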
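BPDA (Backward Pass Differentiable Approximation, Athalye et al., 2018) attacks a non-differentiable defense by running the real defense on the forward pass and substituting a differentiable approximation, often the identity, on the backward pass. The sketch below shows one FGSM step with that identity approximation. Everything here is a hypothetical toy: a quantizer stands in for D3's denoiser, a logistic classifier stands in for the network, and the names `quantize_defense`, `input_grad`, `fgsm_bpda`, and `loss` are illustrative; it does not reproduce the attack settings behind the numbers above.

```python
import numpy as np


def quantize_defense(x, levels=8):
    """Hypothetical non-differentiable defense (stand-in for D3's denoiser):
    its true gradient is zero almost everywhere."""
    return np.round(x * (levels - 1)) / (levels - 1)


def input_grad(z, w, b, y):
    """dL/dz for a toy logistic classifier with binary cross-entropy, y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
    return (p - y) * w


def fgsm_bpda(x, w, b, y, eps):
    """One FGSM step: forward through the real defense, identity on the backward pass."""
    z = quantize_defense(x)
    g = input_grad(z, w, b, y)                  # BPDA: dL/dx is approximated by dL/dz
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)


def loss(x, w, b, y):
    """Binary cross-entropy of the defended toy classifier, computed stably."""
    t = quantize_defense(x) @ w + b
    return y * np.logaddexp(0.0, -t) + (1 - y) * np.logaddexp(0.0, t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, b, y = rng.standard_normal(20), 0.0, 1
    x = rng.random(20)
    x_adv = fgsm_bpda(x, w, b, y, eps=0.1)
    print(loss(x, w, b, y), loss(x_adv, w, b, y))
```

Because rounding is monotone, in this toy the BPDA step can only move the true-class logit down, so the loss after the step is never lower than before it.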

This review was generated by AI and checked by a human editor.