QUICK REVIEW

[Paper Review] On Detecting Adversarial Perturbations

Jan Hendrik Metzen, Tim Genewein | arXiv (Cornell University) | Feb 14, 2017

Adversarial Robustness in Machine Learning | Cited by 219

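ONE-SENTENCE SUMMARY

This paper adds a small detector subnetwork to a classifier to distinguish genuine data from adversarial examples, shows strong detectability on CIFAR-10 and a 10-class ImageNet subset, and includes a defense against dynamic adversaries.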

ABSTRACT

Machine learning and deep learning in particular has advanced tremendously on perceptual tasks in recent years. However, it remains vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose to augment deep neural networks with a small "detector" subnetwork which is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, while the detectors have been trained to detect only a specific adversary, they generalize to similar and weaker adversaries. In addition, we propose an adversarial attack that fools both the classifier and the detector and a novel training procedure for the detector that counteracts this attack.

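RESEARCH MOTIVATION AND OBJECTIVES

Motivate and address the vulnerability of deep networks to quasi-imperceptible adversarial perturbations.
Propose a binary-classification detector subnetwork that distinguishes original data from adversarially perturbed data.
Show that the detector generalizes to adversaries that are similar to, or weaker than, the one seen during training.
Study dynamic adversaries and propose a training strategy that hardens the detector against them.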

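PROPOSED METHOD

Attach a small adversary-detection subnetwork to a pretrained classifier at an intermediate layer.
Train the detector on a balanced dataset of original training inputs and the adversarial examples generated from them.
Freeze the classifier weights and train the detector with a cross-entropy loss on the original-versus-adversarial labels.
Explore detector placement and architecture through experiments on CIFAR-10 and an ImageNet subset.
Introduce a dynamic adversary that optimizes against both the classifier's and the detector's objectives while generating the perturbation.
Develop dynamic adversary training to make the detector more robust to adaptive attacks.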
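
The training recipe above (balanced original/adversarial data, frozen classifier, cross-entropy on the detector alone) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: `roughness` is a hypothetical stand-in for the frozen classifier's intermediate features, `toy_adversary` is a made-up alternating perturbation rather than a real attack, and the detector is a one-weight logistic model instead of a small subnetwork.

```python
import math


def roughness(x):
    """Hypothetical stand-in for frozen intermediate features:
    the mean absolute difference between neighbouring values."""
    return [sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)]


def toy_adversary(x, eps=0.1):
    """Toy perturbation (not a real attack): alternating +eps / -eps pattern."""
    return [v + eps * (1 if i % 2 == 0 else -1) for i, v in enumerate(x)]


def build_balanced_set(originals, perturb):
    """One adversarial example per original: label 0 = original, 1 = adversarial."""
    data = []
    for x in originals:
        data.append((x, 0))
        data.append((perturb(x), 1))
    return data


def detector_prob(params, feats):
    """Probability that the input is adversarial (logistic detector)."""
    w, b = params
    z = sum(wi * fi for wi, fi in zip(w, feats)) + b
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(params, data, features):
    """Mean binary cross-entropy of the detector over the dataset."""
    total = 0.0
    for x, y in data:
        p = detector_prob(params, features(x))
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


def train_detector(data, features, dim, lr=1.0, epochs=500):
    """Full-batch gradient descent on the detector only.
    `features` plays the role of the frozen classifier and is never updated."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            f = features(x)
            err = detector_prob((w, b), f) - y  # d(loss)/d(logit)
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


originals = [[0.2, 0.2, 0.3, 0.3], [0.5, 0.5, 0.5, 0.6], [0.8, 0.7, 0.7, 0.7]]
data = build_balanced_set(originals, toy_adversary)
params = train_detector(data, roughness, dim=1)
```

In a realistic setup the feature function would be the activations of a pretrained network at the chosen attachment layer, and the logistic model would be replaced by a small trainable subnetwork; only the detector's parameters would receive gradient updates.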

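EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a detector trained on a specific adversary reliably detect data-dependent adversarial perturbations?
RQ2: How does the detector's placement within the classifier affect the detectability of adversarial examples?
RQ3: Does a detector trained on one adversary transfer to other adversaries or norms (for example, l_inf versus l2)?
RQ4: How robust is the detector to a dynamic adversary that adapts to both the classifier and the detector?
RQ5: Which training procedure makes the detector more robust to adaptive, dynamic attacks?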

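Key Findings

On CIFAR-10, the detector achieves high detectability (above 80%) for the adversaries tested; when the classifier's accuracy on adversarial examples falls below 10%, detectability rises further (above 90%).
Attaching the detector at an intermediate layer (AD(2)) usually gives the best detection for the fast and iterative adversaries; for the DeepFool variants, AD(4) is usually better.
A detector trained on one adversary transfers to other similar or weaker adversaries; among related attacks, transfer between the l_inf and l2 variants is usually effective.
The dynamic detector (trained against adaptive attacks) keeps detectability above 70% across a range of adaptation strengths (sigma values).
On a 10-class ImageNet subset, the detector reaches detectability above 85% for most adversaries; one iterative l2 case (epsilon=400) is close to random guessing, a challenging edge case.
When an adversarial input is detected, the detector can trigger a fallback or safety intervention, such as human verification.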
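
To make the role of sigma concrete, here is a minimal sketch of one update step of a dynamic adversary. It assumes, as an illustration rather than a statement of the paper's exact rule, that sigma forms a convex combination of two signed input gradients: one that raises the classifier's loss and one that raises the detector's loss. The function name, the plain-list gradients, the l_inf projection, and the [0, 1] pixel range are all hypothetical choices for this example.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def dynamic_adversary_step(x_adv, x_orig, grad_cls, grad_det, alpha, sigma, eps):
    """One hypothetical update that pushes against classifier and detector.

    grad_cls: gradient of the classifier loss w.r.t. the input (true label).
    grad_det: gradient of the detector loss w.r.t. the input (label "adversarial").
    sigma in [0, 1]: 0 ignores the detector, 1 ignores the classifier.
    """
    stepped = []
    for xa, xo, gc, gd in zip(x_adv, x_orig, grad_cls, grad_det):
        direction = (1.0 - sigma) * sign(gc) + sigma * sign(gd)
        candidate = xa + alpha * direction
        # Keep the perturbation within eps of the original (l_inf constraint).
        candidate = min(max(candidate, xo - eps), xo + eps)
        # Keep the value inside the assumed valid pixel range [0, 1].
        stepped.append(min(max(candidate, 0.0), 1.0))
    return stepped


# Example call with made-up gradients for a two-pixel input.
x_next = dynamic_adversary_step(
    x_adv=[0.5, 0.5], x_orig=[0.5, 0.5],
    grad_cls=[1.0, -2.0], grad_det=[1.0, 3.0],
    alpha=0.1, sigma=0.5, eps=0.25,
)
```

Where the two gradients agree in sign the step is full-sized, and where they disagree at sigma = 0.5 they cancel, which gives some intuition for why fooling both networks at once is harder than fooling either alone. In practice the gradients would come from automatic differentiation through the classifier and the detector.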

This review was generated by AI and reviewed by a human editor.