QUICK REVIEW

[Paper Review] Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

Andrey Malinin, Mark Gales|arXiv (Cornell University)|May 31, 2019
Adversarial Robustness in Machine Learning|References: 37|Cited by: 69
One-Sentence Summary

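In brief: this paper proposes training Prior Networks with the reverse KL-divergence, which models uncertainty better and improves out-of-distribution (OOD) detection, and builds on it a generalized form of adversarial training that increases robustness to adaptive whitebox attacks.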

ABSTRACT

Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently emulate an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform alternative ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. First, we show that the appropriate training criterion for Prior Networks is the reverse KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling Prior Networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training. It is shown that the construction of successful adaptive whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 using the proposed approach requires a greater amount of computational effort than against networks defended using standard adversarial training or MC-dropout.
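
The abstract's central quantity is a KL-divergence between two Dirichlet distributions, which has a closed form. The sketch below is ours, not the authors' code: it uses only the Python standard library, so `digamma` is a hand-rolled approximation rather than a library call, and the concentration vectors are arbitrary illustrative values. Following the paper's naming, the reverse KL is KL[model || target] and the forward KL is KL[target || model]; the two directions give different values.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via recurrence plus an asymptotic series.

    Hand-rolled because the Python standard library has no digamma;
    accurate to roughly 1e-8 for x > 0.
    """
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x up until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def kl_dirichlet(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Closed-form KL[Dir(alpha) || Dir(beta)] in nats."""
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same length")
    a0, b0 = sum(alpha), sum(beta)
    # Log-normalizer terms of the two Dirichlets.
    kl = math.lgamma(a0) - math.lgamma(b0)
    kl += sum(math.lgamma(b) - math.lgamma(a) for a, b in zip(alpha, beta))
    # E_{Dir(alpha)}[ln pi_k] = psi(alpha_k) - psi(alpha_0).
    psi_a0 = digamma(a0)
    kl += sum((a - b) * (digamma(a) - psi_a0) for a, b in zip(alpha, beta))
    return kl


model = [2.0, 1.0, 1.0]    # hypothetical predicted concentrations (alpha)
target = [10.0, 1.0, 1.0]  # hypothetical target concentrations (beta)
reverse_kl = kl_dirichlet(model, target)  # KL[model || target]: PN-RKL direction
forward_kl = kl_dirichlet(target, model)  # KL[target || model]: original PN direction
print(f"reverse KL = {reverse_kl:.3f}, forward KL = {forward_kl:.3f}")
```

The asymmetry is what the paper's argument turns on: swapping the arguments changes which target the minimizing Dirichlet is pulled toward.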

Motivation and Goals

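  • Make neural networks aware of their own uncertainty, so that their confidence remains reliable under misclassification, out-of-distribution (OOD) inputs, and adversarial attacks
  • Introduce Prior Networks, which model a Dirichlet prior over output distributions, as an efficient way to emulate an ensemble
  • Show that the reverse KL-divergence is the appropriate training criterion for Prior Networks, making them scalable to tasks with many classes and improving OOD detection
  • Explore a generalized adversarial training framework based on the reverse KL criterion, which makes successful adversarial attacks against Prior Networks harder to construct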

Proposed Method

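  • Define Prior Networks, which parameterize a Dirichlet distribution over output (categorical) distributions
  • Compare forward and reverse KL training criteria with Dirichlet targets, arguing that in regions of high data uncertainty the reverse KL (RKL) induces a single-mode, high-precision target
  • Derive and compare the two losses, forward KL (original PN) vs. reverse KL (proposed PN-RKL), showing which mixture of the per-class targets each one implicitly fits in expectation: an arithmetic mixture of target Dirichlets for forward KL, and a geometric mixture (a single Dirichlet whose concentration parameters are the arithmetic mixture of the per-class targets) for reverse KL
  • Train PN-RKL on image datasets without auxiliary losses, and evaluate in-domain accuracy and OOD detection
  • Extend the framework to adversarial attack detection with a generalized adversarial training loss that uses RKL to shape the uncertainty of adversarial inputs
  • Evaluate robustness against adaptive whitebox attacks built with targeted PGD-MIM, comparing against DNN, adversarially trained DNN, and MC-dropout baselines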
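
The forward/reverse contrast in the method list can be written out as a short derivation. The notation here is ours rather than the paper's: p(π|x;θ) is the model's Dirichlet, p(π|β) = Dir(π|β), β^(c) is the target concentration vector for class c, p̂(y|x) is the empirical class distribution at input x, and "const" collects terms that do not depend on θ.

```latex
\begin{aligned}
\mathbb{E}_{\hat p(y\mid x)}\Big[\mathrm{KL}\big[p(\pi\mid\beta^{(y)})\,\big\|\,p(\pi\mid x;\theta)\big]\Big]
  &= \mathrm{KL}\Big[\textstyle\sum_{c}\hat p(y{=}c\mid x)\,p(\pi\mid\beta^{(c)})\;\Big\|\;p(\pi\mid x;\theta)\Big] + \text{const},\\
\mathbb{E}_{\hat p(y\mid x)}\Big[\mathrm{KL}\big[p(\pi\mid x;\theta)\,\big\|\,p(\pi\mid\beta^{(y)})\big]\Big]
  &= \mathrm{KL}\big[p(\pi\mid x;\theta)\,\big\|\,p(\pi\mid\hat\beta)\big] + \text{const},
  \qquad \hat\beta = \textstyle\sum_{c}\hat p(y{=}c\mid x)\,\beta^{(c)},\\
\textstyle\prod_{c} p(\pi\mid\beta^{(c)})^{\hat p(y=c\mid x)}
  &\propto \textstyle\prod_{k}\pi_k^{\hat\beta_k-1} \propto p(\pi\mid\hat\beta).
\end{aligned}
```

Because β̂ is a convex combination of high-precision per-class targets, its precision Σ_c p̂(y=c|x) β_0^(c) stays high even when p̂(y|x) spreads over several classes, so the reverse-KL target is one sharp Dirichlet near the center of the simplex (data uncertainty). The forward-KL target is instead a multi-modal mixture that a single Dirichlet can only cover by spreading out, which tends to look like knowledge uncertainty.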

Experimental Results

Research Questions

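  • RQ1: Does the reverse KL-divergence provide the correct training signal for Prior Networks across datasets with different numbers of classes?
  • RQ2: Can PN-RKL improve OOD detection while keeping classification performance comparable to PN-KL?
  • RQ3: Can PN-RKL effectively detect adversarial attacks and improve robustness to adaptive whitebox attacks?
  • RQ4: How does the proposed form of adversarial training affect the space of successful attacks on Prior Networks?
  • RQ5: On complex datasets, how does the choice of OOD training data affect the limitations of PN-RKL?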

Key Findings

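  • PN-RKL produces uncertainty measures consistent with the structure of the dataset: high data uncertainty in regions where classes overlap, and high knowledge uncertainty on OOD inputs
  • On synthetic data with high uncertainty, PN-RKL distinguishes total, data, and knowledge uncertainty more accurately than PN-KL
  • PN-RKL's classification error rate is comparable to that of standard DNNs and ensembles, whereas PN-KL's performance degrades as dataset complexity increases
  • With appropriate OOD training data, PN-RKL achieves a higher OOD detection AUROC than PN-KL, and matches or outperforms ensembles on CIFAR-10/CIFAR-100
  • Adversarial training of PN-RKL (with the beta_in and beta_adv settings) makes successful adaptive whitebox attacks harder to construct than against standard DNN, DNN-ADV, or MC-dropout defenses
  • The approach reduces the transferability of adaptive attacks; blackbox attacks are often ineffective against PN-RKL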
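
The first and fourth findings rest on decomposing a Prior Network's uncertainty and scoring OOD detection with AUROC. The sketch below assumes the standard Dirichlet-based measures: total uncertainty as the entropy of the expected categorical, data uncertainty as the expected entropy under the Dirichlet, and knowledge uncertainty as their difference (the mutual information), used here as the OOD score. The concentration vectors are made-up illustrations, not outputs of a trained PN-RKL, and `digamma` is hand-rolled because only the standard library is used.

```python
import math
from typing import Dict, Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 (recurrence + asymptotic series)."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def uncertainties(alpha: Sequence[float]) -> Dict[str, float]:
    """Total, data and knowledge uncertainty of Dir(alpha), in nats."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]  # expected categorical distribution
    total = -sum(p * math.log(p) for p in mean)
    # Expected entropy: -sum_k (a_k/a0) * (psi(a_k + 1) - psi(a0 + 1)).
    psi_a0p1 = digamma(a0 + 1.0)
    data = -sum((a / a0) * (digamma(a + 1.0) - psi_a0p1) for a in alpha)
    return {"total": total, "data": data, "knowledge": total - data}


def auroc(ood_scores: Sequence[float], in_scores: Sequence[float]) -> float:
    """P(score of a random OOD input > score of a random in-domain input)."""
    wins = 0.0
    for s_out in ood_scores:
        for s_in in in_scores:
            if s_out > s_in:
                wins += 1.0
            elif s_out == s_in:
                wins += 0.5  # ties count as half a win
    return wins / (len(ood_scores) * len(in_scores))


# Hypothetical Dirichlet outputs: two confident in-domain inputs, one
# class-overlap input (flat but high precision), and two flat low-precision OOD inputs.
in_domain = [[50.0, 1.0, 1.0], [1.0, 40.0, 1.0], [100.0, 100.0, 100.0]]
ood = [[1.0, 1.0, 1.0], [1.2, 1.0, 1.0]]
in_scores = [uncertainties(a)["knowledge"] for a in in_domain]
ood_scores = [uncertainties(a)["knowledge"] for a in ood]
print("AUROC (knowledge uncertainty):", auroc(ood_scores, in_scores))
```

The class-overlap vector has total uncertainty close to ln 3, but almost all of it is data uncertainty, so mutual information still ranks it below the flat, low-precision OOD outputs. The pairwise AUROC is O(nm); a rank-based computation scales better on real evaluation sets.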

This review was generated by AI and reviewed by human editors.