QUICK REVIEW

[Paper Review] Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness

L. Zhao, Ting Liu | arXiv (Cornell University) | Oct 15, 2020

Adversarial Robustness in Machine Learning | References: 78 | Cited by: 64

One-Sentence Summary

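Introduces ME-ADA, an information-bottleneck-based regularizer for adversarial data augmentation that maximizes predictive entropy during the adversarial phase to generate harder perturbations and improve robustness to domain shifts and corruptions.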

ABSTRACT

Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. In this paper, we propose a novel and effective regularization term for adversarial data augmentation. We theoretically derive it from the information bottleneck principle, which results in a maximum-entropy formulation. Intuitively, this regularization term encourages perturbing the underlying source distribution to enlarge predictive uncertainty of the current model, so that the generated "hard" adversarial perturbations can improve the model robustness during training. Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin.
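The regularized objective described in the abstract can be written out compactly. This is a hedged rendering assembled from the method summary in this review (cross-entropy L_CE, entropy weight β, cost weight γ, cost c_θ, robust surrogate φ_γ); the symbols h_θ, η, x_0 and K are notation introduced here for illustration, the outer minimax form follows the usual adversarial-data-augmentation setup as an assumption, and the paper's exact formulation may differ.

```latex
% Supervised information bottleneck: cross-entropy plus a compression term
\mathcal{L}_{\mathrm{IB}}(\theta; X, Y) = \mathcal{L}_{\mathrm{CE}}(\theta; X, Y) + \beta\, I(X; Z)

% Maximum-entropy relaxation used in the maximization phase
\mathcal{L}_{\mathrm{CE}}(\theta; X, Y) + \beta\, I(X; Z) \;\leadsto\; \mathcal{L}_{\mathrm{CE}}(\theta; X, Y) + \beta\, H(\hat{Y})

% Empirical per-sample estimate of H(\hat{Y}) from the K-way softmax output
h_\theta(x) = -\sum_{k=1}^{K} p_\theta(k \mid x)\, \log p_\theta(k \mid x)

% Robust surrogate and the outer minimization (x_0: source sample)
\phi_\gamma(\theta; x_0, y) = \sup_{x} \Big\{ \mathcal{L}_{\mathrm{CE}}(\theta; x, y) + \beta\, h_\theta(x) - \gamma\, c_\theta\big((x, y), (x_0, y)\big) \Big\}
\qquad
\min_\theta \; \mathbb{E}_{(x_0, y)}\big[ \phi_\gamma(\theta; x_0, y) \big]

% One gradient-ascent step of the maximization phase (step size \eta)
x^{t+1} = x^{t} + \eta\, \nabla_{x} \Big[ \mathcal{L}_{\mathrm{CE}}(\theta; x^{t}, y) + \beta\, h_\theta(x^{t}) - \gamma\, c_\theta\big((x^{t}, y), (x_0, y)\big) \Big]
```

The γ term keeps the generated samples close to their source points, while the β term pushes them toward regions where the current model is uncertain.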

Research Motivation and Goals

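Use adversarial data augmentation to promote robust generalization to unseen domain shifts and corruptions.
Embed an information-theoretic regularizer that produces harder fictitious target distributions by enlarging predictive uncertainty.
Develop a computationally efficient maximization phase that augments data using the maximum-entropy principle.
Provide theoretical support and practical guidance for implementing maximum-entropy regularization in classification tasks.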

Proposed Method

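Form a supervised information bottleneck (IB) objective by adding a mutual-information term I(X;Z) to the cross-entropy loss.
Relax and approximate the IB objective through maximum-entropy regularization, replacing I(X;Z) with H(Ŷ) so that the maximization phase becomes tractable to optimize.
Solve the resulting minimax problem with an iterative training loop that alternates a maximization phase (data augmentation) and a minimization phase (model update).
Estimate H(Ŷ) empirically from the softmax outputs, and use the robust surrogate loss φ_γ to generate the adversarial perturbations.
Implement the maximization step by perturbing the inputs to maximize L_CE + β H(Ŷ) − γ c_θ, where c_θ measures the distance between a perturbed sample and its source sample; this yields harder adversarial examples.
Optionally extend the method to stochastic networks (Bayesian neural networks, BNNs) to better capture predictive uncertainty in ME-ADA.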
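The alternating maximization/minimization loop summarized above can be illustrated on a toy problem. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: a linear softmax classifier stands in for the deep network, the cost c_θ is replaced by the squared Euclidean distance in input space, H(Ŷ) is estimated as the per-sample softmax entropy, the minimization phase uses plain cross-entropy, and each maximization phase perturbs the original source samples. All function names and hyperparameter values are hypothetical.

```python
import numpy as np

# Assumptions (hypothetical toy setup): logits z = x @ W + b replace a deep net,
# and c(x, x0) = ||x - x0||^2 in input space replaces the paper's cost c_theta.

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def entropy(z):
    """Per-sample predictive entropy -sum_k p_k log p_k of softmax(z)."""
    logp = log_softmax(z)
    return -(np.exp(logp) * logp).sum(axis=-1)

def max_phase_objective(X, X0, y, W, b, beta, gamma):
    """Batch sum of L_CE + beta * entropy - gamma * ||x - x0||^2."""
    Z = X @ W + b
    ce = -log_softmax(Z)[np.arange(len(y)), y]
    cost = ((X - X0) ** 2).sum(axis=1)
    return float((ce + beta * entropy(Z) - gamma * cost).sum())

def input_gradient(X, X0, y, W, b, beta, gamma):
    """Analytic gradient of max_phase_objective with respect to the inputs X."""
    Z = X @ W + b
    logp = log_softmax(Z)
    p = np.exp(logp)
    g_ce = p - np.eye(W.shape[1])[y]              # dCE/dz
    h = -(p * logp).sum(axis=1, keepdims=True)
    g_h = -p * (logp + h)                         # dEntropy/dz
    return (g_ce + beta * g_h) @ W.T - 2.0 * gamma * (X - X0)

def max_phase(X0, y, W, b, beta=1.0, gamma=1.0, eta=0.02, steps=10):
    """Gradient ascent on the inputs to create 'hard' fictitious samples."""
    X = X0.copy()
    for _ in range(steps):
        X = X + eta * input_gradient(X, X0, y, W, b, beta, gamma)
    return X

def min_phase(X, y, W, b, lr=0.1, epochs=50):
    """Gradient descent on mean cross-entropy over the (augmented) data."""
    onehot = np.eye(W.shape[1])[y]
    for _ in range(epochs):
        g = (np.exp(log_softmax(X @ W + b)) - onehot) / len(y)
        W = W - lr * X.T @ g
        b = b - lr * g.sum(axis=0)
    return W, b

def me_ada_sketch(X, y, num_classes, rounds=3, seed=0):
    """Alternate model updates and entropy-regularized augmentation."""
    rng = np.random.default_rng(seed)
    W = 0.5 * rng.normal(size=(X.shape[1], num_classes))
    b = np.zeros(num_classes)
    X_aug, y_aug = X, y
    for _ in range(rounds):
        W, b = min_phase(X_aug, y_aug, W, b)
        X_new = max_phase(X, y, W, b)             # perturb the source samples
        X_aug = np.concatenate([X_aug, X_new])
        y_aug = np.concatenate([y_aug, y])
    W, b = min_phase(X_aug, y_aug, W, b)
    return W, b, X_aug, y_aug

# Usage on synthetic 3-class data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)
W, b, X_aug, y_aug = me_ada_sketch(X, y, num_classes=3)
print(X_aug.shape)  # (240, 4)
```

The input gradient is written out analytically so the sketch needs no autodiff library; with a real network one would obtain ∇_x from the framework's automatic differentiation instead.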

Experimental Results

Research Questions

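RQ1: Can an information-theoretic regularizer based on the information bottleneck improve the effectiveness of adversarial data augmentation?
RQ2: Does replacing I(X;Z) with the tractable entropy surrogate H(Ŷ) preserve or improve robustness to domain shifts and corruptions?
RQ3: Is the ME-ADA framework effective across domains and architectures (MNIST domain shift, PACS, CIFAR-10/100-C)?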

Key Findings

MNIST domain shift: target-domain accuracy (%)
SVHN	MNIST-M	SYN	USPS	Average
42.00	63.98	49.80	79.10	58.72
42.56	63.27	50.39	81.04	59.32
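The Average column appears to be the unweighted mean of the four target-domain accuracies. A quick check in Python, offered only as an illustration, recomputes it from the two table rows:

```python
# Per-domain target accuracies (%) and reported averages, copied from the table.
rows = [
    [42.00, 63.98, 49.80, 79.10],
    [42.56, 63.27, 50.39, 81.04],
]
reported = [58.72, 59.32]

for accs, avg in zip(rows, reported):
    mean = sum(accs) / len(accs)  # unweighted mean over SVHN, MNIST-M, SYN, USPS
    print(f"{mean:.3f} vs reported {avg}")
```

Both rows agree with the reported averages up to rounding to two decimals (the second mean is 59.315, reported as 59.32).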

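ME-ADA achieves statistically significant improvements over existing state-of-the-art baselines on several benchmark tasks.
On the MNIST domain-shift task, ME-ADA (and ME-ADA with a BNN) outperforms ERM, ADA, and PAR, achieving the best average accuracy on the target domains.
On PACS, ME-ADA performs best among methods that do not use domain identifiers and approaches the performance of methods that do use domain information.
On CIFAR-10-C and CIFAR-100-C, ME-ADA substantially improves robustness across multiple architectures, often outperforming previous methods by several points.
The empirical entropy-based surrogate remains effective for nondeterministic or dropout-enabled networks, and this is supported by theory.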
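For stochastic (dropout-enabled or Bayesian) networks, one common way to estimate predictive entropy is Monte Carlo sampling: run T stochastic forward passes, average the softmax outputs, and take the entropy of the mean. The sketch below assumes this estimator and a toy linear model with input dropout; it is not necessarily the estimator used in the paper, and `mc_predictive_entropy` and its parameters are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_of(p, eps=1e-12):
    """Entropy of probability vectors along the last axis (eps avoids log 0)."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def mc_predictive_entropy(X, W, b, p_drop=0.5, T=20, seed=0):
    """MC estimate for a hypothetical linear model with input dropout.

    Requires 0 <= p_drop < 1. Returns (total, expected): the entropy of the
    averaged softmax, and the average entropy of the individual passes.
    """
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(T):
        mask = rng.random(X.shape) >= p_drop      # keep each input with prob 1 - p_drop
        Xd = X * mask / (1.0 - p_drop)            # inverted-dropout rescaling
        probs.append(softmax(Xd @ W + b))
    probs = np.stack(probs)                       # shape (T, n, K)
    total = entropy_of(probs.mean(axis=0))        # H[mean_t p_t]
    expected = entropy_of(probs).mean(axis=0)     # mean_t H[p_t]
    return total, expected

# Usage on random inputs and weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
W = rng.normal(size=(5, 3))
b = np.zeros(3)
total, expected = mc_predictive_entropy(X, W, b, p_drop=0.3, T=50)
print(total.shape)  # (8,)
```

The gap `total - expected` is the mutual information between the prediction and the random dropout masks, a standard measure of epistemic uncertainty; because entropy is concave, it is never negative.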

This review was generated by AI and reviewed by human editors.