QUICK REVIEW

[Paper Review] MaxUp: A Simple Way to Improve Generalization of Neural Network Training

Chengyue Gong, Tongzheng Ren | arXiv (Cornell University) | Feb 20, 2020

Adversarial Robustness in Machine Learning | References: 42 | Cited by: 35

One-Sentence Summary

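MaxUp minimizes the maximum loss over augmented copies of the data. This induces gradient-norm regularization and improves generalization on vision, language, and certified-robustness tasks with minimal overhead.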

ABSTRACT

We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, MaxUp is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test MaxUp on a range of tasks, including image classification, language modeling, and adversarial certification. On these tasks MaxUp consistently outperforms the existing best baseline methods without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy of 85.5% without extra data to 85.8%. Code will be released soon.
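The core loop described in the abstract has three steps: sample random augmented copies, keep the worst one, and descend on it. The sketch below is a minimal, hypothetical illustration of that loop, not the authors' released code. It assumes a toy 1-D linear model with squared loss and uses Gaussian input noise as the augmentation P(·|x). The names `loss`, `grad`, and `maxup_step`, the toy data y = 3x + 1, and the settings m = 4, sigma = 0.1 are all invented for this example.

```python
import random


def loss(params, x, y):
    # Squared error of a hypothetical toy model y_hat = w * x + b.
    w, b = params
    return (w * x + b - y) ** 2


def grad(params, x, y):
    # Gradient of the squared error with respect to the parameters (w, b).
    w, b = params
    r = w * x + b - y
    return 2.0 * r * x, 2.0 * r


def maxup_step(params, batch, m, sigma, lr, rng):
    # One MaxUp SGD step (sketch): for each (x, y), draw m Gaussian copies
    # x + sigma * z, keep only the copy with the largest loss, and average
    # the gradients of those worst copies. No other copy receives a gradient.
    gw = gb = total = 0.0
    for x, y in batch:
        copies = [x + rng.gauss(0.0, sigma) for _ in range(m)]
        worst = max(copies, key=lambda xc: loss(params, xc, y))
        dw, db = grad(params, worst, y)
        gw, gb = gw + dw, gb + db
        total += loss(params, worst, y)
    n = len(batch)
    w, b = params
    return (w - lr * gw / n, b - lr * gb / n), total / n


# Illustrative usage: full-batch training on toy data drawn from y = 3x + 1.
rng = random.Random(0)
data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(-10, 11)]
params = (0.0, 0.0)
for _ in range(300):
    params, worst_loss = maxup_step(params, data, m=4, sigma=0.1, lr=0.1, rng=rng)
clean_mse = sum(loss(params, x, y) for x, y in data) / len(data)
print(params, clean_mse)
```

The worst copy is chosen among random draws rather than by a gradient-based attack, so each step costs m forward evaluations and one backward pass. In this toy setting the learned slope tends to sit somewhat below 3, which reflects the pull toward a smaller input gradient.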

Motivation and Goals

Address overfitting and the resulting generalization gap in neural network training.
Propose MaxUp to enforce robustness against random data perturbations.
Show that MaxUp acts as gradient-norm regularization under Gaussian perturbations.
Demonstrate improvements across image classification, language modeling, and adversarial certification.
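The gradient-norm claim can be outlined with a first-order Taylor argument. This is a heuristic sketch, not the paper's full proof. It assumes i.i.d. isotropic Gaussian copies x'_i = x + σ z_i with z_i ~ N(0, I), a smooth loss, and a nonzero input gradient. The unit vector u is notation introduced here.

```latex
% Heuristic first-order sketch; assumes x'_i = x + \sigma z_i, z_i \sim \mathcal{N}(0, I) i.i.d.,
% and \nabla_x L(x,\theta) \neq 0 (otherwise the first-order term vanishes).
\begin{aligned}
\max_{i \in [m]} L(x'_i, \theta)
  &\approx L(x, \theta) + \sigma \max_{i \in [m]} \nabla_x L(x, \theta)^{\top} z_i \\
  &= L(x, \theta) + \sigma \, \|\nabla_x L(x, \theta)\|_2 \max_{i \in [m]} u^{\top} z_i,
  \qquad u = \frac{\nabla_x L(x, \theta)}{\|\nabla_x L(x, \theta)\|_2}.
\end{aligned}
% Each u^\top z_i is an independent standard normal, and for m \ge 2 the expected maximum of m of them is \Theta(\sqrt{\log m}):
\mathbb{E}\Big[\max_{i \in [m]} L(x'_i, \theta)\Big]
  = L(x, \theta) + c_{m,\sigma} \|\nabla_x L(x, \theta)\|_2 + O(\sigma^2),
\qquad c_{m,\sigma} = \sigma \, \mathbb{E}\Big[\max_{i \in [m]} u^{\top} z_i\Big] = \Theta\big(\sigma \sqrt{\log m}\big).
```

The O(σ²) remainder collects the second-order Taylor terms, which stay bounded when the Hessian of L in x is bounded.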

Proposed Method

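Generate m augmented copies x'_1, …, x'_m of each data point x from a perturbation distribution P(·|x).
Minimize the worst-case loss over these m augmented copies: min_θ E_{x∼D}[ max_{i∈[m]} L(x'_i, θ) ].
Backpropagate only through the worst augmented copy of each data point. This gives a simple SGD update whose gradient is ∇_θ L(x'_{i*}, θ), with i* = argmax_{i∈[m]} L(x'_i, θ).
Interpret MaxUp, via a Taylor expansion, as adding a gradient-norm regularizer ||∇_x L(x, θ)||_2 with coefficient c_{m,σ} = Θ(σ√(log m)).
Show that under isotropic Gaussian perturbations P(·|x) = N(x, σ²I), the expected MaxUp risk is E[max_{i∈[m]} L(x'_i, θ)] = L(x, θ) + c_{m,σ}||∇_x L(x, θ)||_2 + O(σ²).
Explain how MaxUp complements existing data augmentation, and how it relates to lightweight adversarial training and online hard example mining.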
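The Gaussian expansion in the list above can be checked numerically. The sketch below is illustrative only. It assumes a hypothetical smooth loss L(x) = log(1 + exp(a·x)) on a 5-dimensional input, with made-up values for a, x, m = 8, and σ = 0.05. It estimates the MaxUp risk E[max_i L(x + σ z_i)] by Monte Carlo. It then compares that estimate with L(x) + c_{m,σ}||∇_x L(x)||_2, where c_{m,σ} = σ · E[max of m standard normals] is also estimated by sampling.

```python
import math
import random


def softplus_loss(a, x):
    # Hypothetical smooth loss L(x) = log(1 + exp(a . x)).
    s = sum(ai * xi for ai, xi in zip(a, x))
    return math.log1p(math.exp(s))


def softplus_grad(a, x):
    # Gradient of L with respect to x: sigmoid(a . x) * a.
    s = sum(ai * xi for ai, xi in zip(a, x))
    sig = 1.0 / (1.0 + math.exp(-s))
    return [sig * ai for ai in a]


def expected_max_loss(a, x, m, sigma, trials, rng):
    # Monte Carlo estimate of E[max_i L(x + sigma * z_i)] with z_i ~ N(0, I).
    total = 0.0
    for _ in range(trials):
        total += max(
            softplus_loss(a, [xi + sigma * rng.gauss(0.0, 1.0) for xi in x])
            for _ in range(m)
        )
    return total / trials


def expected_max_normal(m, trials, rng):
    # Monte Carlo estimate of E[max of m i.i.d. N(0, 1) draws].
    return sum(max(rng.gauss(0.0, 1.0) for _ in range(m)) for _ in range(trials)) / trials


rng = random.Random(0)
a = [1.0, -0.5, 0.25, 0.8, -1.2]
x = [0.2, 0.1, -0.3, 0.4, 0.0]
m, sigma, trials = 8, 0.05, 10000

lhs = expected_max_loss(a, x, m, sigma, trials, rng)           # MaxUp risk at x
grad_norm = math.sqrt(sum(g * g for g in softplus_grad(a, x)))
c = sigma * expected_max_normal(m, trials, rng)                 # estimate of c_{m,sigma}
rhs = softplus_loss(a, x) + c * grad_norm                       # first-order prediction
print(f"MaxUp risk ~ {lhs:.4f}, first-order prediction ~ {rhs:.4f}")
```

With a small σ, the two numbers should agree closely, and both should sit clearly above the clean loss L(x). The remaining gap corresponds to the O(σ²) term. Increasing m raises the estimated c_{m,σ} only slowly, roughly in line with the √(log m) growth.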

Experimental Results

Research Questions

RQ1: Does maximizing the loss over augmented data improve generalization beyond standard data augmentation?
RQ2: How does MaxUp relate to gradient-norm regularization under perturbations such as Gaussian noise?
RQ3: Can MaxUp improve performance across diverse tasks (vision, language modeling, certified robustness) and architectures without substantial computational overhead?
RQ4: How do the choice of m and the augmentation distribution P(·|x) affect performance across datasets?
RQ5: How does MaxUp interact with existing adversarial training regimes and certification methods?

Key Findings

MaxUp improves generalization across image classification, language modeling, and adversarial certification tasks.
On ImageNet, MaxUp with CutMix raises top-1 accuracy from 85.5% (the previous state of the art without extra training data) to 85.8%.
On CIFAR-10 with Cutout, MaxUp improves accuracy from 95.41% to 95.52% (averaged over runs) for certain architectures.
On CIFAR-100, MaxUp with Cutout improves accuracy from 75.26% to 82.48% (WideResNet-28-10, m = 10).
In language modeling, MaxUp applied to AWD-LSTM yields lower perplexities on PTB (Penn Treebank) and WT2 (WikiText-2) than the prior state-of-the-art baselines.
For adversarial certification, MaxUp with Gaussian perturbations (MaxUp+Gauss) outperforms Cohen et al. (2019) and PGD-based training across the examined radii, with faster and easier hyperparameter tuning.
MaxUp provides a lightweight alternative to PGD adversarial training, with minimal overhead and broad compatibility with augmentation schemes.

This review was generated by AI and reviewed by human editors.