QUICK REVIEW

[Paper Review] Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Zhewei Yao, Amir Gholami|arXiv (Cornell University)|Feb 22, 2018

Adversarial Robustness in Machine Learning|References: 22|Cited by: 65

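ONE-SENTENCE SUMMARY

Using Hessian-based analysis, this paper shows that large-batch training tends to converge to high-curvature regions and yields models that are more vulnerable to adversarial perturbations, and that robust optimization counteracts this by favoring flatter minima.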

ABSTRACT

Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with large Hessian spectrum show poor robustness to adversarial perturbation. We further study this relationship, and provide empirical and theoretical proof that the inner loop for robust training is a saddle-free optimization problem almost everywhere. We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. We have open sourced our method which can be accessed at [1].

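RESEARCH MOTIVATION AND OBJECTIVES

Use the true Hessian spectrum to study how a large batch size changes the shape of the loss surface compared with a small batch size.
Examine the relationship between large-batch training and robustness to adversarial perturbations.
Investigate how robust optimization affects the Hessian spectrum and the decision boundary.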

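PROPOSED METHOD

Compute the true Hessian spectrum directly during training by back-propagating the second derivative.
Compare the Hessian spectrum and the perturbed loss landscape of small-batch and large-batch training.
Analyze adversarial perturbations across architectures and datasets using FGSM and a second-order attack.
Prove that, under certain conditions, the inner problem of robust optimization is saddle-free almost everywhere.
Use empirical and theoretical analysis to link robust training to changes in the Hessian spectrum.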
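
To make the idea of probing the Hessian spectrum without ever forming the Hessian more concrete, here is a minimal sketch of power iteration driven by Hessian-vector products. It is not the paper's implementation: the paper back-propagates the second derivative, whereas this sketch swaps in a central finite difference of an analytic gradient so that it runs with the Python standard library alone. The toy loss, the matrix `A`, and the helper names `grad`, `hvp`, and `top_eigenvalue` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy loss: f(w) = 0.5 * w^T A w + 0.25 * sum(w_i^4).
# Its Hessian at w is A + diag(3 * w_i^2), so the result can be checked by hand.
A = [[3.0, 1.0],
     [1.0, 2.0]]

def grad(w):
    """Analytic gradient of the toy loss: A w + w^3 (elementwise)."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) + w[i] ** 3 for i in range(n)]

def hvp(w, v, eps=1e-4):
    """Approximate the Hessian-vector product H(w) v.

    Assumption: a central difference of the gradient stands in for the
    exact second-order back-propagation described in the paper.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus), grad(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def top_eigenvalue(w, iters=100, seed=0):
    """Estimate the dominant Hessian eigenvalue using only H v products."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]               # next unit-length iterate
    return eig

lam_at_origin = top_eigenvalue([0.0, 0.0])  # Hessian here is exactly A
lam_away = top_eigenvalue([1.0, 1.0])       # Hessian here is A + 3 * I
print(lam_at_origin, lam_away)
```

Power iteration converges to the eigenvalue of largest magnitude, which is the quantity of interest when comparing how sharp two converged points are. For a real network, `grad` would be replaced by the framework's automatic differentiation, and the Hessian-vector product would typically be obtained by differentiating the gradient-vector inner product rather than by finite differences.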

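EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How does large-batch training change the local geometry of the loss surface relative to small-batch training?
RQ2: What is the relationship between batch size and a model's robustness to adversarial perturbations?
RQ3: Does robust optimization bias solutions toward flatter (lower-curvature) regions, and how does this relate to adversarial robustness?
RQ4: Is the inner loop of adversarial training a saddle-free optimization problem almost everywhere?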
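
For orientation, the "inner loop" of adversarial training is the maximization in the usual min-max form of robust optimization. The formula below is the standard textbook formulation, written with assumed symbols (model parameters \theta, loss L, perturbation \delta, budget \epsilon, norm index p); the paper's own notation may differ.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\|\delta\|_{p} \le \epsilon} L(\theta;\, x+\delta,\, y) \Big]
```

The outer minimization trains the model, while the inner maximization searches for the worst-case perturbation of each input within the allowed budget.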

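KEY FINDINGS

Large-batch training converges to regions with a noticeably higher Hessian spectrum, for both the training loss and the test loss.
Points reached by large-batch training are more vulnerable to adversarial attacks than points reached by small-batch training.
Robust training biases the model toward regions with a smaller Hessian spectrum, indicating a preference for flat minima.
Under the stated assumptions, the inner adversarial perturbation problem is saddle-free almost everywhere.
Robust optimization improves adversarial robustness but may reduce accuracy on clean data.
Adversarial training changes the Hessian spectrum and can yield models with lower curvature, even though the curvature of the total loss remains positive.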
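
The FGSM attack used in the analysis perturbs each input by a fixed step in the direction of the sign of the loss gradient with respect to that input. The sketch below illustrates this on a tiny logistic-regression model rather than on the networks studied in the paper. The weights `w`, bias `b`, sample `x`, step size `eps`, and all helper names are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy loss and its gradient with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx = (p - y) * w for this model
    return loss, grad_x

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dL/dx)."""
    _, g = loss_and_input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical model and a single labeled sample.
w, b = [2.0, -1.0], 0.5
x, y = [1.0, 1.0], 1

x_adv = fgsm(w, b, x, y, eps=0.1)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
print(x_adv, clean_loss, adv_loss)
```

Because every coordinate moves by exactly `eps`, the perturbation stays within an L-infinity ball of radius `eps` around the original input. Adversarial training would then fit the model on such perturbed inputs instead of, or in addition to, the clean ones.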


This review was generated by AI and reviewed by human editors.