QUICK REVIEW

[Paper Review] SmoothOut: Smoothing Out Sharp Minima for Generalization in Large-Batch Deep Learning

Wei Wen, Yandan Wang|arXiv (Cornell University)|May 21, 2018

Stochastic Gradient Optimization Techniques | Cited by 4

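One-Sentence Summary

SmoothOut addresses the generalization gap in large-batch deep learning by perturbing multiple copies of a DNN in parameter space and averaging them. It introduces a stochastic variant whose only per-iteration overhead is noise injection and denoising, proves that this variant is an unbiased approximation of the original method, and closes the generalization gap in large-batch training without increasing the number of training epochs.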

ABSTRACT

In distributed deep learning, a large batch size in Stochastic Gradient Descent is required to fully exploit the computing power in distributed systems. However, generalization gap (accuracy loss) was observed because large-batch training converges to sharp minima which have bad generalization [1][2]. This contradiction hinders the scalability of distributed deep learning. We propose SmoothOut to smooth out sharp minima in Deep Neural Networks (DNNs) and thereby close generalization gap. SmoothOut perturbs multiple copies of the DNN in the parameter space and averages these copies. We prove that SmoothOut can eliminate sharp minima. Perturbing and training multiple DNN copies is inefficient, we propose a stochastic version of SmoothOut which only introduces overhead of noise injection and denoising per iteration. We prove that the Stochastic SmoothOut is an unbiased approximation of the original SmoothOut. In experiments on a variety of DNNs and datasets, SmoothOut consistently closes generalization gap in large-batch training within the same epochs. Moreover, SmoothOut can guide small-batch training to flatter minima and improve generalization. Our source code is in this https URL

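Motivation and Objectives

Address the generalization gap in large-batch stochastic gradient descent, where training converges to sharp minima that generalize poorly.
Overcome the inefficiency of perturbing and training multiple DNN copies by proposing a stochastic variant with low computational overhead.
Provide a theoretically grounded optimization method that eliminates sharp minima while preserving training efficiency.
Guide both large-batch and small-batch training toward flatter minima, thereby improving generalization.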

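Proposed Method

SmoothOut perturbs multiple copies of the DNN in parameter space with random noise, probing the loss surface around the current parameters.
Averaging these perturbed copies yields a smoothed loss surface in which sharp minima are suppressed.
The stochastic variant of SmoothOut injects noise and then removes it (denoising) in each training iteration, which significantly reduces computational cost.
Stochastic SmoothOut is proven to be an unbiased approximation of the original SmoothOut, so the theoretical guarantees carry over.
The method operates directly on model parameters, which makes it straightforward to use with standard deep learning frameworks.
Noise is injected into the parameters for the forward pass, and averaging is performed across the perturbed copies.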
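The per-iteration noise injection and denoising described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `stochastic_smoothout_step`, the uniform noise range `a`, the toy quadratic loss, and the plain SGD update are all hypothetical choices made for this example.

```python
import numpy as np

def loss_grad(w):
    """Gradient of a toy quadratic loss L(w) = 0.5 * ||w||^2.

    Stands in for backpropagation through a real network (assumption).
    """
    return w

def stochastic_smoothout_step(w, grad_fn, lr, a, rng):
    """One hypothetical Stochastic SmoothOut-style update.

    `a` is an assumed half-width of uniform noise drawn from U(-a, a).
    """
    noise = rng.uniform(-a, a, size=w.shape)  # noise injection
    w_perturbed = w + noise
    g = grad_fn(w_perturbed)                  # gradient at the perturbed point
    w_clean = w_perturbed - noise             # denoising: restore the parameters
    return w_clean - lr * g                   # plain SGD update on the clean parameters

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 3.0])
for _ in range(200):
    w = stochastic_smoothout_step(w, loss_grad, lr=0.1, a=0.05, rng=rng)
print(np.linalg.norm(w))  # small: the iterate settles near the minimum at 0
```

Only one copy of the parameters is kept in this sketch; the extra work per step is drawing the noise, adding it, and subtracting it again, which matches the overhead the summary attributes to the stochastic variant.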

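Experimental Results

Research Questions

RQ1: Does perturbing and averaging in parameter space effectively eliminate sharp minima in deep neural networks?
RQ2: Does the stochastic variant of SmoothOut retain the theoretical properties of the original method while reducing computational cost?
RQ3: Can SmoothOut close the generalization gap in large-batch training without increasing training time?
RQ4: Can SmoothOut improve generalization in small-batch training by guiding optimization toward flatter minima?
RQ5: Compared with existing large-batch training techniques, how does SmoothOut perform in terms of test accuracy and convergence stability?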

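Key Findings

SmoothOut consistently closes the generalization gap in large-batch training across a variety of DNN architectures and datasets.
The stochastic variant of SmoothOut performs comparably to the full version while significantly reducing computational overhead.
With SmoothOut, large-batch training matches the generalization of small-batch training within the same number of epochs.
The method also improves generalization in small-batch training by guiding optimization toward flatter minima.
Stochastic SmoothOut is proven to be an unbiased approximation of the original SmoothOut, which keeps the two variants theoretically consistent.
The method is effective across multiple datasets and DNN models, indicating broad applicability.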
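
The unbiasedness finding above can be written compactly. The notation here is an assumption for illustration and is not taken from the paper: $L$ is the training loss, $w$ the parameters, and $\theta$ a perturbation drawn from a zero-centered noise distribution $p$ (uniform noise on $[-a, a]$ is one natural choice).

```latex
\tilde{L}(w) = \mathbb{E}_{\theta \sim p}\left[ L(w + \theta) \right],
\qquad
\mathbb{E}_{\theta \sim p}\left[ \nabla_w L(w + \theta) \right] = \nabla_w \tilde{L}(w).
```

Under this reading, and assuming differentiation and expectation can be exchanged, a single noisy gradient per iteration has the gradient of the smoothed loss as its expectation. That is the sense in which a one-sample stochastic variant can be an unbiased approximation of averaging over many perturbed copies.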

This review was generated by AI and reviewed by a human editor.