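[Paper Summary] On Empirical Comparisons of Optimizers for Deep Learning
This paper shows that the hyperparameter tuning protocol affects how optimizers rank against one another, and that inclusion relationships between optimizers (for example, adaptive methods and momentum) reliably predict comparative performance. It argues that well-tuned adaptive methods never underperform momentum or SGD.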
Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper, we demonstrate the sensitivity of optimizer comparisons to the hyperparameter tuning protocol. Our findings suggest that the hyperparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that these results can be contradicted when hyperparameter search spaces are changed. As tuning effort grows without bound, more general optimizers should never underperform the ones they can approximate (i.e., Adam should never perform worse than momentum), but recent attempts to compare optimizers either assume these inclusion relationships are not practically relevant or restrict the hyperparameters in ways that break the inclusions. In our experiments, we find that inclusion relationships between optimizers matter in practice and always predict optimizer comparisons. In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent. We also report practical tips around tuning often ignored hyperparameters of adaptive gradient methods and raise concerns about fairly benchmarking optimizers for neural network training.
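Motivation and Objectives
- Assess how the hyperparameter tuning protocol affects optimizer rankings in deep learning.
- Investigate whether inclusion relationships between optimizers still hold under realistic tuning budgets.
- Identify practical hyperparameter tuning considerations that affect fair benchmarking of optimizers.
Proposed Approach
- Compare optimizers experimentally under different hyperparameter search spaces.
- Evaluate the inclusion relationships between optimizers (adaptive methods, momentum, SGD).
- Analyze how tuning effort affects the relative performance of optimizers.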
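The inclusion relationship between Adam and momentum can be checked numerically on a toy problem. The sketch below is an illustration and not the paper's code: it assumes a two-dimensional quadratic loss, an exponential-moving-average form of momentum with bias correction, and a very large Adam `eps` with the learning rate scaled by the same factor. All names (`run_momentum`, `run_adam`, `CURVATURE`) and all numeric settings are hypothetical choices made for this example.

```python
import numpy as np

# Toy quadratic loss: 0.5 * sum(CURVATURE * theta**2). Assumed for illustration.
CURVATURE = np.array([1.0, 10.0])


def grad(theta):
    """Gradient of the toy quadratic loss."""
    return CURVATURE * theta


def run_momentum(theta0, lr, beta, steps):
    """Momentum in exponential-moving-average form with bias correction."""
    theta = theta0.copy()
    m = np.zeros_like(theta)
    for t in range(1, steps + 1):
        m = beta * m + (1 - beta) * grad(theta)
        theta = theta - lr * m / (1 - beta ** t)
    return theta


def run_adam(theta0, lr, beta1, beta2, eps, steps):
    """Standard Adam update with bias-corrected first and second moments."""
    theta = theta0.copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


theta0 = np.array([1.0, -1.0])
eps = 1e6  # large enough that sqrt(v_hat) is negligible in the denominator

theta_mom = run_momentum(theta0, lr=0.05, beta=0.9, steps=20)
# Scaling the learning rate by eps makes the effective step lr / eps match momentum.
theta_adam = run_adam(theta0, lr=0.05 * eps, beta1=0.9, beta2=0.999, eps=eps, steps=20)

max_gap = float(np.max(np.abs(theta_adam - theta_mom)))
print("largest parameter gap:", max_gap)
```

When `eps` dominates the denominator, the second-moment term has almost no effect and Adam follows the momentum trajectory closely. This is why a search space that fixes `eps` at a small default can break the inclusion: the momentum-like region of Adam's hyperparameter space is no longer reachable.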
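Experimental Results
Research Questions
- RQ1: Does the hyperparameter search space determine the relative performance of optimizers in deep learning?
- RQ2: Do the inclusion relationships between optimizers hold under realistic tuning budgets?
- RQ3: Can well-tuned adaptive methods underperform momentum or SGD, or do they always at least match them?
- RQ4: What practical hyperparameter tuning guidelines are needed for fair benchmarking?
Key Findings
- The hyperparameter tuning protocol has a decisive effect on optimizer rankings.
- Changing the hyperparameter search space can reverse the conclusions of a comparative study.
- When tuning is allowed, adaptive gradient methods do not underperform momentum or SGD.
- Inclusion relationships between optimizers reliably predict the outcome of practical comparisons.
- The paper offers practical advice for tuning adaptive gradient methods and raises concerns about current benchmarking practice.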
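The sensitivity of rankings to the search space can be reproduced in miniature. The following sketch is a deliberately artificial toy and not an experiment from the paper: optimizer "A" is plain gradient descent on a one-dimensional quadratic, and optimizer "B" is the same update scaled by a fixed factor of 0.01, standing in for any optimizer whose useful learning rates live at a different scale. The search-space bounds, trial count, and all names are assumptions made for this example.

```python
import numpy as np

STEPS = 50  # number of gradient steps per trial (assumed)


def final_loss(lr, scale):
    """Run gradient descent on 0.5 * theta**2 and return the final loss.

    `scale` multiplies every update, mimicking an optimizer whose
    effective step size differs from its nominal learning rate.
    """
    theta = 1.0
    for _ in range(STEPS):
        theta -= lr * scale * theta  # the gradient of 0.5 * theta**2 is theta
    return 0.5 * theta ** 2


def best_loss(scale, low, high, trials, rng):
    """Random search over a log-uniform learning-rate range; keep the best trial."""
    lrs = 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=trials)
    return min(final_loss(lr, scale) for lr in lrs)


rng = np.random.default_rng(0)
OPTIMIZERS = {"A": 1.0, "B": 0.01}  # hypothetical optimizers (update scale)
SPACES = {"small_lr": (1e-3, 1e-1), "large_lr": (3.0, 100.0)}

results = {
    space: {
        name: best_loss(scale, low, high, trials=20, rng=rng)
        for name, scale in OPTIMIZERS.items()
    }
    for space, (low, high) in SPACES.items()
}
winners = {space: min(scores, key=scores.get) for space, scores in results.items()}
print(winners)
```

By construction, "A" wins on the small learning-rate range, where "B" barely moves, and "B" wins on the large range, where every learning rate makes "A" diverge. Neither ranking says much about the optimizers themselves, which is the concern the findings above raise about fixed search spaces in comparative studies.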
This summary was generated by AI and reviewed by a human editor.