QUICK REVIEW

[Paper Review] Combination of Hyperband and Bayesian Optimization for Hyperparameter Optimization in Deep Learning

Jiazhuo Wang, Jason Xu | arXiv (Cornell University) | Jan 5, 2018

Machine Learning and Data Classification | References: 5 | Cited by: 57

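One-Sentence Summary

The paper proposes combining Hyperband with Bayesian optimization so that the search uses the history of previously evaluated hyperparameter configurations; compared with Hyperband, Bayesian optimization, or random search alone, the combination finds better deep learning hyperparameter configurations in less time.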

ABSTRACT

Deep learning has achieved impressive results on many problems. However, it requires high degree of expertise or a lot of experience to tune well the hyperparameters, and such manual tuning process is likely to be biased. Moreover, it is not practical to try out as many different hyperparameter configurations in deep learning as in other machine learning scenarios, because evaluating each single hyperparameter configuration in deep learning would mean training a deep neural network, which usually takes quite long time. Hyperband algorithm achieves state-of-the-art performance on various hyperparameter optimization problems in the field of deep learning. However, Hyperband algorithm does not utilize history information of previous explored hyperparameter configurations, thus the solution found is suboptimal. We propose to combine Hyperband algorithm with Bayesian optimization (which does not ignore history when sampling next trial configuration). Experimental results show that our combination approach is superior to other hyperparameter optimization approaches including Hyperband algorithm.

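Motivation and Objectives

The high complexity and training cost of deep learning create a need for systematic hyperparameter tuning.
Introduce a method that uses history information to guide hyperparameter sampling while allocating resources efficiently.
Show that the proposed combination outperforms existing hyperparameter optimization methods on several deep learning tasks.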

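Proposed Method

Review Hyperband and Bayesian optimization, along with their strengths and weaknesses.
Propose an algorithm that keeps Hyperband's structure but samples trial configurations sequentially according to a Bayesian optimization criterion.
Use a Bayesian surrogate model (TPE) to guide the choice of the next trial configuration, and update it with intermediate results.
Within each Hyperband round, sample trial configurations one at a time and update the surrogate model after each evaluation, balancing exploitation against exploration.
Evaluate the method on LeNet and AlexNet experiments and on an SSD decomposition task, comparing it with random search, TPE, and Hyperband.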
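The steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the bracket sizes follow one common formulation of Hyperband, the sampler is a simplified one-dimensional TPE-style stand-in (fixed-bandwidth Gaussian kernel densities over "good" and "bad" past trials), and the choice to pool results from all budgets into a single history is an assumption made here for brevity. The names `bracket_schedule`, `propose`, `hyperband_with_model`, and `toy_loss` are hypothetical.

```python
import math
import random


def bracket_schedule(max_budget, eta=3):
    """Return Hyperband brackets as (n_configs, initial_budget, s) tuples."""
    s_max = 0
    while eta ** (s_max + 1) <= max_budget:
        s_max += 1
    schedule = []
    for s in range(s_max, -1, -1):
        numerator = (s_max + 1) * eta ** s
        n = (numerator + s) // (s + 1)  # integer ceil(numerator / (s + 1))
        schedule.append((n, max_budget / eta ** s, s))
    return schedule


def kde(points, x, bandwidth):
    """Average of Gaussian kernels centered on `points`, evaluated at x."""
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return total / norm + 1e-12  # small constant avoids division by zero


def propose(history, low, high, rng, gamma=0.25, n_candidates=24, min_history=8):
    """TPE-style proposal: pick the candidate maximizing l(x) / g(x)."""
    if len(history) < min_history:
        return rng.uniform(low, high)  # too little data: sample at random
    ranked = sorted(history, key=lambda pair: pair[1])  # ascending loss
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    bandwidth = (high - low) / 10.0  # assumed fixed bandwidth
    best_x, best_ratio = None, -1.0
    for _ in range(n_candidates):
        # Draw candidates from the "good" density, clipped to the bounds.
        x = min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        ratio = kde(good, x, bandwidth) / kde(bad, x, bandwidth)
        if ratio > best_ratio:
            best_x, best_ratio = x, ratio
    return best_x


def hyperband_with_model(evaluate, low, high, max_budget=27, eta=3, seed=0):
    """Hyperband brackets with sequential, history-aware sampling."""
    rng = random.Random(seed)
    history = []  # (config, loss) pairs pooled over all brackets and budgets
    best = (None, float("inf"))
    for n, r, s in bracket_schedule(max_budget, eta):
        rung = []
        for _ in range(n):
            # Sample one configuration, evaluate it at the initial budget,
            # and record the result before proposing the next one.
            x = propose(history, low, high, rng)
            loss = evaluate(x, r)
            history.append((x, loss))
            rung.append((x, loss))
        for i in range(1, s + 1):
            # Successive halving: keep the top 1/eta and raise the budget.
            keep = max(1, len(rung) // eta)
            survivors = [x for x, _ in sorted(rung, key=lambda p: p[1])[:keep]]
            rung = []
            for x in survivors:
                loss = evaluate(x, r * eta ** i)
                history.append((x, loss))
                rung.append((x, loss))
        top = min(rung, key=lambda p: p[1])  # final rung runs at max_budget
        if top[1] < best[1]:
            best = top
    return best, history


def toy_loss(x, budget):
    """Hypothetical objective: x stands for log10(learning rate), optimum at -2."""
    return (x + 2.0) ** 2 + 1.0 / budget
```

Calling `hyperband_with_model(toy_loss, -5.0, 0.0)` returns the best `(config, loss)` pair found at the full budget together with the pooled history. A practical version would replace the fixed-bandwidth densities with a full TPE implementation and would need to decide how results obtained at different budgets should be weighted in the surrogate.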

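Experimental Results

Research Questions

RQ1: Can Hyperband be improved by using Bayesian optimization to incorporate the history of previous trials?
RQ2: Does the combined Hyperband + Bayesian optimization approach find better hyperparameter configurations faster than the baselines across datasets and model complexities?
RQ3: How does the method perform as the hyperparameter problem becomes harder (for example, deeper networks or larger parameter spaces)?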

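Key Findings

Hyperband_TPE consistently outperforms random search, TPE, and Hyperband across several deep learning tasks.
The performance gap between Hyperband_TPE and the baselines widens as the hyperparameter optimization problem becomes harder.
On easier problems all methods converge quickly; on harder problems the combined method shows a clearer advantage.
In the SSD decomposition experiment, Hyperband_TPE again yields better objective values (mAP and fps) than the baselines.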


This review was generated by AI and reviewed by human editors.