QUICK REVIEW

[Paper Review] Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

Aaron Klein, Stefan Falkner|arXiv (Cornell University)|May 23, 2016

Machine Learning and Data Classification|Cited by 317

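One-Sentence Summary

This paper proposes Fabolas, a Bayesian optimization method that models loss and cost as functions of dataset size, enabling hyperparameter optimization on large datasets by evaluating cheaper subsamples and extrapolating to the full data.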

ABSTRACT

Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.

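Motivation and Objectives

Motivate hyperparameter optimization for large datasets, where evaluating a configuration on the full data is expensive or impractical.
Propose a principled method that treats the size of a subsampled dataset as an environmental variable to speed up the search.
Develop a Bayesian optimization framework that extrapolates performance from smaller subsets to the full dataset.
Automatically trade off information gain against computational cost while still targeting full-data performance.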

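Proposed Method

Model loss and cost as functions of the hyperparameters and the dataset size, using Gaussian processes with custom kernels.
Add a finite-rank component to the kernel over the relative dataset size s ∈ [0, 1], enabling extrapolation from small subsets to the full dataset at s = 1.
Use Entropy Search as the acquisition function, maximizing the information gained about the full-data optimum per unit time.
Include overhead time in the acquisition function so that it reflects total elapsed time rather than evaluation cost alone.
Initialize with a design biased toward cheap evaluations in order to learn how performance scales with dataset size.
Provide an open-source implementation (RoBO) for reproducibility.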
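
To make the finite-rank kernel idea concrete, here is a minimal pure-Python sketch, not the RoBO implementation. It assumes a two-dimensional basis over the relative dataset size s: (1, (1 - s)^2) for the loss and (1, s) for the cost. These basis choices, the matrix SIGMA, and all function names are illustrative assumptions, and a squared-exponential kernel stands in for whatever kernel is used over the hyperparameters.

```python
import math

# Illustrative 2x2 positive semi-definite weight matrix (an assumption;
# in practice its entries would be learned with the other GP hyperparameters).
SIGMA = [[1.0, 0.3],
         [0.3, 0.5]]

def loss_basis(s):
    """Assumed loss basis: a constant plus a term that vanishes at s = 1."""
    return (1.0, (1.0 - s) ** 2)

def cost_basis(s):
    """Assumed cost basis: a constant plus linear growth in s."""
    return (1.0, s)

def finite_rank_kernel(s1, s2, basis, sigma=SIGMA):
    """k(s1, s2) = phi(s1)^T * sigma * phi(s2), a kernel of rank at most 2."""
    p1, p2 = basis(s1), basis(s2)
    return sum(p1[i] * sigma[i][j] * p2[j]
               for i in range(2) for j in range(2))

def hyperparameter_kernel(x1, x2, lengthscale=1.0):
    """Stand-in squared-exponential kernel over hyperparameter vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def product_kernel(x1, s1, x2, s2, basis, sigma=SIGMA):
    """Joint kernel over (x, s): hyperparameter kernel times size kernel."""
    return (hyperparameter_kernel(x1, x2)
            * finite_rank_kernel(s1, s2, basis, sigma))

# Covariance between a cheap run at s = 1/16 and the full dataset at s = 1
# for two nearby (hypothetical) hyperparameter settings.
cov = product_kernel((0.10, 0.20), 1 / 16, (0.12, 0.18), 1.0, loss_basis)
```

Because the size kernel is built from a fixed parametric basis, observations at small s constrain the same two weights that determine the prediction at s = 1, which is what lets the model extrapolate rather than merely interpolate.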

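Experimental Results

Research Questions

RQ1: Can evaluations on subsamples reliably predict hyperparameter performance on the full dataset?
RQ2: How should loss and computational cost be modeled as functions of dataset size to enable extrapolation to the full data?
RQ3: Does Fabolas outperform standard Bayesian optimization, MTBO, and Hyperband at finding high-quality hyperparameters for large datasets?
RQ4: How does including evaluation overhead in the decision rule for selecting (x, s) affect the search?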
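
The role of overhead in the decision rule for (x, s) can be illustrated with a small Python sketch. It assumes each candidate already comes with an estimated information gain and a predicted evaluation cost. In the actual method both would come from the Gaussian process models and Entropy Search, whereas here the numbers, labels, and function names are hypothetical.

```python
def gain_per_time(info_gain, predicted_cost, overhead=0.0):
    """Information gain divided by total time (evaluation cost plus overhead)."""
    return info_gain / (predicted_cost + overhead)

def select_candidate(candidates, overhead=0.0):
    """Return the (label, info_gain, predicted_cost) tuple with the best ratio."""
    return max(candidates,
               key=lambda c: gain_per_time(c[1], c[2], overhead))

# Hypothetical candidates: (label, information gain in nats, cost in seconds).
candidates = [
    ("x_a, s=1/64", 0.02, 1.0),   # very cheap, but barely informative
    ("x_b, s=1/4", 0.50, 60.0),   # costlier, but far more informative
]

# Ignoring overhead, the tiny subset has the better ratio (0.02 vs. ~0.0083).
choice_without_overhead = select_candidate(candidates)

# With 120 s of optimizer overhead per iteration, the ratios become
# ~0.00017 vs. ~0.0028, so the more informative evaluation is chosen.
choice_with_overhead = select_candidate(candidates, overhead=120.0)
```

In this toy setting, leaving out the overhead makes very small subsets look artificially attractive, while adding it to the denominator shifts the choice toward evaluations that justify the time spent on each iteration.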

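Key Findings

Fabolas often finds high-quality hyperparameters 10 to 100 times faster than other Bayesian optimization methods or Hyperband.
On SVM and deep neural network tasks, Fabolas achieves substantial wall-clock speedups while matching or exceeding full-data baselines.
With a continuous dataset-size variable, the model can in many cases learn the correlation between subset and full-data performance without evaluating at the full size.
Compared with MTBO, Hyperband, and standard BO, Fabolas converges faster to well-performing configurations on multiple datasets.
The method remains effective for convolutional and residual networks, although the size of the speedup varies with how the model and data scale.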


This review was generated by AI and reviewed by a human editor.