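[Paper Summary] Practical Bayesian Optimization of Machine Learning Algorithms
Introduces fully Bayesian optimization of hyperparameters using Gaussian process priors, with cost-aware and parallel acquisition, achieving tuning that matches or exceeds expert level across diverse machine learning problems.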
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
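Motivation and Goals
- Automatically tune the hyperparameters, regularization terms, and optimization settings of machine learning algorithms.
- Model generalization performance with a Gaussian process prior to guide experiments efficiently.
- Integrate practical constraints, such as varying trial durations and parallel evaluation, into the optimization loop.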
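Proposed Method
- Place a Gaussian process prior on the unknown objective f(x) over hyperparameters.
- Use an acquisition function such as expected improvement (EI), and compare it with GP-UCB.
- Treat the GP hyperparameters in a fully Bayesian way by marginalizing over them with Monte Carlo (MCMC) samples, giving EI with MCMC (GP EI MCMC).
- Model cost by treating the duration c(x) as a GP, and optimize EI per second.
- Build an acquisition function for parallel experiments by Monte Carlo averaging over the possible outcomes of pending evaluations.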
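The acquisition steps listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the GP predictive mean and standard deviation at a candidate point are already available, and the function names (`expected_improvement`, `integrated_ei`, `ei_per_second`) and the example numbers are hypothetical. It uses the standard closed-form EI for minimization, averages EI over predictions from several GP hyperparameter samples, and divides by a predicted duration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization.

    mu, sigma: GP predictive mean and standard deviation at a candidate x.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    gamma = (best - mu) / sigma
    return sigma * (gamma * normal_cdf(gamma) + normal_pdf(gamma))


def integrated_ei(predictions, best):
    """Average EI over (mu, sigma) pairs, one pair per GP hyperparameter sample.

    This approximates marginalizing the acquisition function over the
    posterior on GP hyperparameters (the samples themselves are assumed
    to come from an MCMC sampler that is not shown here).
    """
    values = [expected_improvement(m, s, best) for m, s in predictions]
    return sum(values) / len(values)


def ei_per_second(ei, predicted_seconds):
    """Cost-aware score: expected improvement per unit of predicted runtime."""
    return ei / predicted_seconds


if __name__ == "__main__":
    best_so_far = 0.20  # hypothetical best validation error so far
    # Hypothetical candidates: predictions under three hyperparameter samples,
    # plus a predicted training time in seconds.
    candidates = {
        "slow": ([(0.15, 0.05), (0.16, 0.04), (0.14, 0.06)], 3600.0),
        "fast": ([(0.18, 0.03), (0.19, 0.03), (0.18, 0.02)], 300.0),
    }
    for name, (preds, seconds) in candidates.items():
        ei = integrated_ei(preds, best_so_far)
        print(name, ei, ei_per_second(ei, seconds))
```

Dividing by predicted runtime means a candidate with a smaller expected improvement can still rank first when it is much cheaper to evaluate, which is the trade-off the EI-per-second criterion is meant to capture.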
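Experimental Results
Research Questions
- RQ1: How does a fully Bayesian treatment of the GP hyperparameters affect the performance of Bayesian optimization?
- RQ2: Do cost awareness (EI per second) and parallelism improve the efficiency of hyperparameter tuning in practice?
- RQ3: How does the choice of covariance function (for example, Matérn 5/2 versus squared exponential) affect optimization success?
- RQ4: How does integrating over pending evaluations in the acquisition function affect the choice of the next point?
- RQ5: Do these methods outperform human experts on real machine learning problems?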
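Main Findings
- Integrating over the GP hyperparameters (GP EI MCMC) outperforms strategies that use only a point estimate of the hyperparameters on the benchmarks.
- EI per second improves wall-clock efficiency by favoring configurations that are faster to evaluate.
- Parallelized GP EI MCMC (N x GP EI MCMC) finds better parameters faster than grid search on large-scale problems.
- The choice of covariance function substantially affects optimization success; Matérn 5/2 typically yields more realistic function samples than the squared exponential.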
- On CIFAR-10, the hyperparameters found by GP EI MCMC reach 14.98% test error, over 3% better than the expert-tuned settings.
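- Across tasks (LDA, structured SVMs, CNNs), the proposed Bayesian optimization methods generally surpass both human expert performance and earlier automatic methods.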
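To make the covariance comparison in the findings concrete, the two kernels can be written out directly. This is a minimal sketch under stated assumptions: the function names, the per-dimension `length_scales` argument, and the `amplitude` parameter are illustrative choices, not taken from the paper's code. The squared exponential produces infinitely differentiable sample functions, while Matérn 5/2 samples are only twice differentiable, which is the usual argument for why it is the less restrictive smoothness assumption.

```python
import math


def scaled_sq_dist(x, y, length_scales):
    """Squared distance between x and y, each dimension scaled by its length scale."""
    return sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, length_scales))


def squared_exponential(x, y, length_scales, amplitude=1.0):
    """Squared exponential (ARD) covariance: very smooth sample functions."""
    return amplitude * math.exp(-0.5 * scaled_sq_dist(x, y, length_scales))


def matern52(x, y, length_scales, amplitude=1.0):
    """Matern 5/2 (ARD) covariance: twice-differentiable sample functions."""
    r2 = scaled_sq_dist(x, y, length_scales)
    root = math.sqrt(5.0 * r2)
    return amplitude * (1.0 + root + 5.0 * r2 / 3.0) * math.exp(-root)


if __name__ == "__main__":
    # Compare how quickly correlation decays with distance in one dimension.
    for d in (0.0, 0.5, 1.0, 2.0):
        se = squared_exponential([0.0], [d], [1.0])
        m52 = matern52([0.0], [d], [1.0])
        print(d, se, m52)
```

Both kernels equal the amplitude at zero distance and decay as points move apart; they differ in how the decay behaves, which changes what kinds of objective functions the GP considers plausible.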
This summary was generated by AI and reviewed by a human editor.