QUICK REVIEW

[Paper Summary] Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates

Ilija Ilievski, Taimoor Akhtar|arXiv (Cornell University)|Jul 28, 2016

Machine Learning and Data Classification | Cited by 73

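One-Sentence Summary

The paper proposes HORD, a deterministic hyperparameter optimization method built on a radial basis function (RBF) surrogate model, for searching the hyperparameter space of deep learning algorithms efficiently. By combining dynamic coordinate search with RBF surrogate modeling, HORD needs up to 6 times fewer function evaluations than Bayesian optimization methods such as GP-EI, and its advantage is largest in high-dimensional settings.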

ABSTRACT

Automatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires many fewer function evaluations. HORD does well in low dimensions but it is exceptionally better in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate HORD significantly outperforms the well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters.

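Motivation and Objectives

Address the inefficiency of probabilistic surrogate methods, such as Gaussian processes, when optimizing deep learning hyperparameters in high dimensions.
Reduce the number of expensive function evaluations needed to find a near-optimal hyperparameter configuration.
Develop a deterministic mixed-integer optimization algorithm that handles both continuous and discrete hyperparameters.
Improve scalability and performance in high-dimensional hyperparameter spaces, where Bayesian methods struggle because of their computational overhead.
Show that an RBF surrogate combined with dynamic coordinate search outperforms state-of-the-art Bayesian and tree-based optimization methods.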

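Proposed Method

HORD models the validation error as a function of the hyperparameters with a deterministic radial basis function (RBF) surrogate, which avoids the covariance estimation that Gaussian processes require.
The algorithm uses dynamic coordinate search to update candidate hyperparameter points iteratively, concentrating on the more promising regions of the search space.
Candidate points are generated by adding normally distributed perturbations around the current best solution; only a subset of the dimensions is perturbed each time, which improves efficiency.
The next point to evaluate is chosen by a weighted combination of the surrogate's predicted value and the distance to already evaluated points, which favors exploration near the current best solution.
HORD supports both continuous and integer-valued hyperparameters, enabling mixed-integer optimization of deep neural network configurations.
A variant, HORD-ISP, adds an initial guess, which further speeds up convergence in later iterations.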
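The steps listed above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fit_cubic_rbf`, `hord_sketch`, and `toy_error` are hypothetical, the surrogate is assumed to use a cubic kernel with a linear polynomial tail, the weight cycle and the fixed step size are illustrative choices, a uniform random initial design stands in for a space-filling design, and a cheap quadratic replaces the real validation error of a network.

```python
import numpy as np

def fit_cubic_rbf(X, y):
    """Interpolate (X, y) with s(x) = sum_i lam_i * ||x - x_i||^3 + c^T [x, 1].
    Assumes distinct points that do not all lie on one hyperplane."""
    n, d = X.shape
    Phi = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2) ** 3
    P = np.hstack([X, np.ones((n, 1))])
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(A, np.concatenate([y, np.zeros(d + 1)]))
    lam, c = coef[:n], coef[n:]

    def predict(Z):
        r = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
        return (r ** 3) @ lam + np.hstack([Z, np.ones((len(Z), 1))]) @ c
    return predict

def hord_sketch(f, lower, upper, is_int, n_evals=40, n_cand=100, seed=0):
    rng = np.random.default_rng(seed)
    d = len(lower)
    n0 = 2 * (d + 1)                                   # size of the initial design
    snap = lambda Z: np.where(is_int, np.round(Z), Z)  # round integer dimensions
    X = snap(rng.uniform(lower, upper, size=(n0, d)))  # assumption: random design
    y = np.array([f(x) for x in X])
    sigma = 0.2 * (upper - lower)                      # assumption: fixed step size
    weights = [0.3, 0.5, 0.8, 0.95]                    # assumption: weight cycle
    phi0 = min(20.0 / d, 1.0)
    for n in range(n0, n_evals):
        surrogate = fit_cubic_rbf(X, y)
        best = X[np.argmin(y)]
        # Probability of perturbing each coordinate shrinks as the budget is used.
        p = phi0 * (1.0 - np.log(n - n0 + 1) / np.log(n_evals - n0))
        mask = rng.random((n_cand, d)) < p
        mask[np.arange(n_cand), rng.integers(d, size=n_cand)] = True  # at least one
        cand = best + mask * rng.normal(0.0, sigma, size=(n_cand, d))
        cand = snap(np.clip(cand, lower, upper))       # simplification: clip to bounds
        dist = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2).min(axis=1)
        keep = dist > 1e-9                             # drop already evaluated points
        cand, dist = cand[keep], dist[keep]
        if len(cand) == 0:
            continue
        pred = surrogate(cand)
        v_ev = (pred - pred.min()) / max(np.ptp(pred), 1e-12)  # 0 = best prediction
        v_dm = (dist.max() - dist) / max(np.ptp(dist), 1e-12)  # 0 = farthest point
        w = weights[(n - n0) % len(weights)]
        x_new = cand[np.argmin(w * v_ev + (1.0 - w) * v_dm)]
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))
    return X[np.argmin(y)], float(y.min())

# Toy stand-in for a validation error: two continuous and one integer hyperparameter.
def toy_error(x):
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2 + 0.1 * (x[2] - 3) ** 2

lower = np.array([0.0, 0.0, 1.0])
upper = np.array([1.0, 1.0, 8.0])
is_int = np.array([False, False, True])
x_best, f_best = hord_sketch(toy_error, lower, upper, is_int)
print(x_best, f_best)
```

Refitting the surrogate costs one dense linear solve per iteration, and no covariance hyperparameters are tuned. The weight `w` trades the predicted error against the distance from evaluated points: small values favor unexplored regions, values near 1 favor the surrogate's minimum.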

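Experimental Results

Research Questions

RQ1: Can a deterministic RBF surrogate outperform probabilistic surrogates such as Gaussian processes in deep learning hyperparameter optimization?
RQ2: How does dynamic coordinate search with targeted perturbations improve convergence speed in high-dimensional hyperparameter spaces?
RQ3: Compared with state-of-the-art Bayesian and tree-based optimization algorithms, how much does the proposed method reduce the number of expensive function evaluations?
RQ4: Does HORD scale favorably relative to existing methods as the number of hyperparameters grows?
RQ5: Does combining an RBF surrogate with an intelligent candidate-generation strategy converge faster while maintaining or improving the final validation error?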

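Key Findings

In the evaluations on MNIST and CIFAR-10, HORD is on average more than 6 times faster than GP-EI at finding the best configuration of 19 hyperparameters.
Across all tested problem dimensions, HORD is on average 3.7 to 6 times faster than the other methods, and it remains consistently better in high-dimensional settings.
HORD outperforms GP-EI, GP-PES, SMAC, and TPE in both convergence speed and final validation error, with the clearest advantage in high-dimensional hyperparameter spaces.
The algorithm shows statistically significant improvements in both solution quality and computational efficiency, most visibly when there are more than 10 hyperparameters.
The HORD-ISP variant, which starts from an initial guess, converges faster still, indicating that an informed initialization improves performance.
The RBF surrogate has markedly lower computational overhead than GP-based methods, which require costly covariance matrix computations whose complexity deteriorates sharply as the dimension grows.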
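The summary does not spell out how an "N times faster" figure is measured. One plausible reading, assumed here, is the ratio of function evaluations two methods need before their best-so-far validation error first reaches the same target. The sketch below shows that arithmetic; `evals_to_reach` and `speedup` are hypothetical helper names, and the two traces are made-up numbers for illustration only, not results from the paper.

```python
import numpy as np

def evals_to_reach(errors, target):
    """1-based count of evaluations until the best-so-far error is <= target,
    or None if the target is never reached."""
    best_so_far = np.minimum.accumulate(np.asarray(errors, dtype=float))
    hits = np.nonzero(best_so_far <= target)[0]
    return int(hits[0]) + 1 if hits.size else None

def speedup(errors_fast, errors_slow, target):
    """Ratio of evaluations needed by the slower method to those of the faster one."""
    a = evals_to_reach(errors_fast, target)
    b = evals_to_reach(errors_slow, target)
    return None if a is None or b is None else b / a

# Made-up validation-error traces (illustration only, NOT data from the paper).
trace_a = [0.9, 0.5, 0.2, 0.15]
trace_b = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2, 0.18, 0.16, 0.15]
ratio = speedup(trace_a, trace_b, target=0.2)
print(ratio)
```

Counting evaluations rather than wall-clock time matches the setting described here, where each evaluation means training a network and dominates the total cost.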


This summary was generated by AI and reviewed by a human editor.