QUICK REVIEW

[Paper Review] Hyperparameter optimization with approximate gradient

Fabián Pedregosa|arXiv (Cornell University)|Feb 7, 2016

Topic: Sparse and Compressive Sensing Techniques|References: 24|Cited by: 139

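One-Sentence Summary

TL;DR: Proposes Hoag, a gradient-based hyperparameter optimization algorithm that uses approximate gradients with summable errors and is guaranteed to converge to a stationary point; it is validated on the estimation of regularization and kernel parameters.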

ABSTRACT

Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of L2-regularized logistic regression and kernel Ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state of the art methods.

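Motivation and Objectives

Motivate the need for efficient hyperparameter optimization in regularized and kernel-based models.
Develop a gradient-based method that uses approximate gradients to reduce the computational burden.
Establish convergence guarantees under mild regularity and summability assumptions.
Evaluate Hoag empirically on logistic regression and kernel ridge regression across several datasets.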

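Proposed Method

Formulate hyperparameter optimization as a bilevel problem with an inner and an outer objective.
Derive an approximate gradient of the outer objective from an approximate solution of the inner problem and the solution of a linear system.
Define Hoag: at iteration k, solve the inner problem to tolerance ε_k, solve the Hessian linear system to tolerance ε_k, build the approximate gradient p_k, and take a projected gradient step with step size 1/L, where L is the Lipschitz constant of the outer objective's gradient.
Prove convergence: the gradient error is O(ε_k), so a summable sequence ε_k implies convergence to a stationary point.
Discuss adaptive step sizes and practical implementation details, including conjugate-gradient Hessian solves and strategies for choosing the tolerances ε_k.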
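The steps listed above can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation, and it rests on several assumptions: the model is ridge regression with a single hyperparameter lam (the log of the regularization strength), the outer objective is the mean squared validation error, the inner stopping rule uses the gradient norm as a proxy for the distance to the inner solution, the step size is a fixed hand-picked value rather than 1/L, and the tolerance sequence 0.1 * 0.9^k is an illustrative choice. All function names (cg, approx_hypergradient, hoag_ridge, val_loss, make_data) are hypothetical. The approximate gradient uses the usual implicit-differentiation form p_k = grad_lam g - (d/dlam grad_x h)^T q_k, where q_k approximately solves (Hessian of h) q = grad_x g.

```python
import numpy as np


def cg(matvec, rhs, x0, tol, max_iter=1000):
    """Conjugate gradient for a symmetric positive definite system.

    Stops once the (recursively updated) residual norm is at most tol.
    """
    x = x0.copy()
    r = rhs - matvec(x)
    d = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Hd = matvec(d)
        alpha = rs / (d @ Hd)
        x = x + alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x


def approx_hypergradient(lam, data, x0, q0, tol):
    """Approximate gradient of the validation loss with respect to lam."""
    A, b, A_val, b_val = data
    alpha = np.exp(lam)

    def hess(v):
        # Hessian of the inner objective 0.5*||Ax - b||^2 + 0.5*alpha*||x||^2
        return A.T @ (A @ v) + alpha * v

    # (i) Inner problem, warm-started at x0. For this quadratic objective the
    # CG residual equals minus the inner gradient, so we stop when the
    # gradient norm is at most tol (a proxy for the distance to the solution).
    x = cg(hess, A.T @ b, x0, tol)
    # (ii) Linear system: hess(q) = gradient of the outer loss in x.
    grad_g = A_val.T @ (A_val @ x - b_val) / len(b_val)
    q = cg(hess, grad_g, q0, tol)
    # (iii) The outer loss has no direct dependence on lam, and the cross
    # derivative d/dlam of the inner gradient is alpha * x.
    p = -alpha * (x @ q)
    return p, x, q


def hoag_ridge(data, lam0, step, n_iter=30, bounds=(-5.0, 10.0)):
    """Projected gradient on lam using inexact gradients."""
    n_features = data[0].shape[1]
    lam, x, q = lam0, np.zeros(n_features), np.zeros(n_features)
    for k in range(1, n_iter + 1):
        tol = 0.1 * 0.9 ** k  # summable tolerance sequence (assumed choice)
        p, x, q = approx_hypergradient(lam, data, x, q, tol)
        lam = float(np.clip(lam - step * p, *bounds))  # projection onto bounds
    return lam


def val_loss(lam, data):
    """Outer objective evaluated with an exact inner solve (for checking)."""
    A, b, A_val, b_val = data
    H = A.T @ A + np.exp(lam) * np.eye(A.shape[1])
    x = np.linalg.solve(H, A.T @ b)
    return 0.5 * np.mean((A_val @ x - b_val) ** 2)


def make_data(seed=0, m=60, n_features=10, noise=0.5):
    """Synthetic train/validation split (purely illustrative)."""
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(n_features)
    x_true *= 2.0 / np.linalg.norm(x_true)
    A = rng.standard_normal((m, n_features))
    A_val = rng.standard_normal((m, n_features))
    b = A @ x_true + noise * rng.standard_normal(m)
    b_val = A_val @ x_true + noise * rng.standard_normal(m)
    return A, b, A_val, b_val


if __name__ == "__main__":
    data = make_data()
    lam_hat = hoag_ridge(data, lam0=5.0, step=0.5)
    print("initial loss:", val_loss(5.0, data))
    print("final lam:", lam_hat, "final loss:", val_loss(lam_hat, data))
```

Reusing x and q across outer iterations is what lets the loose early tolerances pay off: each inexact solve starts from the previous iterate instead of from scratch. In a non-quadratic inner problem such as logistic regression, step (i) would need an iterative optimizer in place of conjugate gradient.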

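Experimental Results

Research Questions

RQ1: How can approximate gradient information be used for hyperparameter optimization?
RQ2: In the bilevel hyperparameter setting, under what conditions does the approximate-gradient method converge to a stationary point?
RQ3: Which practical tolerance sequences and step-size strategies yield competitive empirical performance?
RQ4: How does Hoag compare with grid search, random search, SMBO, and iterative differentiation in terms of accuracy and efficiency?
RQ5: Can Hoag be applied effectively to regularization-parameter estimation and kernel-parameter tuning across datasets?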

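Key Findings

The gradient of the outer objective can be approximated with a controllable error of order O(ε_k).
If the sequence ε_k is summable, Hoag converges to a stationary point of the outer objective.
On L2-regularized logistic regression and kernel ridge regression, Hoag is competitive with grid search, random search, SMBO, and iterative differentiation.
In practice, an adaptive step-size strategy can be used to cope with an unknown Lipschitz constant while preserving convergence.
Hoag benefits from warm-starting the inner optimization, which improves efficiency.
Empirically, early convergence is fast, although on some datasets overall progress is non-monotone.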
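The summability condition in the findings above can be made concrete with a small numerical check. The sketch below compares partial sums of a few candidate tolerance sequences; the specific sequences and the constant 0.1 are illustrative assumptions, not values taken from this review, and partial_sum is a hypothetical helper. A sequence qualifies when its partial sums level off, whereas the harmonic sequence 0.1/k keeps growing and therefore does not satisfy the condition even though its terms go to zero.

```python
import math


def partial_sum(seq, n_terms):
    """Sum seq(k) for k = 1, ..., n_terms."""
    return sum(seq(k) for k in range(1, n_terms + 1))


# Candidate tolerance sequences eps_k (illustrative choices).
sequences = {
    "quadratic": lambda k: 0.1 / k**2,      # summable, limit 0.1 * pi^2 / 6
    "cubic": lambda k: 0.1 / k**3,          # summable
    "exponential": lambda k: 0.1 * 0.9**k,  # summable, limit 0.9
    "harmonic": lambda k: 0.1 / k,          # NOT summable
}

if __name__ == "__main__":
    for name, seq in sequences.items():
        sums = [partial_sum(seq, n) for n in (10**2, 10**3, 10**4)]
        print(name, ["%.4f" % s for s in sums])
    print("quadratic limit:", 0.1 * math.pi**2 / 6)
```

Faster-decaying sequences demand more accurate inner solves early on, so the choice trades per-iteration cost against gradient accuracy.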

This review was generated by AI and checked by a human editor.