QUICK REVIEW

[Paper Review] Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Bernd Bischl, Martin Binder|arXiv (Cornell University)|Jul 13, 2021

Machine Learning and Data Classification | Cited by 49

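One-Sentence Summary

A comprehensive survey of hyperparameter optimization (HPO) that covers foundational concepts, the main algorithms (grid/random search, evolution strategies, Bayesian optimization), practical guidelines, and open challenges.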

ABSTRACT

Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate concepts from this work as supplementary files.

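Motivation and Goals

Explain why automated HPO is needed to replace manual, time-consuming tuning.
Give a formal, general framework for HPO in supervised learning.
Organize the main HPO methods together with their strengths and weaknesses.
Offer practical advice on performance evaluation, search spaces, and parallelization in HPO.
Discuss open problems and future directions for HPO.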

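Proposed Approach

Define the HPO problem as optimizing a stochastic, black-box objective function c(λ), where c(λ) is a resampling-based estimate of generalization performance.
Characterize the search space Λ, including possible hierarchical/conditional hyperparameters.
Review established HPO algorithms (grid/random search, evolution strategies, Bayesian optimization).
Explain resampling strategies (holdout, cross-validation) and how they affect the generalization estimate.
Describe how HPO integrates with ML pipelines and preprocessing steps.
Give practical guidelines for choosing a resampling method, defining the search space, and parallelizing HPO.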
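
The formulation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the toy data set, the k-nearest-neighbor learner, and the names `make_data`, `knn_predict`, `cv_error`, and `random_search` are all assumptions made for this example. Here λ is the single hyperparameter k, Λ is {1, ..., 30}, and c(λ) is the 5-fold cross-validated misclassification rate.

```python
import random

def make_data(n, rng):
    """Toy 1-D binary classification: label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (ties go to class 0)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > len(nearest))

def cv_error(data, k, folds=5):
    """Estimate c(lambda): mean misclassification rate over the CV folds."""
    errors = []
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / len(test))
    return sum(errors) / folds

def random_search(data, budget, rng):
    """Sample `budget` configurations uniformly from the search space; keep the best."""
    best_k, best_err = None, float("inf")
    for _ in range(budget):
        k = rng.randrange(1, 31)  # search space: k in {1, ..., 30}
        err = cv_error(data, k)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

rng = random.Random(0)
data = make_data(200, rng)
best_k, best_err = random_search(data, budget=15, rng=rng)
print(best_k, best_err)
```

The optimizer only ever calls `cv_error`, which is what makes the objective a black box. The value `best_err` was also used to select `best_k`, so it is likely to be an optimistic estimate of how the chosen configuration performs on new data.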

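Experimental Results

Research Questions

RQ1: What are the core theoretical and practical foundations of hyperparameter optimization in supervised learning?
RQ2: How do the main HPO algorithms compare in terms of exploration/exploitation, noise handling, and scalability?
RQ3: Which practical guidelines improve the effectiveness and reproducibility of HPO in real machine learning workflows?
RQ4: How should search spaces be constructed, including hierarchical and conditional hyperparameters, to achieve robust optimization?
RQ5: Which open challenges in HPO remain to guide future research?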

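Key Findings

HPO can be framed as a black-box, stochastic optimization problem over a bounded, possibly hierarchical search space.
Resampling-based estimates of generalization error make it possible to evaluate hyperparameter configurations (HPCs), but the estimate used to select a configuration is optimistically biased; nested cross-validation mitigates this.
Grid search and random search are simple baselines; in high-dimensional settings, random search usually outperforms grid search.
Bayesian optimization, which combines a surrogate model with an acquisition function, balances exploration and exploitation efficiently when evaluations are expensive.
Evolution strategies are robust to noise and suit complex search spaces, but may require a large number of evaluations.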
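
One common explanation for the grid versus random search finding is that only a few hyperparameters usually matter, and a grid spends most of its budget repeating the same values along the important axis. The sketch below illustrates this on a hypothetical objective; the function `objective`, the hyperparameter names `lr` and `momentum`, and the optimum at 0.37 are invented for this example and do not come from the paper.

```python
import itertools
import random

def objective(lr, momentum):
    """Hypothetical validation error that depends only on `lr`; `momentum` is irrelevant."""
    return (lr - 0.37) ** 2

budget = 16
side = int(budget ** 0.5)  # 4 x 4 grid

# Grid search: 4 evenly spaced values per hyperparameter on [0, 1].
axis = [i / (side - 1) for i in range(side)]
grid = list(itertools.product(axis, axis))

# Random search: 16 independent uniform draws from the same unit square.
rng = random.Random(0)
rand = [(rng.random(), rng.random()) for _ in range(budget)]

# Count how many distinct values of the important hyperparameter each method tried.
grid_lr_values = {lr for lr, _ in grid}
rand_lr_values = {lr for lr, _ in rand}

grid_best = min(objective(lr, m) for lr, m in grid)
rand_best = min(objective(lr, m) for lr, m in rand)

print(len(grid_lr_values), len(rand_lr_values))
print(grid_best, rand_best)
```

With the same budget of 16 evaluations, the grid probes only 4 distinct values of `lr` while random search probes 16. Any single run can still favor either method, so the comparison of `grid_best` and `rand_best` is one sample, not evidence by itself.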


This review was generated by AI and checked by a human editor.