QUICK REVIEW

[Paper Review] Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Lisha Li, Kevin Jamieson|arXiv (Cornell University)|Mar 21, 2016
Machine Learning and Data Classification|1,061 citations
TL;DR

Hyperband introduces a pure-exploration, bandit-based hyperparameter optimization method that adaptively allocates resources (e.g., iterations, data, features) to configurations and uses multiple brackets of successive halving to speed up hyperparameter search, often outperforming Bayesian optimization.

ABSTRACT

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.

Motivation & Objective

  • Address the challenge of hyperparameter optimization for complex ML models, whose performance hinges on tuning many hyperparameters.
  • Propose a fast, principled method to allocate computational resources adaptively across configurations.
  • Provide theoretical guarantees for a pure-exploration, infinite-armed bandit formulation.
  • Empirically compare Hyperband to Bayesian optimization methods across varied tasks and resources.

Proposed method

  • Formulate hyperparameter optimization as a pure-exploration, non-stochastic, infinite-armed bandit problem.
  • Introduce Hyperband, which combines multiple brackets of Successive Halving to trade off exploration (many configurations, each given little resource) against exploitation (fewer configurations, each given more resource).
  • Use a finite-budget outer loop over brackets, each defined by n (the number of configurations) and r (the minimum resource allocated to each configuration), where each bracket runs Successive Halving.
  • Define two inputs, R (max resource per configuration) and eta (discard factor), and derive from them s_max (which sets the number of brackets) and the per-bracket budget B.
  • Provide an infinite-horizon variant that doubles budget over time to handle unknown R.
  • Demonstrate that Hyperband adapts to unknown convergence rates and envelope behaviors of validation losses without strong parametric assumptions.
  • Show that Hyperband can be combined with any hyperparameter sampling strategy and is agnostic to stochasticity in evaluations.
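
The bracket parameters described in the list above can be sketched in a few lines of Python. This is a minimal sketch, assuming the formulas of the paper's main algorithm: s_max = floor(log_eta(R)), B = (s_max + 1) R, n = ceil((B / R) eta^s / (s + 1)), and r = R eta^(-s). The function name `hyperband_schedule` and the layout of the returned data are illustrative and do not come from the review, and the sketch assumes integer R and eta.

```python
def hyperband_schedule(R, eta=3):
    """Return (s_max, B, brackets) for max resource R and discard factor eta.

    Assumes R and eta are positive integers with eta >= 2.
    """
    # s_max = floor(log_eta(R)), found with integer arithmetic to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1
    B = (s_max + 1) * R  # budget assigned to each bracket

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((B / R) * eta^s / (s + 1)), written as an integer ceiling.
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i        # configurations evaluated in round i
            r_i = R / eta ** (s - i)   # resource per configuration, r * eta^i
            rounds.append((n_i, r_i))
        brackets.append({"s": s, "n": n, "r": R / eta ** s, "rounds": rounds})
    return s_max, B, brackets


s_max, B, brackets = hyperband_schedule(R=81, eta=3)
for b in brackets:
    print(b["s"], b["n"], b["r"], b["rounds"])
```

With R = 81 and eta = 3, these formulas give s_max = 4, B = 405, and starting bracket sizes of 81, 34, 15, 8, and 5 configurations. The first bracket starts 81 configurations at 1 unit of resource each, while the last runs 5 configurations at the full 81 units, which is plain random search.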

Experimental results

Research questions

  • RQ1: How can hyperparameter optimization be framed as a pure-exploration, infinite-armed bandit problem?
  • RQ2: Can a multi-bracket, successive-halving approach efficiently identify good hyperparameters under a fixed budget?
  • RQ3: How does Hyperband perform relative to Bayesian optimization methods across different resource types and tasks?
  • RQ4: What theoretical guarantees can be established for Hyperband in both finite and infinite horizon settings?

Key findings

  • Hyperband achieves significant speedups (over an order of magnitude in some cases) over Bayesian optimization methods on deep-learning and kernel-based tasks.
  • The algorithm hedges between aggressive exploration and conservative evaluation by using multiple brackets with different n and r tradeoffs.
  • The theoretical analysis, which covers the pure-exploration framing and an infinite-horizon variant, indicates that Hyperband's budget usage is close to that of an ideally tuned Successive Halving run, even when the envelope behavior of the validation losses is unknown.
  • Empirical results across iterations, data subsampling, and feature subsampling demonstrate robustness and wide applicability.
  • Hyperband requires only R and eta, and can be paired with any hyperparameter sampling approach.
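
To illustrate the last finding, here is a rough Python sketch of the full loop with a pluggable sampler. It follows the structure of the paper's main algorithm, but everything concrete in it is an assumption made for illustration: the names `hyperband`, `sample_config`, `evaluate`, and `toy_loss` are hypothetical, the loss is synthetic, and each round simply re-evaluates a configuration at the larger resource instead of resuming training from a checkpoint.

```python
import random


def hyperband(sample_config, evaluate, R, eta=3):
    """Return (best_loss, best_config) over all brackets.

    sample_config() -> config and evaluate(config, resource) -> loss are
    user-supplied callables (hypothetical interface). R and eta are integers.
    """
    s_max = 0
    while eta ** (s_max + 1) <= R:  # integer floor(log_eta(R))
        s_max += 1
    best = (float("inf"), None)
    for s in range(s_max, -1, -1):
        # Initial number of configurations in this bracket (integer ceiling).
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        configs = [sample_config() for _ in range(n)]
        # Successive Halving: keep the best 1/eta fraction each round.
        for i in range(s + 1):
            r_i = R / eta ** (s - i)
            losses = [evaluate(c, r_i) for c in configs]
            ranked = sorted(zip(losses, configs), key=lambda pair: pair[0])
            if ranked[0][0] < best[0]:
                best = ranked[0]
            keep = (n // eta ** i) // eta
            configs = [c for _, c in ranked[:keep]]
    return best


def toy_loss(x, resource):
    # Synthetic loss: best at x = 0.3 and decreasing as resource grows.
    return (x - 0.3) ** 2 + 1.0 / resource


rng = random.Random(0)  # seeded uniform sampler; any sampler could be swapped in
best_loss, best_config = hyperband(rng.random, toy_loss, R=81, eta=3)
print(best_loss, best_config)
```

Only `R`, `eta`, a sampler, and an evaluation function are passed in. Replacing `rng.random` with a different sampling strategy leaves the resource-allocation logic unchanged.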

This review was created by AI and reviewed by human editors.