[Paper Review] Portfolio Allocation for Bayesian Optimization
This paper proposes GP-Hedge, a portfolio-based Bayesian optimization method that adaptively selects among multiple acquisition functions using an online multi-armed bandit strategy. By choosing at each step among strategies with different exploration-exploitation trade-offs, such as Expected Improvement and Upper Confidence Bound, GP-Hedge is reported to outperform the best individual acquisition function. The paper also provides a theoretical regret bound that is tied to GP-UCB's convergence properties.
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the model's estimate of the objective and the uncertainty at any given point. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm's performance.
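To make the role of an acquisition function concrete, here is a minimal sketch of the three functions discussed in this review, written for a single candidate point. It assumes a maximization problem and takes the model's posterior mean `mu` and standard deviation `sigma` as plain numbers. The parameter names `xi` and `beta` and their default values are illustrative assumptions, not settings taken from the paper.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best, xi=0.01):
    """Probability that the point beats the incumbent `best` by at least xi."""
    if sigma <= 0.0:
        # No uncertainty: improvement is either certain or impossible.
        return 1.0 if mu - best - xi > 0.0 else 0.0
    return _norm_cdf((mu - best - xi) / sigma)


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected amount by which the point beats the incumbent `best`."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * _norm_cdf(z) + sigma * _norm_pdf(z)


def upper_confidence_bound(mu, sigma, beta=4.0):
    """Optimistic estimate: mean plus sqrt(beta) standard deviations."""
    return mu + math.sqrt(beta) * sigma
```

All three scores grow with the predicted mean and with the predictive uncertainty, but they weigh the two differently, which is why they can nominate different points for the same model.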
Motivation & Objective
- To address the challenge of selecting an acquisition function in Bayesian optimization, where no single acquisition function performs best across all objective functions.
- To improve optimization efficiency by combining multiple acquisition functions into a dynamic portfolio that adapts based on performance.
- To develop a theoretically grounded method that provides performance guarantees through regret bounds, even when individual acquisition functions vary in effectiveness.
- To evaluate whether adaptive hedging strategies can consistently outperform static acquisition functions in real-world and synthetic optimization tasks.
Proposed method
- The method employs a hierarchical hedging strategy: each acquisition function nominates a candidate point, and choosing among the nominees is treated as a multi-armed bandit problem, with rewards taken from the updated GP's posterior mean at each nominated point.
- It maintains a portfolio of acquisition functions, including Expected Improvement (EI), Probability of Improvement (PI), and GP-UCB, and uses online learning to update their weights based on past performance.
- The core algorithm, GP-Hedge, samples one nominee per iteration with probability proportional to the exponential of each acquisition function's cumulative reward (the Hedge exponential-weights rule), so strategies that have performed well are chosen more often.
- The method incorporates a theoretical regret bound by relating cumulative regret to GP-UCB's known convergence properties, under assumptions on information gain and kernel hyperparameters.
- It uses Gaussian process priors with squared exponential kernels and automatic relevance determination (ARD) to model the objective function, with hyperparameters estimated from data.
- The algorithm is evaluated on standard benchmark functions and a real-world reinforcement learning task, using noisy function evaluations and sequential sampling.
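The selection and update steps described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each acquisition function has already nominated one point, and that the reward for each nominee is the updated model's posterior mean at that point. The callables `evaluate` and `posterior_mean`, the function names, and the learning rate `eta` are hypothetical stand-ins introduced here.

```python
import math
import random


def hedge_probabilities(gains, eta):
    """Softmax of eta * cumulative gains, shifted for numerical stability."""
    top = max(gains)
    weights = [math.exp(eta * (g - top)) for g in gains]
    total = sum(weights)
    return [w / total for w in weights]


def gp_hedge_step(gains, nominees, eta, evaluate, posterior_mean, rng):
    """One iteration: pick a nominee, evaluate it, then reward every nominee.

    nominees:       one candidate point per acquisition function
    evaluate:       hypothetical callable that queries the objective at a
                    point and updates the surrogate model with the result
    posterior_mean: hypothetical callable returning the updated model's
                    mean prediction at a point
    rng:            a random.Random instance, for reproducible sampling
    """
    probs = hedge_probabilities(gains, eta)
    chosen = rng.choices(range(len(nominees)), weights=probs, k=1)[0]
    evaluate(nominees[chosen])
    # Full-information update: all nominees are rewarded, not only the chosen one.
    rewards = [posterior_mean(x) for x in nominees]
    new_gains = [g + r for g, r in zip(gains, rewards)]
    return chosen, new_gains


if __name__ == "__main__":
    # Toy stand-ins: no real GP, the "posterior mean" is a fixed function.
    observed = []
    gains = [0.0, 0.0, 0.0]
    rng = random.Random(0)
    for _ in range(5):
        nominees = [0.1, 0.5, 0.9]  # pretend nominations from three strategies
        chosen, gains = gp_hedge_step(
            gains, nominees, 1.0, observed.append, lambda x: -(x - 0.5) ** 2, rng
        )
    print(hedge_probabilities(gains, 1.0))
```

Because every nominee receives a reward at each step, an acquisition function can gain weight even in rounds where its nominee was not the one evaluated. That is the full-information setting that separates this scheme from partial-information bandit variants.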
Experimental results
Research questions
- RQ1: Can a portfolio of acquisition functions, dynamically selected via online learning, outperform any individual acquisition function in Bayesian optimization?
- RQ2: How does the performance of hedging strategies compare to individual acquisition functions across diverse objective functions with varying smoothness and structure?
- RQ3: What theoretical guarantees can be provided for the cumulative regret of a portfolio-based Bayesian optimization method?
- RQ4: Does the inclusion of GP-UCB in the portfolio improve convergence and robustness, especially in non-stationary or high-dimensional settings?
Key findings
- GP-Hedge outperforms all individual acquisition functions on standard benchmark functions, including those with plateaus and non-stationary behavior where PI struggles.
- The method achieves lower cumulative regret than any single acquisition function, with empirical results showing consistent gains across synthetic and real-world tasks.
- Full-information hedging strategies, which observe all acquisition function rewards, outperform partial-information variants in most cases, especially when acquisition functions provide conflicting signals.
- The theoretical regret bound shows that GP-Hedge’s performance is related to GP-UCB’s convergence, with sub-linear regret terms that suggest eventual convergence as the number of iterations increases.
- The method is robust to poor initial choices of acquisition functions, as the adaptive portfolio mechanism corrects for suboptimal selections over time.
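For readers unfamiliar with the regret terminology used in the findings above, the standard definitions are given below. The symbols are notation introduced here for illustration and are not copied from the paper: $x^\star$ denotes a maximizer of the objective $f$, and $x_t$ is the point sampled at iteration $t$.

```latex
r_t = f(x^\star) - f(x_t), \qquad
R_T = \sum_{t=1}^{T} r_t, \qquad
\text{sub-linear regret:} \quad \lim_{T \to \infty} \frac{R_T}{T} = 0 .
```

When the cumulative regret $R_T$ grows sub-linearly, the average regret per iteration vanishes, so the best value sampled so far approaches the optimum as $T$ grows.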
This review was created by AI and reviewed by human editors.