QUICK REVIEW

[Paper Review] Bayesian Optimization in AlphaGo

Yutian Chen, Aja Huang|arXiv (Cornell University)|Dec 17, 2018

Embedded Systems Design Techniques | 4 references | 78 citations

TL;DR

The paper documents how Bayesian optimization was routinely used to automatically tune AlphaGo’s game-playing hyper-parameters, yielding significant strength gains and insights across multiple development tasks.

ABSTRACT

During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times. This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent and this improved its win-rate from 50% to 66.5% in self-play games. This tuned version was deployed in the final match. Of course, since we tuned AlphaGo many times during its development cycle, the compounded contribution was even higher than this percentage. It is our hope that this brief case study will be of interest to Go fans, and also provide Bayesian optimization practitioners with some insights and inspiration.

Motivation & Objective

Demonstrate how Bayesian optimization was used to tune AlphaGo’s game-playing hyper-parameters during development.
Quantify the impact of automated tuning on playing strength and provide insights into parameter contributions.
Describe the optimization methodology, including models, acquisition functions, and practical considerations in noisy, expensive evaluations.
Showcase task-based tuning results and the resulting performance gains across iterations and hardware changes.

Proposed method

Use Gaussian process priors over the unknown win-rate function p(theta) with input warping.
Optimize using the Expected Improvement acquisition function to balance exploration and exploitation.
Model observation noise from Bernoulli self-play outcomes with a nonstationary Gaussian noise model and estimate it from 50-game evaluations.
Justify Bayesian optimization over grid search: the win-rate is non-differentiable in the hyper-parameters, and each evaluation is expensive.
Implement a modified Spearmint framework and a sequential tuning procedure that updates the GP with new self-play outcomes.
Provide visualization tools to assess win-rate sensitivity to individual or pairs of hyper-parameters and identify influential factors.
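The loop described above (GP prior, Bernoulli-derived noise, Expected Improvement, sequential updates) can be sketched in a few lines. This is only an illustrative sketch, not the authors' modified Spearmint code: it tunes a single hyper-parameter on a grid, omits input warping, and uses a made-up `true_win_rate` curve in place of real self-play. The kernel settings, the 0.5 prior mean, and every function name here are assumptions chosen for the example.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, signal_var=0.05):
    """Squared-exponential kernel between two 1-D arrays (assumed settings)."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def bernoulli_noise_var(win_rate, n_games=50):
    """Gaussian approximation to the variance of a win-rate from n_games games."""
    p = np.clip(win_rate, 0.05, 0.95)  # keep the variance away from zero
    return p * (1.0 - p) / n_games

def gp_posterior(x_obs, y_obs, noise_var, x_query, prior_mean=0.5):
    """Posterior mean and standard deviation of the GP at x_query."""
    K = rbf_kernel(x_obs, x_obs) + np.diag(noise_var)  # per-point noise
    K_s = rbf_kernel(x_obs, x_query)
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs - prior_mean))
    mean = prior_mean + K_s.T @ alpha
    v = np.linalg.solve(chol, K_s)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v * v, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected Improvement over `best` when maximising."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def true_win_rate(theta):
    """Hypothetical win-rate curve standing in for self-play (peak at 0.7)."""
    return 0.5 + 0.15 * np.exp(-((theta - 0.7) / 0.2) ** 2)

def tune(n_iters=15, n_games=50, seed=0):
    """Sequential tuning: fit GP, maximise EI on a grid, play games, repeat."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)
    x_obs = np.array([0.1, 0.5, 0.9])
    y_obs = rng.binomial(n_games, true_win_rate(x_obs)) / n_games
    for _ in range(n_iters):
        noise = bernoulli_noise_var(y_obs, n_games)
        mean, std = gp_posterior(x_obs, y_obs, noise, grid)
        # With noisy observations, use the best posterior mean as the incumbent.
        mean_obs, _ = gp_posterior(x_obs, y_obs, noise, x_obs)
        x_next = grid[np.argmax(expected_improvement(mean, std, mean_obs.max()))]
        wins = rng.binomial(n_games, true_win_rate(x_next))
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, wins / n_games)
    noise = bernoulli_noise_var(y_obs, n_games)
    mean, _ = gp_posterior(x_obs, y_obs, noise, grid)
    return grid[np.argmax(mean)], x_obs, y_obs

if __name__ == "__main__":
    theta_best, xs, ys = tune()
    print(f"recommended theta: {theta_best:.2f} after {len(xs)} evaluations")
```

Each evaluation here is one 50-game batch, so the per-point noise variance p(1-p)/50 is what makes the noise model nonstationary: points with win-rates near 50% are noisier than points near 0% or 100%.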

Experimental results

Research questions

RQ1: Can Bayesian optimization reliably identify hyper-parameter settings that improve AlphaGo’s win-rate in self-play evaluations?
RQ2: What is the relative contribution of different hyper-parameter groups (e.g., MCTS, roll-outs vs. value network, time control) to overall playing strength?
RQ3: How do automatic hyper-parameter settings compare to hand-tuned baselines across iterations and hardware (GPUs vs TPUs)?
RQ4: Can dynamic components (like mixing ratio and time control) be effectively discovered and improved via Bayesian optimization?

Key findings

Bayesian optimization increased AlphaGo’s self-play win-rate from 50% to 66.5% in the final tuning before the Lee Sedol match.
Across two design iterations, tuning the MCTS hyper-parameters raised the self-play win-rate from 50% to 63.2% and 64.4%, corresponding to gains of 94 and 103 Elo, respectively.
Automated tuning yielded improvements beyond single-parameter adjustments and revealed correlations between parameters that were hard to discover by hand.
Tuning yielded insights such as a shift in preference toward value-network-based estimates over roll-outs in later designs, which influenced subsequent versions (e.g., AlphaGo Zero).
Tuning fast data-generation players with short search times produced substantial Elo gains (examples: 300, 285, 145, 129 across four key versions).
Dynamic mixing-ratio formulas and time-control strategies discovered by optimization showed move-number dependencies (e.g., peak around move 150) and significant performance gains (e.g., 66.5% win-rate with a new time control).
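The win-rates and Elo gains in the findings above can be related with a short calculation. The sketch below assumes the standard logistic Elo formula, 400 * log10(p / (1 - p)), measured against a 50% baseline. The review does not state which conversion the authors used, but this formula gives 94 and 103 Elo for 63.2% and 64.4%. The function name is hypothetical.

```python
import math

def win_rate_to_elo(p):
    """Elo difference implied by win-rate p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("win-rate must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

if __name__ == "__main__":
    for p in (0.632, 0.644, 0.665):
        print(f"{p:.1%} win-rate -> {win_rate_to_elo(p):.0f} Elo")
```

Under the same assumption, the 66.5% win-rate reported for the final tuning corresponds to roughly 119 Elo.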

This review was created by AI and reviewed by human editors.