QUICK REVIEW

[Paper Review] ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

Yuandong Tian, Jerry Ma | arXiv (Cornell University) | Feb 12, 2019
Artificial Intelligence in Games | 20 references | 42 citations
TL;DR

ELF OpenGo is an open-source reimplementation of AlphaZero for Go that achieves superhuman performance and provides extensive training analyses, datasets, and ablation studies to aid research.

ABSTRACT

The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training and in the gameplay inference procedures. Our code, models, selfplay datasets, and auxiliary data are publicly available at https://ai.facebook.com/tools/elf-opengo/.

Motivation & Objective

  • Provide an open-source reimplementation of an AlphaZero-style Go AI suitable for commodity hardware.
  • Train a superhuman ELF OpenGo model and release pretrained models, selfplay data, and auxiliary evaluation data.
  • Analyze training dynamics, ablations, and practical considerations to illuminate factors influencing large-scale deep RL for Go.

Proposed method

  • Reimplement AlphaZero-style Go training with MCTS guided by a neural policy and value network.
  • Train a 256-filter, 20-block residual network via self-play on commodity GPUs over 1.5 million minibatches (~3B game states).
  • Use a fixed replay buffer and SGD optimization with MCTS-based selfplay data to learn policy and value targets.
  • Conduct extensive ablations on PUCT constant, virtual loss, rollout counts, and training dynamics.
  • Validate strength via human matches and AI-vs-AI benchmarks, comparing against prototype models and LeelaZero.
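
The PUCT constant and virtual loss ablated above both enter the MCTS child-selection rule. The following is a minimal sketch of AlphaZero-style PUCT selection with virtual loss, not the ELF OpenGo implementation: the `Edge` class, the function names, and the default values `c_puct=1.5` and `virtual_loss=1` are placeholders chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (state, action) edge in the search tree (hypothetical layout)."""
    prior: float              # P(s, a) from the policy network
    visits: int = 0           # N(s, a): completed rollouts through this edge
    value_sum: float = 0.0    # W(s, a): sum of backed-up values
    in_flight: int = 0        # rollouts currently passing through this edge


def puct_score(edge, parent_visits, c_puct, virtual_loss):
    # Each in-flight rollout is treated as `virtual_loss` extra visits that all
    # lost, which steers concurrent search threads toward different branches.
    n = edge.visits + virtual_loss * edge.in_flight
    w = edge.value_sum - virtual_loss * edge.in_flight
    q = w / n if n > 0 else 0.0
    u = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + n)
    return q + u


def select_action(edges, c_puct=1.5, virtual_loss=1):
    """Return the action with the highest PUCT score among `edges` (dict: action -> Edge)."""
    parent_visits = sum(e.visits + virtual_loss * e.in_flight for e in edges.values())
    parent_visits = max(parent_visits, 1)  # avoid a zero exploration term at a fresh node
    return max(
        edges,
        key=lambda a: puct_score(edges[a], parent_visits, c_puct, virtual_loss),
    )
```

At a fresh node the action with the larger prior is selected first. Once a rollout is in flight through that edge, its virtual loss lowers the score and the next thread explores a different action. A larger `c_puct` weights the prior-driven exploration term more heavily relative to the value estimate.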

Experimental results

Research questions

  • RQ1: What is the strength and behavior of an open-source AlphaZero-style Go agent under commodity hardware?
  • RQ2: How do key hyperparameters (PUCT, virtual loss) and rollout counts impact training efficiency and final strength?
  • RQ3: What training dynamics (ladder moves, endgame vs opening learning) characterize ELF OpenGo’s learning process?
  • RQ4: How does ELF OpenGo compare to human players and existing open-source AIs in strength and behavior?

Key findings

  • ELF OpenGo achieves superhuman performance, with a perfect 20:0 record against top professional players.
  • Training used 2,000 self-play GPUs and 8 training GPUs over about 16 days, yielding a 20-block model trained on ~3B game states from ~20 million self-play games.
  • A prototype model won all 20 games (20:0) against 4 top-30 professionals, and ELF OpenGo also posted a 980:18 win-loss record against LeelaZero (a gap of approx. 700 Elo).
  • Doubling MCTS rollouts yields ~200 Elo improvement when playing as White and ~35-200 Elo when playing as Black, indicating asymmetric benefit.
  • Ladder (lookahead) moves are learned slowly and not fully mastered, highlighting inductive biases in convolutional networks for Go.
  • There is substantial training variance; reducing the learning rate did not necessarily improve performance and could reduce diversity in self-play data.
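
The Elo figures above can be related to win-loss records with the standard logistic Elo model. The sketch below assumes that model and ignores draws; the function names are illustrative and do not come from the paper.

```python
import math


def elo_gap_from_record(wins, losses):
    """Elo gap implied by a win-loss record under the logistic Elo model (draws ignored)."""
    return 400.0 * math.log10(wins / losses)


def expected_score(elo_gap):
    """Expected score (roughly the win probability) for the side that is stronger by `elo_gap`."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


print(round(elo_gap_from_record(980, 18)))  # 694, consistent with "approx. 700 Elo"
print(round(expected_score(200), 2))        # 0.76
print(round(expected_score(35), 2))         # 0.55
```

Under this model, a 200 Elo gain from doubling rollouts corresponds to winning about 76% of games against the lower-rollout configuration, whereas a 35 Elo gain corresponds to only about 55%.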

This review was created by AI and reviewed by human editors.