
[Paper Review] Efficient Parallel Methods for Deep Reinforcement Learning

Alfredo Vicente Clemente, Humberto Nicolás Castejón | arXiv (Cornell University) | May 13, 2017
Reinforcement Learning in Robotics | 5 references | 80 citations
TL;DR

PAAC introduces a GPU-friendly, synchronous, multi-actor parallel framework that learns on-policy from hundreds of actors on a single machine, achieving state-of-the-art Atari results in hours. It compares favorably to Gorila, A3C, and GA3C across multiple games.

ABSTRACT

We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm agnostic and can be applied to on-policy, off-policy, value based and policy gradient based algorithms. Given its inherent parallelism, the framework can be efficiently implemented on a GPU, allowing the usage of powerful models while significantly reducing training time. We demonstrate the effectiveness of our framework by implementing an advantage actor-critic algorithm on a GPU, using on-policy experiences and employing synchronous updates. Our algorithm achieves state-of-the-art performance on the Atari domain after only a few hours of training. Our framework thus opens the door for much faster experimentation on demanding problem domains. Our implementation is open-source and is made public at https://github.com/alfredvc/paac
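The synchronous, multi-actor scheme described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation (that is in the repository linked above). `CountingEnv`, `run_synchronous`, `batched_policy` and `update` are hypothetical names, and the environment is a toy. Python threads stand in for the n_w workers, so stepping is not truly parallel under the GIL. The sketch shows the structure only: one batched policy call per step for all n_e environments, and one synchronous update per n_e * t_max transitions applied to a single parameter copy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Transition = Tuple[float, int, float, bool]  # (state, action, reward, done)
StepResult = Tuple[float, float, bool]       # (next_obs, reward, done)


class CountingEnv:
    """Hypothetical toy environment: reward 1.0 for action 1, episodes last 10 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return 0.0

    def step(self, action: int) -> StepResult:
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10
        obs = self.reset() if done else float(self.t)  # auto-reset on episode end
        return obs, reward, done


def run_synchronous(
    envs: List[CountingEnv],
    batched_policy: Callable[[List[float]], List[int]],
    update: Callable[[List[Transition], List[float]], None],
    n_w: int,
    t_max: int,
    num_updates: int,
) -> List[int]:
    """Collect n_e * t_max transitions, then apply one synchronous update; repeat."""
    n_e = len(envs)
    # Assign environment indices to the n_w workers round-robin.
    chunks = [list(range(w, n_e, n_w)) for w in range(n_w)]
    states = [env.reset() for env in envs]
    batch_sizes: List[int] = []
    with ThreadPoolExecutor(max_workers=n_w) as pool:
        for _ in range(num_updates):
            batch: List[Transition] = []
            for _ in range(t_max):
                # One batched policy call selects actions for all n_e environments.
                actions = batched_policy(states)

                def work(indices: List[int]) -> List[Tuple[int, StepResult]]:
                    # Each worker steps only its own subset of environments.
                    return [(i, envs[i].step(actions[i])) for i in indices]

                results = [pair for chunk in pool.map(work, chunks) for pair in chunk]
                results.sort(key=lambda pair: pair[0])  # restore environment order
                next_states = []
                for i, (next_obs, reward, done) in results:
                    batch.append((states[i], actions[i], reward, done))
                    next_states.append(next_obs)
                states = next_states
            # A single parameter copy is updated once per rollout, so no gradient is stale.
            update(batch, states)
            batch_sizes.append(len(batch))
    return batch_sizes
```

For example, with 4 environments, `n_w=2` and `t_max=5`, every call to `update` receives 20 transitions. In a real setup, `batched_policy` would be a single GPU forward pass. That is where batching across environments pays off.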

Motivation & Objective

  • Enable efficient parallelization of deep reinforcement learning on a single machine.
  • Develop an algorithm-agnostic framework that can handle on-policy, off-policy, value-based, and policy-gradient methods.
  • Demonstrate that synchronous updates with many actors can achieve fast learning and strong performance.
  • Provide an open-source implementation to accelerate experimentation in demanding domains.

Proposed method

  • Propose a general parallel framework in which n_w workers step n_e environments to collect experiences, and a single set of neural network parameters is updated from the batched experiences.
  • Use synchronous, batched updates to avoid stale-gradient issues common in asynchronous methods.
  • Showcase the framework with Parallel Advantage Actor-Critic (PAAC), an n-step, A2C-style algorithm whose policy and value networks share parameters.
  • In PAAC, compute gradients for policy and value using mini-batches of size n_e * t_max and update weights synchronously.
  • Compare two network architectures of different sizes (arch_nips and arch_nature) to study the effect of model size, training on Atari 2600 games with TensorFlow on a GPU.
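The PAAC update in the list above can be illustrated with a small sketch of the n-step targets and the loss values over one batch. It makes several assumptions:

- The helper names are hypothetical.
- The discount `gamma = 0.99` and entropy coefficient `beta = 0.01` are illustrative defaults, not values taken from this review.
- Plain Python floats stand in for tensors. In an autodiff framework such as TensorFlow, gradients would flow through `log_probs` and `values`, while the advantages are held constant in the policy term.

```python
from typing import List, Tuple


def n_step_returns(
    rewards: List[List[float]],
    dones: List[List[bool]],
    bootstrap_values: List[float],
    gamma: float = 0.99,
) -> List[List[float]]:
    """n-step bootstrapped returns; rewards[t][e] and dones[t][e] have shape t_max x n_e.

    bootstrap_values[e] is the critic's estimate V(s_{t_max}) for environment e.
    """
    t_max, n_e = len(rewards), len(bootstrap_values)
    returns = [[0.0] * n_e for _ in range(t_max)]
    running = list(bootstrap_values)
    for t in reversed(range(t_max)):
        for e in range(n_e):
            if dones[t][e]:
                running[e] = 0.0  # episode ended at step t: do not bootstrap past it
            running[e] = rewards[t][e] + gamma * running[e]
            returns[t][e] = running[e]
    return returns


def _flatten(grid: List[List[float]]) -> List[float]:
    """Turn a t_max x n_e grid into one batch of n_e * t_max entries."""
    return [x for row in grid for x in row]


def actor_critic_losses(
    returns: List[List[float]],
    values: List[List[float]],
    log_probs: List[List[float]],
    entropies: List[List[float]],
    beta: float = 0.01,
) -> Tuple[float, float, int]:
    """Policy and value loss values averaged over one batch of n_e * t_max transitions."""
    R, V = _flatten(returns), _flatten(values)
    LP, H = _flatten(log_probs), _flatten(entropies)
    n = len(R)  # n_e * t_max
    advantages = [r - v for r, v in zip(R, V)]  # constants in the policy term
    policy_loss = -sum(lp * a for lp, a in zip(LP, advantages)) / n - beta * sum(H) / n
    value_loss = sum(a * a for a in advantages) / n
    return policy_loss, value_loss, n
```

Averaging over all n_e * t_max transitions at once is what makes the update a single synchronous step rather than many per-actor steps.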

Experimental results

Research questions

  • RQ1: Can a single-machine, highly parallel framework support on-policy, off-policy, value-based, and policy-gradient RL algorithms efficiently?
  • RQ2: Does synchronous multi-actor training on GPUs provide state-of-the-art performance on Atari with significantly reduced training time compared to prior parallel approaches?
  • RQ3: How do different network architectures and actor counts affect learning speed and stability in a parallel RL setting?
  • RQ4: What are the trade-offs between environment interaction time and learning time when scaling the number of parallel actors?

Key findings

  • PAAC achieves state-of-the-art performance on the Atari 2600 domain after only a few hours of training on a single machine.
  • In the reported results, PAAC outperforms Gorila in 8 of 12 games and A3C FF in 8 games.
  • PAAC matches GA3C in most tested games and surpasses it in several, as shown in Table 1.
  • Increasing the number of environments n_e reduces training time (a given timestep is reached faster) while keeping scores competitive; some divergence was observed at very high n_e when the learning rate was not scaled adequately.
  • The framework enables true on-policy learning with a single parameter copy and synchronous updates, reducing issues associated with stale gradients and asynchrony.
  • Experiments show that the framework trains both architectures (arch_nips and arch_nature) on a GPU, with substantial speedups on Atari (hours instead of days).
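The effect of n_e in the findings above comes partly from simple bookkeeping. Each synchronous update consumes n_e * t_max timesteps, so a fixed timestep budget is covered by fewer, larger updates as n_e grows. This sketch also assumes a common heuristic of scaling the learning rate linearly with n_e; this review does not confirm that heuristic. The budget of 80 million timesteps, `t_max = 5` and the per-environment rate `0.0007` are hypothetical illustration values.

```python
def updates_needed(total_timesteps: int, n_e: int, t_max: int) -> int:
    """Number of synchronous updates needed to cover a timestep budget."""
    per_update = n_e * t_max  # one transition per environment per step
    return (total_timesteps + per_update - 1) // per_update  # ceiling division


def linearly_scaled_lr(base_lr_per_env: float, n_e: int) -> float:
    """Hypothetical heuristic: larger batches get a proportionally larger step size."""
    return base_lr_per_env * n_e


if __name__ == "__main__":
    TOTAL_TIMESTEPS = 80_000_000  # hypothetical budget
    T_MAX = 5                     # hypothetical rollout length
    BASE_LR = 0.0007              # hypothetical per-environment learning rate
    for n_e in (16, 32, 64, 128, 256):
        print(f"n_e={n_e:4d}  updates={updates_needed(TOTAL_TIMESTEPS, n_e, T_MAX):8d}  "
              f"lr={linearly_scaled_lr(BASE_LR, n_e):.4f}")
```

Whether fewer, larger updates actually save wall-clock time depends on how environment stepping and the GPU forward and backward passes scale with n_e. That is the trade-off RQ4 asks about.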

This review was created by AI and reviewed by human editors.