QUICK REVIEW

[Paper Review] Acme: A Research Framework for Distributed Reinforcement Learning

Matt Hoffman, Bobak Shahriari|arXiv (Cornell University)|Jun 1, 2020

Reinforcement Learning in Robotics | References: 2 | Cited by: 73

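One-Sentence Summary

Acme provides a modular framework for building and scaling reinforcement learning agents, with reusable components (actors, learners, replay) that support rapid prototyping and reproducibility in distributed RL. It also provides reference implementations of state-of-the-art algorithms for online, offline, imitation learning, and learning-from-demonstrations settings.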

ABSTRACT

Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.

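Motivation and Goals

Cope with the growing complexity and scale of modern RL by providing reusable, modular components for building agents.
Enable rapid prototyping and reproducibility through reference implementations of key RL algorithms.
Support multiple learning settings, including online, offline, imitation learning, and learning from demonstrations.
Allow the same agent to be deployed anywhere from a simple single process to a large-scale distributed system without reimplementing its core logic.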

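Proposed Method

Define a modular agent architecture composed of an environment loop, agents, replay storage, learners, and builders.
Introduce Reverb as a high-throughput experience replay system with configurable sampling and prioritization.
Describe a flexible actor interface and the GenericActor/ActorCore pattern, which separate data generation from training.
Explain how learners expose variable sources from which agents are updated, and how offline datasets are supported through RLDS.
Propose a builder-based approach to composing agents, and run both local and distributed experiments.
Discuss support for offline and imitation learning through adaptable data pipelines and datasets.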
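
The division of labor listed above (environment loop, actor, replay, learner) can be illustrated with a small, self-contained sketch. It is not Acme's code: every class here (`CoinFlipEnv`, `Replay`, `Learner`, `Actor`) and the function `run_environment_loop` are hypothetical stand-ins written in plain Python, and the method names are only loosely modeled on the roles the summary describes. The toy task, in which the agent must repeat the bit it observes, is likewise an assumption chosen to keep the example tiny.

```python
import random
from collections import deque


class CoinFlipEnv:
    """Hypothetical toy environment: reward 1.0 if the action equals the observed bit."""

    def __init__(self, episode_length=5, seed=0):
        self._rng = random.Random(seed)
        self._episode_length = episode_length
        self._t = 0
        self._bit = 0

    def reset(self):
        self._t = 0
        self._bit = self._rng.randint(0, 1)
        return self._bit

    def step(self, action):
        reward = 1.0 if action == self._bit else 0.0
        self._t += 1
        self._bit = self._rng.randint(0, 1)
        return self._bit, reward, self._t >= self._episode_length


class Replay:
    """Bounded FIFO buffer with uniform sampling (a stand-in for a replay system)."""

    def __init__(self, max_size=1000, seed=0):
        self._buffer = deque(maxlen=max_size)
        self._rng = random.Random(seed)

    def insert(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        return [self._rng.choice(self._buffer) for _ in range(batch_size)]

    def __len__(self):
        return len(self._buffer)


class Learner:
    """Consumes replay samples and owns the trainable variables (a value table)."""

    def __init__(self, replay, learning_rate=0.5):
        self._replay = replay
        self._lr = learning_rate
        self._q = {(o, a): 0.0 for o in (0, 1) for a in (0, 1)}

    def step(self, batch_size=8):
        if len(self._replay) == 0:
            return
        for obs, action, reward in self._replay.sample(batch_size):
            self._q[(obs, action)] += self._lr * (reward - self._q[(obs, action)])

    def get_variables(self):
        # Return a copy so actors never share mutable state with the learner.
        return dict(self._q)


class Actor:
    """Generates data only: acts, writes to replay, and pulls variables from the learner."""

    def __init__(self, learner, replay, epsilon=0.2, seed=0):
        self._learner = learner
        self._replay = replay
        self._epsilon = epsilon
        self._rng = random.Random(seed)
        self._q = learner.get_variables()

    def select_action(self, obs):
        if self._rng.random() < self._epsilon:
            return self._rng.randint(0, 1)  # explore
        return max((0, 1), key=lambda a: self._q[(obs, a)])  # act greedily

    def observe(self, obs, action, reward):
        self._replay.insert((obs, action, reward))

    def update(self):
        self._q = self._learner.get_variables()


def run_environment_loop(env, actor, learner, num_episodes):
    """Single-process loop: act, store, learn, refresh the actor's variables."""
    returns = []
    for _ in range(num_episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            action = actor.select_action(obs)
            next_obs, reward, done = env.step(action)
            actor.observe(obs, action, reward)
            learner.step()
            actor.update()
            obs, episode_return = next_obs, episode_return + reward
        returns.append(episode_return)
    return returns


replay = Replay(seed=2)
learner = Learner(replay)
actor = Actor(learner, replay, seed=1)
returns = run_environment_loop(CoinFlipEnv(seed=0), actor, learner, num_episodes=200)
print("mean return over last 20 episodes:", sum(returns[-20:]) / 20)
```

In this sketch the actor never touches the learner's update rule, and the learner never touches the environment; the two communicate only through the replay buffer and `get_variables`. That separation is what would, in principle, let the pieces run in different processes.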

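Experimental Results

Research Questions

RQ1: How can RL agents be decomposed into reusable, scalable components without sacrificing interpretability or ease of debugging?
RQ2: Which architectural choices facilitate rapid experimentation and reproducibility across online, offline, and imitation learning settings?
RQ3: How can a distributed RL system maintain stable data flow and training efficiency across agents, learners, and replay?

Key Findings

Acme enables large-scale distributed RL while keeping implementations readable and modular.
The framework provides baselines and reference implementations for a number of state-of-the-art algorithms.
Experiments demonstrate the scalability of distributed agents across diverse environments.
Offline and imitation learning workflows are integrated through modular data pipelines and the RLDS dataset format.
The builder-based design supports constructing many kinds of agents and running them with minimal reimplementation.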
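
To make the builder idea concrete, here is a minimal sketch in which one builder object supplies factory methods for replay, learner, and actors, and a single `run_experiment` function wires them together for either one actor or several concurrent ones. All names (`AgentBuilder`, `Replay`, `MeanLearner`, `NoisyActor`, `run_experiment`) are hypothetical and the components are deliberately trivial; threads stand in for separate processes. Acme's actual builder interface and its distributed launch mechanism are not reproduced here.

```python
import random
import threading
from dataclasses import dataclass


class Replay:
    """Thread-safe list-backed buffer (a stand-in for a replay service)."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def insert(self, item):
        with self._lock:
            self._items.append(item)

    def sample(self, rng):
        with self._lock:
            return rng.choice(self._items) if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


class MeanLearner:
    """Toy learner: keeps a running mean of the rewards it samples."""

    def __init__(self, replay, seed=0):
        self._replay = replay
        self._rng = random.Random(seed)
        self.estimate = 0.0
        self.steps = 0

    def step(self):
        item = self._replay.sample(self._rng)
        if item is None:
            return  # nothing to learn from yet
        self.steps += 1
        self.estimate += (item - self.estimate) / self.steps


class NoisyActor:
    """Toy actor: produces rewards in [0, 1) and writes them to replay."""

    def __init__(self, replay, seed):
        self._replay = replay
        self._rng = random.Random(seed)

    def run(self, num_steps):
        for _ in range(num_steps):
            self._replay.insert(self._rng.random())


@dataclass
class AgentBuilder:
    """Hypothetical builder: the only place that knows how components are made."""

    learner_seed: int = 0

    def make_replay(self):
        return Replay()

    def make_learner(self, replay):
        return MeanLearner(replay, seed=self.learner_seed)

    def make_actor(self, replay, actor_id):
        return NoisyActor(replay, seed=actor_id)


def run_experiment(builder, num_actors=1, actor_steps=100, learner_steps=200):
    """Wire up the builder's components; the same code serves 1 or N actors."""
    replay = builder.make_replay()
    learner = builder.make_learner(replay)
    actors = [builder.make_actor(replay, i) for i in range(num_actors)]
    threads = [threading.Thread(target=a.run, args=(actor_steps,)) for a in actors]
    for t in threads:
        t.start()
    while any(t.is_alive() for t in threads):
        learner.step()  # learn while actors are still generating data
    for t in threads:
        t.join()
    for _ in range(learner_steps):
        learner.step()  # finish with a fixed number of updates
    return replay, learner


local_replay, local_learner = run_experiment(AgentBuilder(), num_actors=1)
multi_replay, multi_learner = run_experiment(AgentBuilder(), num_actors=4, actor_steps=50)
print(len(local_replay), len(multi_replay))
```

Only the `num_actors` argument changes between the two runs; the component definitions and the builder are untouched, which is the property the finding above attributes to the builder-based design.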


This review was generated by AI and checked by a human editor.