QUICK REVIEW

[Paper Review] Acme: A Research Framework for Distributed Reinforcement Learning

Matt Hoffman, Bobak Shahriari | arXiv (Cornell University) | Jun 1, 2020

Reinforcement Learning in Robotics | References: 2 | Citations: 73

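One-line summary

Acme presents a modular framework with reusable components (actors, learners, replay) that enables rapid prototyping and reproducibility in distributed RL. It also provides reference implementations of state-of-the-art algorithms for online, offline, imitation learning, and learning-from-demonstrations settings.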

ABSTRACT

Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.

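Motivation and objectives

Provide modular, reusable components for building agents, in response to the growing complexity and scale of modern RL.
Enable rapid prototyping and reproducibility through reference implementations of major RL algorithms.
Support a broad range of learning settings, including online, offline, imitation learning, and learning from demonstrations.
Make it easier to move from a simple single-process setup to a large-scale distributed system without reimplementing core logic.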

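Proposed method

Defines a modular agent architecture comprising environment loops, actors, replay storage, learners, and builders.
Adopts Reverb as a high-throughput experience replay system with configurable sampling and prioritization.
Describes a flexible actor interface and the GenericActor/ActorCore pattern for decoupling data generation from training.
Explains how learners expose a variable source through which actors are updated, and how offline datasets can be used via RLDS.
Presents a builder-based approach for assembling agents and running both local and distributed experiments.
Discusses support for offline and imitation learning through adaptable data pipelines and datasets.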
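
The division of labor listed above (environment loop, actor, replay, learner, variable source) can be illustrated with a small pure-Python sketch. Everything in it is an assumption made for illustration: the class and method names (ToyEnvironment, Replay, Learner, Actor, run_environment_loop) are hypothetical stand-ins and are not Acme's or Reverb's actual API, and tabular Q-learning on a toy chain is used only because it is short.

```python
import random
from collections import deque


class ToyEnvironment:
    """Hypothetical 1-D chain: start at 0; reaching position 3 ends the episode."""

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at position 0).
        self.position = max(0, self.position + (1 if action == 1 else -1))
        done = self.position >= 3
        return self.position, (1.0 if done else 0.0), done


class Replay:
    """Uniform-sampling FIFO buffer standing in for a replay service."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def insert(self, transition):
        self.items.append(transition)

    def sample(self, rng):
        return rng.choice(list(self.items))


class Learner:
    """Tabular Q-learning; also serves as the variable source for actors."""

    def __init__(self, replay, rng, step_size=0.5, discount=0.9):
        self.replay, self.rng = replay, rng
        self.step_size, self.discount = step_size, discount
        self.q = {}

    def step(self):
        s, a, r, s2, done = self.replay.sample(self.rng)
        bootstrap = 0.0 if done else max(self.q.get((s2, b), 0.0) for b in (0, 1))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.step_size * (r + self.discount * bootstrap - old)

    def get_variables(self):
        return dict(self.q)


class Actor:
    """Epsilon-greedy actor: writes to replay, pulls values from the learner."""

    def __init__(self, variable_source, replay, rng, epsilon=0.2):
        self.variable_source, self.replay = variable_source, replay
        self.rng, self.epsilon = rng, epsilon
        self.q = {}

    def select_action(self, observation):
        if self.rng.random() < self.epsilon:
            return self.rng.choice((0, 1))
        values = [self.q.get((observation, a), 0.0) for a in (0, 1)]
        best = max(values)
        return self.rng.choice([a for a in (0, 1) if values[a] == best])

    def observe(self, transition):
        self.replay.insert(transition)

    def update(self):
        self.q = self.variable_source.get_variables()


def run_environment_loop(env, actor, learner, num_episodes, max_steps=50):
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = actor.select_action(obs)
            next_obs, reward, done = env.step(action)
            actor.observe((obs, action, reward, next_obs, done))
            learner.step()   # training reads only from replay
            actor.update()   # actor refreshes its copy of the variables
            obs = next_obs
            if done:
                break


rng = random.Random(0)
replay = Replay(capacity=1000)
learner = Learner(replay, rng)
actor = Actor(learner, replay, rng)
run_environment_loop(ToyEnvironment(), actor, learner, num_episodes=200)
```

In this sketch the actor never touches the learner's update rule and the learner never touches the environment; the only links between them are the replay buffer and get_variables. That separation is what would let the same components be placed in different processes.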

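Experimental results

Research questions

RQ1: Can RL agents be decomposed into reusable, scalable building blocks without losing interpretability or ease of debugging?
RQ2: Which architectural choices promote rapid experimentation and reproducibility across online, offline, and imitation learning settings?
RQ3: How can a distributed RL system maintain stable data flow and training efficiency across actors, learners, and replay?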

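Key findings

Acme enables large-scale distributed RL while keeping implementations readable and modular.
The framework provides baselines and reference implementations for several state-of-the-art algorithms.
The experiments demonstrate the scalability of distributed agents across diverse environments.
Offline and imitation learning workflows are integrated through modular data pipelines and the RLDS dataset format.
The builder-based design lets a variety of agents be constructed and run with minimal reimplementation.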
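
The builder idea in the last finding can be sketched as a small factory object whose pieces are wired together by different runners. This is a minimal sketch under stated assumptions: AgentBuilder, run_local, and run_multi_actor are hypothetical names and not Acme's API, threads stand in for separate processes or machines, and the "learner" only tracks a running mean of sampled values as a placeholder for training.

```python
import random
import threading
from collections import deque
from dataclasses import dataclass


class Replay:
    """Thread-safe FIFO buffer standing in for a replay service."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def insert(self, item):
        with self._lock:
            self._items.append(item)

    def sample(self, rng):
        with self._lock:
            return rng.choice(list(self._items)) if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


class MeanLearner:
    """Placeholder learner: running mean of the values it samples."""

    def __init__(self, replay, seed):
        self.replay, self.rng = replay, random.Random(seed)
        self.mean, self.count = 0.0, 0

    def step(self):
        item = self.replay.sample(self.rng)
        if item is None:  # nothing to learn from yet
            return
        self.count += 1
        self.mean += (item - self.mean) / self.count


class RandomActor:
    """Placeholder actor: writes random values in [0, 1) to replay."""

    def __init__(self, replay, seed):
        self.replay, self.rng = replay, random.Random(seed)

    def run(self, num_steps):
        for _ in range(num_steps):
            self.replay.insert(self.rng.random())


@dataclass
class AgentBuilder:
    """Knows how to make each component, not how they are executed."""
    replay_capacity: int = 10_000

    def make_replay(self):
        return Replay(self.replay_capacity)

    def make_learner(self, replay):
        return MeanLearner(replay, seed=0)

    def make_actor(self, replay, seed):
        return RandomActor(replay, seed)


def run_local(builder, num_steps):
    """Single process: one actor and one learner, interleaved."""
    replay = builder.make_replay()
    learner = builder.make_learner(replay)
    actor = builder.make_actor(replay, seed=1)
    for _ in range(num_steps):
        actor.run(1)
        learner.step()
    return learner, replay


def run_multi_actor(builder, num_actors, steps_per_actor, learner_steps):
    """Several actor threads feed one learner through the shared replay."""
    replay = builder.make_replay()
    learner = builder.make_learner(replay)
    actors = [builder.make_actor(replay, seed=i + 1) for i in range(num_actors)]
    threads = [threading.Thread(target=a.run, args=(steps_per_actor,))
               for a in actors]
    for t in threads:
        t.start()
    for _ in range(learner_steps):
        learner.step()
    for t in threads:
        t.join()
    return learner, replay


local_learner, local_replay = run_local(AgentBuilder(), num_steps=50)
multi_learner, multi_replay = run_multi_actor(
    AgentBuilder(), num_actors=4, steps_per_actor=100, learner_steps=200)
```

Both runners call the same three factory methods, so switching from one actor to many changes only the runner and leaves the component code unchanged.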

This review was generated by AI and checked by a human editor.