QUICK REVIEW

[Paper Review] Benchmarking Reinforcement Learning Algorithms on Real-World Robots

A. Rupam Mahmood, Dmytro Korenkevych|arXiv (Cornell University)|Sep 20, 2018

Reinforcement Learning in Robotics|Cited by 46

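One-line summary

This paper presents six real-world robot reinforcement learning tasks spanning three robots and benchmarks four continuous-control RL algorithms (TRPO, PPO, DDPG, Soft-Q) to study hyper-parameter sensitivity and cross-task transfer. It shows that hyper-parameters strongly affect performance: good configurations can generalize as baselines, while some algorithms underperform on particular tasks.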

ABSTRACT

Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks. The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks. To carry forward these successes to real-world applications, it is crucial to withhold utilizing the unique advantages of simulations that do not transfer to the real world and experiment directly with physical robots. However, reinforcement learning research with physical robots faces substantial resistance due to the lack of benchmark tasks and supporting source code. In this work, we introduce several reinforcement learning tasks with multiple commercially available robots that present varying levels of learning difficulty, setup, and repeatability. On these tasks, we test the learning performance of off-the-shelf implementations of four reinforcement learning algorithms and analyze sensitivity to their hyper-parameters to determine their readiness for applications in various real-world tasks. Our results show that with a careful setup of the task interface and computations, some of these implementations can be readily applicable to physical robots. We find that state-of-the-art learning algorithms are highly sensitive to their hyper-parameters and their relative ordering does not transfer across tasks, indicating the necessity of re-tuning them for each task for best performance. On the other hand, the best hyper-parameter configuration from one task may often result in effective learning on held-out tasks even with different robots, providing a reasonable default. We make the benchmark tasks publicly available to enhance reproducibility in real-world reinforcement learning.

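Motivation and objectives

Introduce benchmark tasks for physical robots to enable reproducible real-world RL research.
Evaluate several off-the-shelf RL algorithms on a diverse set of real-world robot tasks.
Analyze how sensitive learning performance is to hyper-parameters and how consistent it is across tasks.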

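Proposed method

Defines six RL tasks using three commercially available robots (UR5, Dynamixel MX-64AT, Create 2).
Runs the environment and the agent in separate processes to reduce latency and make real-time RL possible.
Evaluates four continuous-control algorithms (TRPO, PPO, DDPG, Soft-Q-learning) using open-source implementations.
Runs a random hyper-parameter search to assess sensitivity across the UR-Reacher-2 and DXL-Reacher tasks.
Tests the best-performing configurations from UR-Reacher-2 on held-out tasks to assess generalization.
Analyzes repeatability and the effect of initialization, and compares against scripted baselines.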
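
The random hyper-parameter search step can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the search space, its ranges, and the `toy_evaluate` function (a stand-in for a full learning run on a robot) are hypothetical and are not taken from the paper or its released code.

```python
import math
import random

# Hypothetical search space; names and ranges are illustrative only.
SEARCH_SPACE = {
    "step_size": ("log_uniform", 1e-5, 1e-2),
    "batch_size": ("choice", [256, 512, 1024, 2048]),
    "hidden_size": ("choice", [32, 64, 128]),
}


def sample_config(rng):
    """Draw one configuration independently at random from SEARCH_SPACE."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "log_uniform":
            _, low, high = spec
            # Sample uniformly in log space so every decade is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown spec kind: {kind}")
    return config


def random_search(evaluate, num_configs, seed=0):
    """Evaluate num_configs random configurations; return them best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_configs):
        config = sample_config(rng)
        results.append((evaluate(config), config))
    # Sort on the score only, so the dicts are never compared.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


def toy_evaluate(config):
    """Stand-in for a real robot run: the score peaks near step_size = 1e-3."""
    return -abs(math.log10(config["step_size"]) + 3.0)


ranked = random_search(toy_evaluate, num_configs=20)
best_score, best_config = ranked[0]
```

In a transfer check of the kind described above, `best_config` from one task would then be passed unchanged to the evaluation function of a held-out task.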

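Experimental results

Research questions

RQ1: How do state-of-the-art RL algorithms perform on six real-world robot tasks with different control interfaces and sensor modalities?
RQ2: How sensitive is RL performance to the choice of hyper-parameters across different tasks?
RQ3: Can hyper-parameter configurations transfer to held-out tasks and robots as reasonable defaults?
RQ4: Unlike in simulation, what practical challenges and reproducibility considerations arise when learning on physical robots?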

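Key findings

The choice of hyper-parameters has a large effect on policy quality across tasks.
TRPO is comparatively insensitive to hyper-parameter variation and remains competitive in final performance.
Soft-Q learns fastest on some UR5 and DXL tasks, but can run into overheating because of aggressive exploration.
DDPG performs poorly on the UR5 and DXL tasks in this study.
Some hyper-parameter configurations generalize to held-out tasks and different robots as reasonable baselines.
RL solutions sometimes fall short of scripted baselines, but can be competitive on tasks such as Create-Docker where no obvious scripted strategy exists.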
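
One way to quantify whether the ordering of hyper-parameter configurations carries over from one task to another is a rank correlation between their returns on the two tasks. The sketch below uses Spearman's coefficient in its simple no-ties form. It is an illustration only, not the analysis performed in the paper, and the return values are made-up numbers chosen to show the calculation.

```python
def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(a)
    rank_a, rank_b = ranks(a), ranks(b)
    squared_diff = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1.0 - 6.0 * squared_diff / (n * (n * n - 1))


# Made-up average returns of the same four configurations on two tasks.
returns_task_a = [10.0, 20.0, 30.0, 40.0]
returns_task_b = [12.0, 35.0, 18.0, 50.0]

rho = spearman(returns_task_a, returns_task_b)
```

A value near 1 would mean the configurations rank similarly on both tasks, while a value near 0 would mean the ordering on one task says little about the other.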

This review was generated by AI and checked by a human editor.