QUICK REVIEW

[Paper Review] CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ossama Ahmed, Frederik Träuble | arXiv (Cornell University) | Oct 8, 2020

Reinforcement Learning in Robotics | References: 42 | Cited by: 31

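One-Line Summary

CausalWorld introduces a parameterized robotic manipulation benchmark that permits interventions on environment variables, in order to study causal structure learning and transfer in RL. It supports curricula and, because it simulates the open-source TriFinger platform, offers the possibility of sim-to-real transfer.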

ABSTRACT

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
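The abstract's idea of defining a curriculum by interpolating between an initial and a target task can be sketched as below. This is a minimal illustration under stated assumptions, not CausalWorld's implementation: a task is assumed to be a plain dictionary of scalar causal variables, the variable names (`block_mass`, `goal_height`) are hypothetical, and the interpolation schedule is assumed to be linear.

```python
from typing import Dict, List

# A task is modeled here as a mapping from (hypothetical) causal-variable names to scalar values.
Task = Dict[str, float]


def interpolate_tasks(initial: Task, target: Task, alpha: float) -> Task:
    """Return the task a fraction alpha of the way from initial (alpha=0) to target (alpha=1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if initial.keys() != target.keys():
        raise ValueError("initial and target must define the same causal variables")
    return {name: (1.0 - alpha) * initial[name] + alpha * target[name] for name in initial}


def linear_curriculum(initial: Task, target: Task, num_stages: int) -> List[Task]:
    """Evenly spaced stages that start at the initial task and end exactly at the target task."""
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    return [
        interpolate_tasks(initial, target, stage / (num_stages - 1))
        for stage in range(num_stages)
    ]


if __name__ == "__main__":
    easy = {"block_mass": 0.1, "goal_height": 0.0}
    hard = {"block_mass": 0.5, "goal_height": 0.2}
    for stage, task in enumerate(linear_curriculum(easy, hard, num_stages=3)):
        print(stage, task)
```

Vector-valued variables such as colors could be interpolated component-wise in the same way. The linear schedule is only one choice; a schedule that advances when the agent's success crosses a threshold would fit the same shared parametrization.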

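Motivation and Objectives

Motivate and facilitate research on out-of-distribution generalization in RL through controllable causal environments.
Provide a large, parameterizable suite of robotic manipulation tasks that share a common causal structure.
Allow interventions on environment parameters so that different axes of generalization and different curricula can be studied.
Provide a unified success metric and evaluation protocols for comparing learning algorithms across tasks.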
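A unified success metric has to score partially built structures on one common scale across tasks. The sketch below assumes, as a simplification that may differ from the paper's exact definition, that success is the fraction of the goal shape's volume occupied by blocks. Every block and goal piece is modeled as an axis-aligned box, rotations are ignored, blocks are assumed not to interpenetrate, goal pieces are assumed not to overlap each other, and the `Box` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its center and full side lengths (any consistent unit)."""
    center: Vec3
    size: Vec3

    def interval(self, axis: int) -> Tuple[float, float]:
        half = self.size[axis] / 2.0
        return self.center[axis] - half, self.center[axis] + half

    def volume(self) -> float:
        return self.size[0] * self.size[1] * self.size[2]


def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two axis-aligned boxes (0.0 if they are disjoint)."""
    volume = 1.0
    for axis in range(3):
        a_low, a_high = a.interval(axis)
        b_low, b_high = b.interval(axis)
        extent = min(a_high, b_high) - max(a_low, b_low)
        if extent <= 0.0:
            return 0.0
        volume *= extent
    return volume


def fractional_success(blocks: Sequence[Box], goals: Sequence[Box]) -> float:
    """Fraction of the total goal volume covered by blocks, clipped to at most 1."""
    goal_volume = sum(goal.volume() for goal in goals)
    if goal_volume <= 0.0:
        raise ValueError("goal shape must have positive volume")
    covered = sum(overlap_volume(block, goal) for block in blocks for goal in goals)
    return min(covered / goal_volume, 1.0)


if __name__ == "__main__":
    goal = Box(center=(0.0, 0.0, 0.5), size=(1.0, 1.0, 1.0))
    half_off = Box(center=(0.5, 0.0, 0.5), size=(1.0, 1.0, 1.0))
    print(fractional_success([goal], [goal]))      # → 1.0 (block exactly on the goal)
    print(fractional_success([half_off], [goal]))  # → 0.5 (block shifted by half its width)
```

Because the score is continuous, a tower that is half built earns partial credit, which makes single-block pushing and multi-block stacking comparable on the same scale.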

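Proposed Method

Defines a parameterized family of tasks whose goal is to build 3D shapes out of a given set of blocks.
Exposes a wide range of causal variables, such as mass, color, shape, and gravity, and allows do-interventions on them.
Supports multiple observation modes for the TriFinger robot (structured low-dimensional observations and pixel-based observations) as well as several action spaces.
Introduces training and evaluation spaces (ATS and ES) that enable curricula and out-of-distribution evaluation.
Provides task generators for diverse goals (e.g., Pushing, Picking, Pick and Place, Stacking2, Towers).
Benchmarks baseline model-free RL methods (PPO, SAC, TD3) under different curricula and evaluation protocols.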
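The combination of do-interventions with separate training and evaluation spaces can be sketched with a toy environment. This is not CausalWorld's API: `ToyEnvironment`, `CausalVariable`, the variable names, and the numeric ranges are all hypothetical, and each variable's range is assumed to be split into two disjoint sub-ranges, one used for training and one held out for evaluation.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class CausalVariable:
    """Scalar causal variable whose allowed values are split into disjoint train/eval sub-ranges."""
    train_range: Range
    eval_range: Range

    def bounds(self, space: str) -> Range:
        if space == "train":
            return self.train_range
        if space == "eval":
            return self.eval_range
        raise ValueError(f"unknown space: {space!r}")


class ToyEnvironment:
    """Hypothetical stand-in for a simulator whose state is a dict of causal variables."""

    def __init__(self, variables: Dict[str, CausalVariable], defaults: Dict[str, float]):
        self.variables = variables
        self.state = dict(defaults)

    def do_intervention(self, values: Dict[str, float], space: str = "train") -> None:
        """Set variables to fixed values, rejecting unknown names or values outside the space."""
        for name, value in values.items():
            if name not in self.variables:
                raise KeyError(f"unknown causal variable: {name}")
            low, high = self.variables[name].bounds(space)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside the {space} range [{low}, {high}]")
        # Apply only after every value has been validated, so a rejected call changes nothing.
        self.state.update(values)

    def sample_intervention(self, rng: random.Random, space: str = "train") -> Dict[str, float]:
        """Draw every causal variable uniformly from the requested space."""
        return {name: rng.uniform(*var.bounds(space)) for name, var in self.variables.items()}


if __name__ == "__main__":
    variables = {
        "block_mass": CausalVariable(train_range=(0.05, 0.15), eval_range=(0.2, 0.3)),
        "floor_friction": CausalVariable(train_range=(0.3, 0.6), eval_range=(0.7, 0.9)),
    }
    env = ToyEnvironment(variables, defaults={"block_mass": 0.1, "floor_friction": 0.5})
    rng = random.Random(0)
    env.do_intervention(env.sample_intervention(rng, "train"))         # in-distribution episode
    env.do_intervention(env.sample_intervention(rng, "eval"), "eval")  # held-out episode
    print(env.state)
```

Keeping the two sub-ranges disjoint is what makes an evaluation episode genuinely out of distribution: no value the agent sees at test time can have appeared during training.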

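Experimental Results

Research Questions

RQ1: How does varying the environment's causal variables during training affect transfer to unseen tasks?
RQ2: Can a unified success metric and curriculum-based interventions separate in-distribution from out-of-distribution generalization in robotic manipulation?
RQ3: What are the limits of current model-free RL methods on complex, multi-object goal shapes under different curricula?
RQ4: How do sim-to-real considerations for transferring policies to the real TriFinger platform affect learning?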

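Key Findings

Given sufficient training, model-free RL methods solve simple single-block tasks but struggle with multi-block stacking tasks.
Curricula that randomize goal shapes or environment parameters strongly affect generalization performance, and extreme domain randomization can hinder learning.
Under goal-shape randomization, policies show some generalization to new initial poses.
A unified, parameterized benchmark such as CausalWorld makes it possible to explicitly evaluate in-distribution and out-of-distribution generalization along axes such as mass, friction, and color.
The baseline results confirm that the tasks are feasible and suggest that complex multi-object manipulation requires inductive biases or more structured methods.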
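Evaluating generalization along one axis at a time (for example mass, friction, or color) can be sketched as a protocol that shifts a single causal variable into its held-out range while sampling all other variables from their training ranges. The `evaluate` callable, the variable names, the ranges, and the toy success rule in the usage example are hypothetical; in a real setup `evaluate` would roll out a trained policy and return a success score in [0, 1].

```python
import random
import statistics
from typing import Callable, Dict, Optional, Tuple

Range = Tuple[float, float]
Config = Dict[str, float]


def sample_config(
    rng: random.Random,
    train: Dict[str, Range],
    held_out: Dict[str, Range],
    shifted_axis: Optional[str] = None,
) -> Config:
    """Sample every variable from its training range, except shifted_axis, which uses its held-out range."""
    return {
        name: rng.uniform(*(held_out[name] if name == shifted_axis else low_high))
        for name, low_high in train.items()
    }


def per_axis_generalization(
    evaluate: Callable[[Config], float],
    train: Dict[str, Range],
    held_out: Dict[str, Range],
    episodes: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean success in distribution and with each single axis shifted out of distribution."""
    rng = random.Random(seed)

    def mean_success(axis: Optional[str]) -> float:
        scores = [evaluate(sample_config(rng, train, held_out, axis)) for _ in range(episodes)]
        return statistics.fmean(scores)

    report = {"in_distribution": mean_success(None)}
    for axis in train:
        report[axis] = mean_success(axis)
    return report


if __name__ == "__main__":
    train = {"block_mass": (0.05, 0.15), "floor_friction": (0.3, 0.6)}
    held_out = {"block_mass": (0.2, 0.3), "floor_friction": (0.7, 0.9)}

    def toy_policy_success(config: Config) -> float:
        # Toy stand-in for a policy rollout: it only succeeds on blocks lighter than 0.2.
        return 1.0 if config["block_mass"] < 0.2 else 0.0

    print(per_axis_generalization(toy_policy_success, train, held_out))
    # → {'in_distribution': 1.0, 'block_mass': 0.0, 'floor_friction': 1.0}
```

Comparing each per-axis score with the in-distribution score isolates which factor the policy fails to generalize along; in the toy example, a held-out mass breaks the policy while a held-out friction does not.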

This review was generated by AI and checked by a human editor.