QUICK REVIEW

[論文レビュー] RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Konstantinos Bousmalis, Giulia Vezzani|arXiv (Cornell University)|Jun 20, 2023

Multimodal Machine Learning Applications被引用数 9

ひとこと要約

RoboCat は自己改善トレーニングループを用いて新しいロボットタスクと embodiment に適応する、マルチエンボディメント・マルチタスクの視覚ベース汎用エージェントであり、視覚的ゴール条件づけを持つトランスフォーマーモデルを用いる。

ABSTRACT

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

研究の動機と目的

heterogeneous robot experiences を活用した汎用的なロボット操作エージェントの必要性を動機付ける。
diverse embodiments と tasks を扱える visual goal-conditioned transformer を RoboCat として提案する。
100–1000 demonstrations で新しいタスクにファインチューニングし、自律データ収集による自己改善を実証する。
より広く多様なデータでクロストタスク転移と効率改善を示し、継続学習のためのデータ自己生成能力を示す。

提案手法

モデルアーキテクチャ: 大規模自己回帰トランスフォーマー（デコーダーのみ、約1.18B パラメータ）を用い、視覚観測をトークン化する凍結VQ-GAN画像エンコーダを採用。
タスクの仕様は視覚ゴールによって定義される；ゴールは軌跡内で繰り返され、リラベル用に hindsight ゴールを用いることができる。
訓練目的はアクショントークン予測と将来の画像トークン予測を組み合わせ、VQ-GAN エンコーダーからの画像トークンを活用。
データ統合: RoboCat は複数の embodiment、タスク、オブジェクトセットに跨る多様なデータセットで学習し、タスク変動をトークン列としてエンコード。
ファインチューニングと自己改善: 新しいタスクに対して 100–1000 demonstrations でファインチューニングし、自律的なオンポリシーデータ収集を展開、 hindsight ゴールでリラベルし、次の汎用エージェント反復を再訓練。
実世界展開: 成功検出報酬モデルと policy pools による自律リセットで、タスク間のデータ収集を拡張可能にする。

Figure 1 : The self-improvement process. RoboCat is a multi-task, multi-embodiment visual goal-conditioned agent that can iteratively self-improve. A diverse training set is used to train an initial version of this generalist agent, which can be fine-tuned to new tasks with 100–1000 demonstrations a

実験結果

リサーチクエスチョン

RQ1RoboCat は異種のマルチエンボディメントデータから学習して、多様な巧妙な操作タスクを解決できるか。
RQ2見たことのないタスク・物体・エンボディメントへ、少数のデモでどれだけ適応できるか。
RQ3RoboCat はクロス・タスク転移を示し、より広く多様な訓練データで改善しますか。
RQ4RoboCat は自律的にデータを収集し、それを次の訓練反復へ統合して自己改善できるか。

主な発見

単一の RoboCat エージェントで、複数の embodiment およびオブジェクトセットにまたがる253の訓練・微調整タスク変種を実行できる。
RoboCat は unseen タスクへ 100–1000 demonstrations でファインチューニングし、初期訓練時に見ていない14-DoFのKUKA embodiment へ適応できる。
自律データ収集とリラベルによる自己改善は、汎用能力とファインチューニング効率の反復的な向上を生み出す。
非ロボティクスデータで訓練された視覚ファウンデーションモデルのベースラインと比較して、RoboCat は訓練タスクで高い性能を達成し、ファインチューニングタスクでの適応が優れている。
多様なロボティクスデータをスケールすると、訓練タスクの性能が向上し新タスクへの適応速度が速くなる。

Figure 2 : RoboCat supports multiple robotic embodiments and control modes. These are all the different embodiments RoboCat is tested on, and the dimensionality of the action it needs to output for each. All robot arms have a Robotiq parallel gripper attached to them, with the exception of the KUKA

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。