QUICK REVIEW

[Paper Review] GenSim: Generating Robotic Simulation Tasks via Large Language Models

Wang Li-rui, Yiyang Ling|arXiv (Cornell University)|Oct 2, 2023

Topic Modeling | Citations: 8

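One-line summary

GenSim uses GPT-4 and other LLMs to automatically generate diverse robotic simulation tasks and demonstrations, enabling multitask policy training that improves task-level generalization and sim-to-real transfer.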

ABSTRACT

Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data. However, existing methods for data generation have generally focused on scene-level diversity (e.g., object instances and poses) rather than task-level diversity, due to the human effort required to come up with and verify novel tasks. This has made it challenging for policies trained on simulation data to demonstrate significant task-level generalization. In this paper, we propose to automatically generate rich simulation environments and expert demonstrations by exploiting a large language models' (LLM) grounding and coding ability. Our approach, dubbed GenSim, has two modes: goal-directed generation, wherein a target task is given to the LLM and the LLM proposes a task curriculum to solve the target task, and exploratory generation, wherein the LLM bootstraps from previous tasks and iteratively proposes novel tasks that would be helpful in solving more complex tasks. We use GPT4 to expand the existing benchmark by ten times to over 100 tasks, on which we conduct supervised finetuning and evaluate several LLMs including finetuned GPTs and Code Llama on code generation for robotic simulation tasks. Furthermore, we observe that LLMs-generated simulation programs can enhance task-level generalization significantly when used for multitask policy training. We further find that with minimal sim-to-real adaptation, the multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world and outperform baselines by 25%. See the project website (https://liruiw.github.io/gensim) for code, demos, and videos.

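Motivation and objectives

Reduce the human effort needed to create diverse, achievable simulation tasks by leveraging LLMs.
Develop a two-mode task generation pipeline (goal-directed and exploratory) to expand task diversity.
Build a task library that caches and reuses high-quality tasks for verification and fine-tuning.
Train language-conditioned multitask policies on LLM-generated tasks to improve task-level generalization.
Evaluate LLMs such as GPT-4, GPT-3.5, and Code Llama on code generation for robotic simulation, and analyze sim-to-real transfer.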

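Proposed method

Propose a two-mode task creator that outputs a task description and the corresponding code for scene generation and demonstration generation.
Use the task library to enable retrieval-augmented generation, and store verified tasks for future use.
Apply a reflection and verification loop with an LLM critic to assess task quality before a task is added to the library.
Train language-conditioned multitask policies on demonstrations generated from the tasks in the task library.
Evaluate task generation quality and policy generalization in simulation and in real-world transfer.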
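
The interplay between the task creator, the task library, and the verification loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GenSim's actual code: the names `Task`, `TaskLibrary`, `build_prompt`, and `generate_tasks` are hypothetical, the LLM call is abstracted as a `propose` callable, and `verify` stands in for whatever checks (for example, running the generated code and consulting an LLM critic) gate entry into the library. Retrieval here simply returns the most recent tasks, which is a placeholder for a real retrieval strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    description: str
    code: str  # generated simulation code for the task (hypothetical field)


@dataclass
class TaskLibrary:
    """Cache of verified tasks, reused as references in later prompts."""
    tasks: List[Task] = field(default_factory=list)

    def retrieve(self, k: int) -> List[Task]:
        # Placeholder retrieval: the k most recently added tasks.
        return self.tasks[-k:] if k > 0 else []

    def add(self, task: Task) -> None:
        self.tasks.append(task)


def build_prompt(library: TaskLibrary, target: Optional[str], k: int = 3) -> str:
    """Build a prompt for either mode.

    With a target, the prompt is goal-directed; without one, it is exploratory.
    """
    refs = "\n".join(f"- {t.name}: {t.description}" for t in library.retrieve(k))
    if target is not None:
        goal = f"Propose a task that helps solve the target task: {target}"
    else:
        goal = "Propose a novel task that differs from the reference tasks."
    return f"Reference tasks:\n{refs}\n{goal}"


def generate_tasks(
    propose: Callable[[str], Task],   # assumed wrapper around an LLM call
    verify: Callable[[Task], bool],   # assumed stand-in for critic and runtime checks
    library: TaskLibrary,
    rounds: int,
    target: Optional[str] = None,
) -> List[Task]:
    """Run the propose -> verify -> store loop and return the accepted tasks."""
    accepted: List[Task] = []
    for _ in range(rounds):
        candidate = propose(build_prompt(library, target))
        # Skip tasks whose name already exists in the library.
        if any(t.name == candidate.name for t in library.tasks):
            continue
        # Only verified tasks are cached for future retrieval.
        if verify(candidate):
            library.add(candidate)
            accepted.append(candidate)
    return accepted
```

Because accepted tasks are appended to the library inside the loop, each later prompt can draw on tasks generated earlier in the same run, which is the bootstrapping behavior the exploratory mode relies on.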

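Experimental results

Research questions

RQ1: Can LLMs design and implement diverse, high-quality robotic simulation tasks and demonstrations?
RQ2: Does training on LLM-generated tasks improve a policy's task-level generalization compared with training on human-curated tasks alone?
RQ3: Does pretraining on diverse LLM-generated simulations improve sim-to-real transfer on long-horizon tasks?
RQ4: How do the goal-directed and exploratory generation modes compare in producing useful task curricula for policy learning?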

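Key findings

GPT-4 in particular can generate high-quality, achievable, and diverse simulation tasks by bootstrapping from existing tasks.
Fine-tuning open-source LLMs on GenSim task data improves their generation performance, and GPT-4-based tasks improve in-domain and zero-shot generalization.
Multitask policies trained on GPT-4-generated tasks improve in-domain generalization by over 50% and show markedly better zero-shot transfer in simulation.
With minimal sim-to-real adaptation, policies pretrained on GPT-4 tasks transfer better to unseen real-world tasks and outperform baselines by about 25%.
Pretraining on a larger set of generated tasks (e.g., 70 tasks) yields noticeably greater robustness on real-world long-horizon tasks (e.g., build-wheel).
Task diversity in the simulator improves real-world performance after adaptation by about 25%.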

This review was generated by AI and checked by a human editor.