QUICK REVIEW

[Paper Review] Code as Policies: Language Model Programs for Embodied Control

Jacky Liang, Wenlong Huang|arXiv (Cornell University)|Sep 16, 2022

Robot Manipulation and Learning | Citations: 35

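One-line summary

This paper shows that code-writing LLMs can generate robot policy code from natural language commands and use it to control perception-action loops. Through hierarchical code generation, the approach realizes reactive and waypoint-based policies on multiple robots without additional training.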

ABSTRACT

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input several example language commands (formatted as comments) followed by corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code respectively. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) to ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers), as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is prompting hierarchical code-gen (recursively defining undefined functions), which can write more complex code and also improves state-of-the-art to solve 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io
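The few-shot format described in the abstract (commands written as comments, each followed by policy code, with the new command appended last) might be assembled roughly as follows. This is a minimal sketch, not the paper's actual prompt: the helper `build_prompt`, the module name `robot_utils`, the API names `get_obj_pos` and `put_first_on_second`, and the example commands are all hypothetical.

```python
# Hypothetical few-shot examples: each pairs a command with the policy code
# that should follow it. The API names are assumptions for illustration.
EXAMPLES = [
    ("move the red block a bit to the right",
     "target_pos = get_obj_pos('red block') + np.array([0.1, 0.0])\n"
     "put_first_on_second('red block', target_pos)"),
    ("put the blue block on the green bowl",
     "put_first_on_second('blue block', 'green bowl')"),
]

# Import lines at the top of the prompt hint at which APIs the model may call.
PROMPT_HEADER = (
    "import numpy as np\n"
    "from robot_utils import get_obj_pos, put_first_on_second\n"
)


def build_prompt(command, examples=EXAMPLES, header=PROMPT_HEADER):
    """Format examples as '# command' comments followed by code, then the new command."""
    parts = [header]
    for example_command, code in examples:
        parts.append(f"# {example_command}.\n{code}\n")
    # The prompt ends with the new command as a comment; a code-completion
    # model would be expected to continue with the matching policy code.
    parts.append(f"# {command}.\n")
    return "\n".join(parts)


prompt = build_prompt("move the blue block a bit to the left")
print(prompt)
```

The string in `prompt` would be sent to a code-completion model, and the completion (up to the next comment line) would be treated as the generated policy.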

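Motivation and objectives

Ground language in real-world perception-action loops while reducing the need for data collection and training.
Demonstrate that code-writing LLMs can generate executable robot policies from natural language instructions.
Propose hierarchical code generation to compose complex policies and improve generalization.
Demonstrate Code as Policies (CaP) across multiple robots and tasks, including reactive control and vision-based manipulation.
Introduce a robotics-specific code generation benchmark and analyze scaling effects.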

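Proposed method

Use OpenAI Codex to generate Python policy code from natural language commands supplied as comments.
Prompt the LLM with few-shot examples of command-to-code mappings so that it generates policy code that processes perception outputs and controls actuators.
Adopt hierarchical code generation, which recursively defines undefined functions to build larger, reusable policy modules.
Execute the generated LMPs safely via Python's exec within restricted global and local scopes, and run them on real robots.
Ground LMPs in perception and control APIs (e.g., open-vocabulary detectors, NumPy, Shapely, PD/impedance-style primitives).
Evaluate on code generation benchmarks (the robotics-focused RoboCodeGen and the general-purpose HumanEval) and on real hardware with tabletop and mobile manipulation tasks.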
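Hierarchical code generation and scoped execution, as listed above, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `FAKE_LLM` is a canned stand-in for a real code-writing model, and `generate`, `undefined_calls`, `resolve`, `run_lmp`, and the toy API (`get_obj_names`, `get_size`, `stack`) are hypothetical names. Passing a whitelisted dictionary to `exec` only limits which names the generated code can see; it is not a security sandbox.

```python
import ast
import builtins

# Stand-in for a code-writing LLM: maps a request to canned Python source.
# A real system would query a model here; these strings are illustrative only.
FAKE_LLM = {
    "# stack the blocks.": "order = sort_by_size(get_obj_names())\nstack(order)",
    "# define function: sort_by_size": (
        "def sort_by_size(names):\n"
        "    return sorted(names, key=get_size, reverse=True)"
    ),
}


def generate(request):
    return FAKE_LLM[request]


def undefined_calls(code, known):
    """Return names the code calls that are neither known nor defined in the code."""
    tree = ast.parse(code)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {n.func.id for n in ast.walk(tree)
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    return sorted(called - defined - set(known))


def resolve(code, scope):
    """Recursively generate and define every function the code calls but nobody defined."""
    for name in undefined_calls(code, known=set(scope) | set(dir(builtins))):
        body = generate(f"# define function: {name}")
        resolve(body, scope)  # the new function may itself call undefined ones
        exec(body, scope)     # define it in the shared scope


def run_lmp(command, api):
    scope = dict(api)  # only whitelisted API names are visible to generated code
    code = generate(f"# {command}.")
    resolve(code, scope)
    exec(code, scope)
    return scope


# Toy perception/control API (hypothetical): object sizes and a stacking primitive.
SIZES = {"small block": 1, "large block": 3, "medium block": 2}
stacked = []
api = {
    "get_obj_names": lambda: list(SIZES),
    "get_size": SIZES.get,
    "stack": stacked.extend,
}

run_lmp("stack the blocks", api)
print(stacked)  # ['large block', 'medium block', 'small block']
```

In this sketch the top-level policy calls `sort_by_size` before it exists; the `ast` pass detects the missing name, requests a definition, and only then runs the policy, which mirrors the "recursively defining undefined functions" idea at a small scale.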

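Experimental results

Research questions

RQ1: Can code-writing LLMs translate natural language instructions into executable robot policies that account for perception and control parameters?
RQ2: Does hierarchical code generation improve robot policy quality and generalization, as well as performance on standard code generation benchmarks?
RQ3: How does CaP perform on robotics tasks compared with language-based planners and conventional imitation learning baselines?
RQ4: To what extent do perception-grounded open-vocabulary detectors and control primitives enable flexible task grounding for LMPs?
RQ5: What are the limitations and scaling behavior of CaP across domains and model sizes?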

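Key findings

Hierarchical code generation improves policy quality and code generation benchmark results, reaching 39.8% P@1 on HumanEval with the larger Codex model.
CaP realizes reactive and waypoint-based policies on multiple robot platforms using open-vocabulary perception and programmable control primitives.
CaP generalizes to unseen instructions and objects, adapting policy code to new tasks without additional training.
On robotics benchmarks, CaP matches or exceeds some supervised baselines in seen-attribute scenarios and generalizes robustly to unseen attributes and tasks.
Larger models and hierarchical prompting correlate with better performance on code generation and robotics tasks.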
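For readers unfamiliar with the P@1 metric, the sketch below shows the pass@k estimator that was introduced alongside HumanEval [1]: for a problem with n sampled programs of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. For k = 1 this reduces to c / n. The sample counts in the example are made up for illustration and are not results from the paper; this summary does not state how many samples per problem were drawn.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only (not from the paper): (n samples, c correct) per problem.
results = [(10, 4), (10, 0), (10, 10)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(round(score, 3))  # 0.467
```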

This review was generated by AI and checked by a human editor.