QUICK REVIEW

[Paper Review] Embodied Lifelong Learning for Task and Motion Planning

Jorge A. Mendez, Leslie Pack Kaelbling|arXiv (Cornell University)|Jul 13, 2023

Robot Manipulation and Learning | Cited by 8

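One-Line Summary

This paper formalizes lifelong sampler learning for embodied TAMP and proposes a mixture of hierarchical diffusion samplers that selects between specialized and generic models online, improving planning over a lifetime of tasks.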

ABSTRACT

A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge and proficiency. We formalize this setting with a novel formulation of lifelong learning for task and motion planning (TAMP), which endows our learner with the compositionality of TAMP systems. Exploiting the modularity of TAMP, we develop a mixture of generative models that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across various models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements (over time and compared to baselines) in planning success on 2D and BEHAVIOR domains.

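Motivation and Objectives

Formalize task and motion planning (TAMP) in a true lifelong setting.
Exploit the modularity of TAMP to learn generative samplers for the continuous parameters that affect planning success.
Develop a mixture of generative models that selects online between generic and specialized samplers via auxiliary tasks.
Demonstrate planning performance that improves over time on 2D and BEHAVIOR domains.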

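Proposed Method

Represent each abstract action's sampler as a diffusion model that generates continuous parameters conditioned on the state.
Use a hierarchical approach that combines a generic sampler with specialized samplers per object type, so that data can be pooled when it is scarce.
Train auxiliary predictors that assess each sampler's reliability via an auxiliary signal z, and form a mixture distribution that weights samples by that reliability.
Adopt the two-stage SeSamE (search-then-sample) planning framework, which generates a skeleton at the discrete level and refines it by sampling continuous parameters.
To prevent forgetting, train the samplers on lifelong data with a simple replay/retraining strategy that mixes old and new data.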
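
The reliability-weighted mixture described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax over negative auxiliary errors, the `temperature` parameter, and the toy `generic_sampler` and `specialized_sampler` functions (simple random draws standing in for trained diffusion models) are all hypothetical choices made for the example.

```python
import math
import random


def mixture_weights(aux_errors, temperature=1.0):
    """Map per-sampler auxiliary errors to mixture weights.

    Assumption: a lower auxiliary error means the sampler "understands"
    the state better, so it receives a higher weight (softmax of -error).
    """
    scores = [-e / temperature for e in aux_errors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def sample_from_mixture(samplers, aux_errors, state, rng):
    """Pick one sampler according to the mixture weights, then draw from it."""
    weights = mixture_weights(aux_errors)
    idx = rng.choices(range(len(samplers)), weights=weights, k=1)[0]
    return idx, samplers[idx](state, rng)


# Hypothetical stand-ins for a generic and an object-type-specific sampler.
def generic_sampler(state, rng):
    return rng.uniform(-1.0, 1.0)


def specialized_sampler(state, rng):
    return rng.gauss(state["target"], 0.05)


rng = random.Random(0)
state = {"target": 0.3}
aux_errors = [0.9, 0.1]  # made-up errors: the specialized sampler looks more reliable
weights = mixture_weights(aux_errors)
idx, param = sample_from_mixture(
    [generic_sampler, specialized_sampler], aux_errors, state, rng
)
```

Here the sampler with the smaller auxiliary error is chosen more often, which mirrors the idea of deciding online which model to trust for a given state. How the errors are computed and turned into weights in the paper may differ from this sketch.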

Figure 1: The learning robot will face a sequence of diverse TAMP problems in a true lifelong setting. It will use its current models to solve each problem as efficiently as possible, and then use any collected data to improve those models for the future. Images captured from BEHAVIOR [1].

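Experimental Results

Research Questions

RQ1: Can diffusion-based samplers learn useful continuous parameter distributions in a lifelong setting?
RQ2: Does a mixture of specialized and generic samplers improve planning efficiency when data is limited?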

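Key Findings

Diffusion samplers learned from data produce distributions consistent with the observed successes and the distribution of valid actions.
The mixture of hierarchical samplers outperforms the baselines, especially in data-scarce regimes.
The lifelong evaluation shows that the mixture approach substantially improves the cumulative number of problems solved relative to the baselines and uniform sampling.
In the BEHAVIOR domain, the lifelong learner surpasses the hand-crafted starting samplers and shows continued improvement in a realistic domain.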

This review was generated by AI and checked by a human editor.