QUICK REVIEW

[Paper Review] Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

Yoonwoo Kim, Raghav Arora|arXiv (Cornell University)|Mar 4, 2026

Multimodal Machine Learning Applications | Citations: 0

One-Sentence Summary

The paper presents CoCo-TAMP, a framework for partially observable task and motion planning (PO-TAMP) that uses LLMs to supply commonsense location priors and co-location cues, improving belief estimates and planning efficiency on long-horizon tasks.

ABSTRACT

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

Motivation and Objectives

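Introduce CoCo-TAMP, a hierarchical state estimation framework for PO-TAMP that uses LLMs to shape beliefs over rooms, surfaces, and object poses.
Incorporate two forms of commonsense knowledge from LLMs: where objects are typically found, and co-location cues based on similarity between objects.
Develop a hierarchical Bayes filter with a visibility-aware observation model that updates beliefs during planning and execution.
Demonstrate substantial reductions in planning and execution time in large-scale simulation and real-robot experiments (on average 62.7% in simulation and 72.6% in real-world demonstrations, relative to a baseline that uses neither type of commonsense knowledge).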

Proposed Method

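Query an LLM with multiple-choice questions to generate priors over each object's room and surface location.
Build a co-location model that captures similarity between objects using LLM embeddings.
Maintain beliefs with a hierarchical Bayes filter over rooms, surfaces, and poses, together with a visibility-aware observation model.
Integrate a PDDLStream-based TAMP planner that includes a detect action whose cost is inversely proportional to belief mass, which favors informative viewpoints.
Use a semantics-guided co-location toggler that enables or disables the co-location model during execution.
Evaluate variants with and without LLM priors and co-location, comparing cumulative planning/execution time and the number of replanning steps.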
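
The belief-update and detect-cost items above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a flat categorical belief over surfaces instead of the full room/surface/pose hierarchy, and the names `negative_update`, `detect_cost`, `p_detect`, and the visibility fractions are hypothetical choices made for this example.

```python
def negative_update(belief, visible, p_detect=0.9):
    """Bayes update after a detect action that did NOT find the object.

    belief:   dict mapping location -> prior probability (sums to 1)
    visible:  dict mapping location -> fraction of it seen from the viewpoint
              (assumed input; locations not listed count as unseen)
    p_detect: assumed chance of detecting the object when it is in view
    """
    # Likelihood of "not detected" if the object were at each location.
    unnorm = {
        loc: p * (1.0 - p_detect * visible.get(loc, 0.0))
        for loc, p in belief.items()
    }
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {loc: w / total for loc, w in unnorm.items()}


def detect_cost(belief_mass, base_cost=1.0, eps=1e-6):
    """Cost inversely proportional to the belief mass a viewpoint covers."""
    return base_cost / max(belief_mass, eps)


# Made-up belief over three surfaces for one target object.
belief = {"kitchen_counter": 0.6, "dining_table": 0.3, "sofa": 0.1}

# The robot fully sees the kitchen counter and does not find the object.
updated = negative_update(belief, visible={"kitchen_counter": 1.0})
```

In this sketch, a viewpoint covering a surface with high belief mass gets a low detect cost, so a cost-minimizing planner would look there first. When nothing is found, the visibility term shifts probability away from the surface that was actually seen and toward the unseen ones.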

Figure 1 : The initial beliefs about the semantic locations of objects, $\text{bel}(x_{r,0}^{k})$ and $\text{bel}(x_{s,0}^{k})$ , are derived from LLMs, while the initial beliefs about their poses, $\text{bel}(x_{p,0}^{k})$ , are uniformly distributed across all surfaces. The TAMP problem specificat
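
As a rough illustration of the initial beliefs described in the caption, the sketch below builds a categorical room prior from multiple-choice answers and a uniform pose prior over surfaces. How the paper converts LLM answers into probabilities is not stated in this review, so the vote counting with additive smoothing is an assumption, as are the function names and the sample answers.

```python
from collections import Counter


def prior_from_choices(answers, options, smoothing=1.0):
    """Turn repeated multiple-choice answers into a categorical prior.

    Additive smoothing keeps every option at non-zero probability, so a
    later Bayes update can still recover if the LLM's guess was wrong.
    """
    counts = Counter(a for a in answers if a in options)
    total = sum(counts.values()) + smoothing * len(options)
    return {opt: (counts[opt] + smoothing) / total for opt in options}


def uniform_belief(items):
    """Uniform distribution, used here for the initial pose belief."""
    items = list(items)
    return {item: 1.0 / len(items) for item in items}


rooms = ["kitchen", "living_room", "bedroom"]
# Made-up answers to "Which room is a mug most likely in?", asked four times.
llm_answers = ["kitchen", "kitchen", "living_room", "kitchen"]

room_prior = prior_from_choices(llm_answers, rooms)
pose_prior = uniform_belief(["counter", "table", "shelf", "bed"])
```

With these sample answers the kitchen receives the largest share of the room prior, while the bedroom keeps a small non-zero probability.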

Experimental Results

Research Questions

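RQ1: Do LLM-derived priors improve PO-TAMP planning and execution efficiency across diverse household environments?
RQ2: Do semantically informed co-location cues further refine beliefs and improve task success under partial observability?
RQ3: Is LLM-based belief updating (LGBU) alone sufficient for long-horizon planning, or are principled Bayesian updates required?
RQ4: How robust is the approach to adversarial or misleading commonsense priors?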

Key Findings

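LLM-generated priors reduce cumulative planning and execution time compared with a baseline that uses no semantic priors.
The co-location model based on LLM embeddings further reduces planning time and the number of replanning steps, and lowers variance.
Relying on the LLM alone for belief updates (LGBU) is less robust than Bayesian updating on long-horizon tasks.
In adversarial settings, Bayesian updating still completed the task in several trials where LGBU failed.
Real-world experiments on an HSR robot showed substantial time savings from combining LLM priors with co-location.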
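
To make the co-location idea concrete, here is a minimal sketch of how embedding similarity could reweight a belief after the robot spots a task-irrelevant object. The exponential weighting, the `neutral` threshold, the `strength` parameter, and the three-dimensional toy embeddings are all assumptions for illustration; the review does not specify the paper's actual co-location model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def colocation_update(belief, seen_at, similarity, strength=1.0, neutral=0.5):
    """Reweight a target object's belief after seeing another object.

    Similarity above `neutral` raises the probability of `seen_at`;
    similarity below it lowers that probability (assumed weighting).
    """
    factor = math.exp(strength * (similarity - neutral))
    unnorm = {
        loc: p * (factor if loc == seen_at else 1.0)
        for loc, p in belief.items()
    }
    total = sum(unnorm.values())
    return {loc: w / total for loc, w in unnorm.items()}


# Toy embeddings standing in for LLM embeddings (made-up values).
emb = {
    "mug": [0.9, 0.1, 0.0],
    "plate": [0.8, 0.2, 0.1],
    "pillow": [0.0, 0.1, 0.9],
}
belief_mug = {"kitchen_counter": 0.4, "bed": 0.3, "sofa": 0.3}

# A plate (similar to a mug) is seen on the kitchen counter.
after_plate = colocation_update(
    belief_mug, "kitchen_counter", cosine(emb["mug"], emb["plate"])
)
# A pillow (dissimilar to a mug) is seen on the bed.
after_pillow = colocation_update(
    belief_mug, "bed", cosine(emb["mug"], emb["pillow"])
)
```

In this toy case, seeing the plate increases the mug's belief at the kitchen counter, and seeing the pillow decreases it at the bed, matching the intuition that similar objects tend to be found together and dissimilar ones do not.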

Figure 2 : Example of a simulated household environment.


This review was generated by AI and checked by a human editor.