QUICK REVIEW

[Paper Review] Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

Yoonwoo Kim, Raghav Arora|arXiv (Cornell University)|Mar 4, 2026

Multimodal Machine Learning Applications | Cited by 0

ONE-SENTENCE SUMMARY

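The paper proposes CoCo-TAMP, a PO-TAMP framework that uses LLMs to supply common-sense priors and co-location cues, improving belief estimation and planning efficiency in long-horizon tasks.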

ABSTRACT

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

MOTIVATION AND OBJECTIVES

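Introduce CoCo-TAMP, a hierarchical state estimation framework for PO-TAMP (partially observable task and motion planning) that uses LLMs to shape beliefs over rooms, surfaces, and object poses.
Incorporate two types of common-sense knowledge from LLMs: likely object locations, and co-location cues based on object similarity.
Develop a hierarchical Bayesian filter with a visibility-aware observation model to update beliefs during planning and execution.
Demonstrate significant reductions in planning and execution time in large-scale simulation and real-robot experiments.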

PROPOSED METHOD

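Query LLMs with multiple-choice questions to obtain priors over which rooms and surfaces an object is likely to be found on.
Build a co-location model from LLM embeddings to capture similarity between objects.
Maintain beliefs with a hierarchical Bayesian filter (room, surface, pose) combined with a visibility-aware observation model.
Integrate a PDDLStream-based TAMP planner with a detect action whose cost is inversely proportional to belief mass, encouraging informative viewpoints.
Use a semantics-guided co-location switch to turn the co-location model on or off during execution.
Evaluate using cumulative planning/execution time and the number of replanning iterations, comparing variants with and without LLM priors and co-location.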
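The belief-maintenance steps above can be illustrated with a minimal Python sketch over a discrete surface-level belief. It is not the authors' implementation. The detection probability `P_DETECT`, the per-surface `visibility` fractions, the exponential co-location reweighting with parameter `alpha`, reading "belief mass" as the mass on surfaces in view, and the `eps` term in `detect_cost` are all hypothetical choices. Deriving the room level by summing surface beliefs is also a simplification. Only the underlying ideas (negative evidence from a visibility-aware observation model, similarity-based co-location cues from embeddings, and a detect cost inversely proportional to belief mass) come from the summary.

```python
import math

# Hypothetical detector reliability; the summary does not give the paper's value.
P_DETECT = 0.9


def normalize(belief):
    """Rescale a {surface: probability} dict so its values sum to 1."""
    total = sum(belief.values())
    return {s: p / total for s, p in belief.items()}


def negative_update(belief, visibility):
    """Bayes update after the target object was NOT detected.

    visibility[s] in [0, 1] is the assumed fraction of surface s in view;
    surfaces missing from the dict are treated as not visible at all.
    """
    posterior = {}
    for s, p in belief.items():
        p_miss = 1.0 - P_DETECT * visibility.get(s, 0.0)  # P(no detection | object on s)
        posterior[s] = p * p_miss
    return normalize(posterior)


def cosine(u, v):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def colocation_update(belief, target_emb, seen_emb, seen_surface, alpha=1.0):
    """Reweight the target's belief after another object is seen on seen_surface.

    Hypothetical rule: multiply that surface by exp(alpha * similarity), so a
    similar object raises its mass and a dissimilar one (negative cosine) lowers it.
    """
    weighted = dict(belief)
    sim = cosine(target_emb, seen_emb)
    weighted[seen_surface] = weighted.get(seen_surface, 0.0) * math.exp(alpha * sim)
    return normalize(weighted)


def room_marginal(belief, surface_to_room):
    """Simplified room level: sum surface probabilities within each room."""
    rooms = {}
    for s, p in belief.items():
        room = surface_to_room[s]
        rooms[room] = rooms.get(room, 0.0) + p
    return rooms


def detect_cost(belief, surfaces_in_view, eps=1e-3):
    """Detect-action cost, inversely proportional to the belief mass in view."""
    mass = sum(belief.get(s, 0.0) for s in surfaces_in_view)
    return 1.0 / (mass + eps)


belief = {"counter": 0.5, "fridge": 0.3, "desk": 0.2}
belief = negative_update(belief, {"counter": 1.0})  # looked at the counter, saw nothing
print(belief, detect_cost(belief, ["fridge"]))
```

Because a failed detection only multiplies in a miss probability, a surface is never ruled out entirely unless `P_DETECT` is 1, so the filter can still recover when an LLM prior turns out to be wrong.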

Figure 1 : The initial beliefs about the semantic locations of objects, $\text{bel}(x_{r,0}^{k})$ and $\text{bel}(x_{s,0}^{k})$ , are derived from LLMs, while the initial beliefs about their poses, $\text{bel}(x_{p,0}^{k})$ , are uniformly distributed across all surfaces. The TAMP problem specificat
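The initialization described in Figure 1 can be sketched as follows, under assumptions that the summary does not state. The LLM's answers to the multiple-choice queries are turned into probabilities by a softmax over hypothetical per-option log-probabilities; the surface belief is obtained by chaining a room prior with per-room surface priors; and each surface is modeled as an axis-aligned rectangle, with the same number of pose samples per surface as one reading of "uniformly distributed across all surfaces". All function names and inputs are illustrative only.

```python
import math
import random


def mcq_to_distribution(option_logprobs):
    """Softmax over per-option log-probabilities for one multiple-choice query.

    How these scores are read out of the LLM is an assumption of this sketch.
    """
    m = max(option_logprobs.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in option_logprobs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def initial_surface_belief(room_prior, surface_prior_given_room):
    """bel(x_s) = sum over rooms r of P(r) * P(s | r)."""
    belief = {}
    for room, p_room in room_prior.items():
        for surface, p_surf in surface_prior_given_room[room].items():
            belief[surface] = belief.get(surface, 0.0) + p_room * p_surf
    return belief


def uniform_pose_particles(surface_extents, n_per_surface, seed=0):
    """bel(x_p): poses sampled uniformly over each surface's (xmin, xmax, ymin, ymax) box."""
    rng = random.Random(seed)
    particles = []
    for surface, (xmin, xmax, ymin, ymax) in surface_extents.items():
        for _ in range(n_per_surface):
            particles.append((surface, rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)))
    return particles


rooms = mcq_to_distribution({"kitchen": math.log(0.7), "office": math.log(0.3)})
surfaces = initial_surface_belief(
    rooms, {"kitchen": {"counter": 0.8, "fridge": 0.2}, "office": {"desk": 1.0}}
)
poses = uniform_pose_particles({"counter": (0.0, 1.0, 0.0, 0.5), "desk": (2.0, 3.0, 0.0, 1.0)}, 5)
```

Chaining the two priors keeps the surface belief consistent with the room belief by construction: as long as each room's surface prior sums to 1, summing `surfaces` over the kitchen's surfaces recovers `rooms["kitchen"]`.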

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

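RQ1: Do LLM-driven priors improve PO-TAMP planning and execution efficiency across diverse household environments?
RQ2: Under partial observability, do semantics-driven co-location cues further improve belief refinement and task success rate?
RQ3: Is relying solely on LLM-guided belief updates (LGBU) robust for long-horizon planning, or are principled Bayesian updates needed?
RQ4: How robust is the method under adversarial or misleading common-sense priors?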

KEY FINDINGS

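LLM-generated priors significantly reduce cumulative planning and execution time compared with a baseline without semantic priors.
The co-location model built from LLM embeddings further reduces planning time and the number of replanning iterations, with lower variance.
Using LLMs alone for belief updates (LGBU) is less robust than Bayesian updating on long-horizon tasks.
In adversarial environments, Bayesian updating completed the task in several trials, whereas LGBU failed.
Real-world experiments on a Human Support Robot (HSR) using LLM priors and co-location cues show significant time reductions.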

Figure 2 : Example of a simulated household environment.

This review was generated by AI and reviewed by human editors.