QUICK REVIEW

[Paper Review] Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

Yoonwoo Kim, Raghav Arora|arXiv (Cornell University)|2026. 03. 04.

Multimodal Machine Learning Applications | Citations: 0

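One-Line Summary

This paper presents CoCo-TAMP, a PO-TAMP framework that uses an LLM to supply common-sense priors and co-location cues, improving belief estimation and planning efficiency on long-horizon tasks.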

ABSTRACT

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

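Motivation and Objectives

Introduce CoCo-TAMP, a hierarchical state estimation framework for PO-TAMP that uses LLMs to shape beliefs over object rooms, surfaces, and poses.
Incorporate two forms of common-sense knowledge from LLMs: likely object locations, and co-location cues based on object similarity.
Develop a hierarchical Bayesian filter with a visibility-aware observation model to update beliefs during planning and execution.
Demonstrate substantial reductions in planning and execution time in large-scale simulation and real-robot experiments (62.7% on average in simulation and 72.6% in real-world demonstrations, relative to a baseline that uses neither type of common-sense knowledge).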
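
As a rough illustration of the filter named in the objectives, one update step for the surface-level belief of a static object $k$ can be written as below. This is a generic Bayes-filter form, not necessarily the paper's exact formulation. The symbols $\eta$ (normalizer), $z_t$ (detector output), $v_t(s)$ (fraction of surface $s$ in view, between 0 and 1), and $p_d$ (detection probability when the object is in view) are assumptions introduced here for illustration.

```latex
\begin{align}
\text{bel}(x_{s,t}^{k}) &= \eta \, p(z_t \mid x_{s,t}^{k}, v_t) \, \text{bel}(x_{s,t-1}^{k}), \\
p(z_t = \text{not detected} \mid x_{s,t}^{k} = s, v_t) &= 1 - p_d \, v_t(s).
\end{align}
```

Under this form, failing to see the object on a fully visible surface removes most of that surface's belief mass, while a glance that covers little of the surface changes the belief only slightly.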

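Proposed Method

Generate priors over each object's room and surface location by querying an LLM with multiple-choice questions.
Build a co-location model from LLM embeddings to capture similarity between objects.
Maintain beliefs with a hierarchical Bayesian filter (room, surface, pose) and a visibility-aware observation model.
Integrate a PDDLStream-based TAMP planner that includes a detect action whose cost is inversely proportional to belief mass, steering the robot toward informative viewpoints.
Use a semantically guided co-location toggler to enable or disable the co-location model during execution.
Evaluate using cumulative planning/execution time and the number of replanning iterations, comparing variants with and without LLM priors and co-location.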
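
The belief-related steps above can be sketched in a few lines of Python. This is a minimal toy sketch under stated assumptions, not the authors' implementation: it tracks a single object's belief over surfaces only, all function names are hypothetical, the exponential co-location factor is an illustrative choice, and the embeddings are plain lists of floats rather than the output of any real LLM API.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("belief has no mass")
    return {k: v / total for k, v in weights.items()}


def negative_update(belief, looked_at, p_detect, visibility):
    """Bayes update after looking at one surface and NOT detecting the object.

    visibility is the fraction (0..1) of the surface that was in view, so a
    partial view removes less belief mass than a full view (assumed model).
    """
    p_miss = 1.0 - p_detect * visibility
    posterior = {s: b * (p_miss if s == looked_at else 1.0)
                 for s, b in belief.items()}
    return normalize(posterior)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def colocation_update(belief, seen_surface, similarity, strength=1.0):
    """Reweight the surface where another object was just observed.

    Positive similarity pulls belief toward that surface, negative similarity
    pushes it away. The exp() form is an assumption made for this sketch.
    """
    factor = math.exp(strength * similarity)
    posterior = {s: b * (factor if s == seen_surface else 1.0)
                 for s, b in belief.items()}
    return normalize(posterior)


def detect_cost(belief, surface, eps=1e-6):
    """Detect-action cost, inversely proportional to belief mass."""
    return 1.0 / (belief[surface] + eps)


if __name__ == "__main__":
    # Hypothetical prior, e.g. tallied from multiple-choice LLM answers.
    prior = normalize({"counter": 6, "table": 3, "shelf": 1})
    after_miss = negative_update(prior, "counter", p_detect=0.9, visibility=1.0)
    print(after_miss)
    print(detect_cost(after_miss, "table"))
```

With a cost of this shape, a planner that minimizes total cost prefers to look first at the surfaces where the object is currently believed most likely to be.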

Figure 1 : The initial beliefs about the semantic locations of objects, $\text{bel}(x_{r,0}^{k})$ and $\text{bel}(x_{s,0}^{k})$ , are derived from LLMs, while the initial beliefs about their poses, $\text{bel}(x_{p,0}^{k})$ , are uniformly distributed across all surfaces. The TAMP problem specificat

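Experimental Results

Research Questions

RQ1: Do LLM-driven priors improve PO-TAMP planning and execution efficiency across diverse household environments?
RQ2: Do semantically informed co-location cues further improve belief refinement and task success under partial observability?
RQ3: Are LLM-guided belief updates (LGBU) alone robust enough for long-horizon planning, or is a principled Bayesian update required?
RQ4: How robust is the approach to adversarial or misleading common-sense priors?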

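Key Findings

LLM-generated priors reduce cumulative planning and execution time relative to a baseline without semantic priors.
The LLM-embedding-based co-location model further reduces planning time and the number of replanning iterations, and lowers variance.
LLM-guided belief updates (LGBU) alone are less robust than Bayesian updates on long-horizon tasks.
In adversarial settings, Bayesian updates still completed the task in cases where LGBU failed multiple times.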
In real-world experiments on an HSR robot, combining LLM priors with co-location cues yielded a substantial time reduction.

Figure 2 : Example of a simulated household environment.


This review was generated by AI and reviewed by a human editor.