QUICK REVIEW

[Paper Review] EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration

Haishan Zeng, Wang, Mengna | arXiv (Cornell University) | 2026-01-16

Multimodal Machine Learning Applications | Citations: 0

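One-Line Summary

The paper proposes EmboTeam, a three-stage pipeline that combines LLMs, PDDL planning, and behavior trees for dynamic, heterogeneous multi-robot collaboration, and reports a meaningful improvement over a strong baseline on the MACE-THOR benchmark.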

ABSTRACT

In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose EmboTeam, a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experiments show EmboTeam improves the task success rate from 12% to 55% and goal condition recall from 32% to 72% over the LaMMA-P baseline.

Research Motivation and Objectives

Address long-horizon, heterogeneous multi-robot task planning from high-level natural language instructions.
Integrate semantic LLM reasoning with formal PDDL planning and reactive behavior trees for robust execution.
Enable dynamic, scalable robot teams via a shared blackboard communication mechanism.
Provide a new benchmark (MACE-THOR) for evaluating complex household tasks in simulated environments.

Proposed Method

Three-stage cascaded architecture: PDDL File Generator (PFG) converts instructions into PDDL planning problems via LLM-based semantic parsing and co-optimized task decomposition and allocation.
Hybrid Planner (HP) couples LLM-assisted semantic validation with a classical planner (Fast Downward) to generate sub-plans, then merges them into a globally coherent plan using LLM-based coordination.
Behavior Tree Compiler (BTC) translates the global plan into a parallel behavior tree with precondition checks, fallbacks, and synchronization via a shared blackboard for reactive multi-robot control.
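
The BTC stage can be illustrated with a minimal sketch. This is not the authors' implementation: the `Step` type, the `compile_step` and `run_parallel` helpers, the status strings, and the dictionary-based blackboard are all hypothetical names chosen for illustration. The sketch assumes each plan step becomes a small subtree (precondition check, core action with bounded retry, then publishing effects to the blackboard), and that subtrees for different robots are ticked in parallel rounds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Blackboard = Dict[str, bool]  # hypothetical shared store of facts visible to all robots


@dataclass
class Step:
    robot: str
    action: str
    preconditions: List[str]
    effects: List[str]


def compile_step(step: Step, execute: Callable[[Step], bool], max_retries: int = 2):
    """Turn one plan step into a tick function (a tiny stand-in for a subtree)."""
    def tick(bb: Blackboard) -> str:
        # Precondition check: wait until the required facts are on the blackboard.
        if not all(bb.get(p, False) for p in step.preconditions):
            return "RUNNING"
        # Fallback behavior: retry the core action a bounded number of times.
        for _ in range(1 + max_retries):
            if execute(step):
                # Stand-in for post-validation: publish effects for other robots.
                for e in step.effects:
                    bb[e] = True
                return "SUCCESS"
        return "FAILURE"
    return tick


def run_parallel(ticks, bb: Blackboard, max_rounds: int = 10) -> List[str]:
    """Tick every unfinished subtree once per round until all have finished."""
    status = ["RUNNING"] * len(ticks)
    for _ in range(max_rounds):
        for i, t in enumerate(ticks):
            if status[i] == "RUNNING":
                status[i] = t(bb)
        if "RUNNING" not in status:
            break
    return status


# Illustrative executor that fails on the first attempt of every action.
attempts: Dict[str, int] = {}

def flaky_execute(step: Step) -> bool:
    attempts[step.action] = attempts.get(step.action, 0) + 1
    return attempts[step.action] >= 2


plan = [
    Step("robot2", "put_apple_in_fridge", ["apple_sliced"], ["apple_in_fridge"]),
    Step("robot1", "slice_apple", [], ["apple_sliced"]),
]
bb: Blackboard = {}
statuses = run_parallel([compile_step(s, flaky_execute) for s in plan], bb)
```

In this toy run, robot2 stays in `RUNNING` until robot1 publishes `apple_sliced`, which is the kind of temporal dependency a shared blackboard is meant to synchronize. A real behavior tree library would provide explicit sequence, fallback, and parallel nodes rather than a single closure.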

Experimental Results

Research Questions

RQ1 How can LLMs be integrated with formal PDDL planning to handle long-horizon, heterogeneous multi-robot tasks?
RQ2 Can a three-stage pipeline (PFG, HP, BTC) provide robust, fault-tolerant execution in dynamic environments?
RQ3 Does a shared blackboard communication mechanism improve synchronization and coordination among dynamically sized robot teams?
RQ4 What gains can be achieved on a challenging multi-robot benchmark in terms of success rate and goal recall compared to strong baselines?

Key Findings

EmboTeam substantially improves task success rate (12% to 55%) and goal condition recall (32% to 72%) over the strongest baseline (LaMMA-P) on the MACE-THOR benchmark, with a notable rise in performance when using higher-capability LLMs (e.g., GPT-4o).
The PFG enables rational task decomposition and skill assignment that maximize parallelism and atomic task execution, while ensuring compatibility with robot capabilities.
The HP merges sub-plans using semantic reasoning to resolve temporal and resource conflicts, producing globally coherent plans.
The BTC provides robust execution through preconditions, recovery/retry, core action execution, and post-validation, transforming linear plans into fault-tolerant behavior trees.
A shared blackboard mechanism is crucial for synchronization and collision avoidance in temporal-dependent tasks and dynamic environments.
Ablation studies show removing PFG or HP disrupts planning, removing BTC reduces execution reliability, and full integration yields optimal performance.
The MACE-THOR dataset includes 42 tasks across 8 environments, categorized into Parallel-Independent and Temporal-Dependent tasks, enabling evaluation of decomposition, allocation, and collaboration.
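
The two reported metrics can be made concrete with a short sketch. The review does not define them, so the definitions here are assumptions based on common usage in embodied planning benchmarks: success rate as the fraction of tasks whose goal conditions are all satisfied, and goal condition recall as the fraction of all goal conditions satisfied, pooled across tasks. The function names and sample data are hypothetical.

```python
from typing import Dict, List

# Each task result maps a goal condition name to whether it held at the end.
TaskResult = Dict[str, bool]


def success_rate(results: List[TaskResult]) -> float:
    """Assumed definition: share of tasks with every goal condition satisfied."""
    return sum(all(r.values()) for r in results) / len(results)


def goal_condition_recall(results: List[TaskResult]) -> float:
    """Assumed definition: satisfied goal conditions over all goal conditions."""
    satisfied = sum(sum(r.values()) for r in results)
    total = sum(len(r) for r in results)
    return satisfied / total


# Made-up example data, not taken from the paper.
results = [
    {"apple_in_fridge": True, "light_off": True},
    {"mug_washed": True, "mug_on_shelf": False},
]
sr = success_rate(results)            # 0.5
gcr = goal_condition_recall(results)  # 0.75
```

Under these assumed definitions, goal condition recall is never lower than success rate, because a partially completed task contributes to recall but not to success. That is consistent with the reported pairs (12% vs. 32%, 55% vs. 72%).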


This review was generated by AI and reviewed by a human editor.