[Paper Review] EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration
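This paper proposes H-AIM (named EmboTeam in the title and abstract), a three-stage pipeline that combines LLMs, PDDL planning, and behavior trees for dynamic, heterogeneous multi-robot collaboration, and demonstrates a meaningful improvement over a strong baseline on the MACE-THOR benchmark.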
In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose EmboTeam, a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experiments show EmboTeam improves the task success rate from 12% to 55% and goal condition recall from 32% to 72% over the LaMMA-P baseline.
Research Motivation and Objectives
- Address long-horizon, heterogeneous multi-robot task planning from high-level natural language instructions.
- Integrate semantic LLM reasoning with formal PDDL planning and reactive behavior trees for robust execution.
- Enable dynamic, scalable robot teams via a shared blackboard communication mechanism.
- Provide a new benchmark (MACE-THOR) for evaluating complex household tasks in simulated environments.
Proposed Method
- Stage 1, PDDL File Generator (PFG): converts instructions into PDDL planning problems via LLM-based semantic parsing and co-optimized task decomposition and allocation.
- Stage 2, Hybrid Planner (HP): couples LLM-assisted semantic validation with a classical planner (Fast Downward) to generate sub-plans, then merges them into a globally coherent plan using LLM-based coordination.
- Stage 3, Behavior Tree Compiler (BTC): translates the global plan into a parallel behavior tree with precondition checks, fallbacks, and synchronization via a shared blackboard for reactive multi-robot control.
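To make the first stage more concrete, the sketch below renders a PDDL problem file from a structured task description of the kind an LLM parsing step might emit. It is only an illustration: the `PlanningTask` schema, the `household` domain name, and the predicates `at` and `washed` are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PlanningTask:
    """Hypothetical structured output of the LLM parsing step."""
    name: str
    domain: str
    objects: dict[str, list[str]]   # PDDL type -> object names
    init: list[tuple[str, ...]]     # ground facts that hold initially
    goal: list[tuple[str, ...]]     # ground facts that must hold at the end


def fact(atom: tuple[str, ...]) -> str:
    """Render one ground atom, e.g. ("at", "robot1", "kitchen") -> "(at robot1 kitchen)"."""
    return "(" + " ".join(atom) + ")"


def to_pddl_problem(task: PlanningTask) -> str:
    """Serialize the task as the text of a PDDL problem file."""
    objects = "\n    ".join(
        f"{' '.join(names)} - {type_name}" for type_name, names in task.objects.items()
    )
    init = "\n    ".join(fact(a) for a in task.init)
    goal = "\n      ".join(fact(a) for a in task.goal)
    return (
        f"(define (problem {task.name})\n"
        f"  (:domain {task.domain})\n"
        f"  (:objects\n    {objects}\n  )\n"
        f"  (:init\n    {init}\n  )\n"
        f"  (:goal\n    (and\n      {goal}\n    )\n  )\n"
        f")"
    )


# Illustrative task; every name here is made up for the example.
task = PlanningTask(
    name="wash-apple",
    domain="household",
    objects={"robot": ["robot1", "robot2"], "item": ["apple"], "location": ["kitchen", "sink"]},
    init=[("at", "robot1", "kitchen"), ("at", "apple", "kitchen")],
    goal=[("washed", "apple")],
)
problem_text = to_pddl_problem(task)
print(problem_text)
```

Keeping the LLM output in a structured form and generating the PDDL text deterministically is one way to reduce syntax errors before the file reaches a classical planner; whether the paper's PFG works this way is not stated in this review.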
Experimental Results
Research Questions
- RQ1: How can LLMs be integrated with formal PDDL planning to handle long-horizon, heterogeneous multi-robot tasks?
- RQ2: Can a three-stage pipeline (PFG, HP, BTC) provide robust, fault-tolerant execution in dynamic environments?
- RQ3: Does a shared blackboard communication mechanism improve synchronization and coordination among dynamically sized robot teams?
- RQ4: What gains can be achieved on a challenging multi-robot benchmark in terms of success rate and goal recall compared to strong baselines?
Key Results
- H-AIM (the framework the abstract calls EmboTeam) raises task success rate from 12% to 55% and goal condition recall from 32% to 72% over the strongest baseline (LaMMA-P) on the MACE-THOR benchmark, with a notable rise in performance when using higher-capability LLMs (e.g., GPT-4o).
- The PFG enables rational task decomposition and skill assignment that maximize parallelism and atomic task execution, while ensuring compatibility with robot capabilities.
- The HP merges sub-plans using semantic reasoning to resolve temporal and resource conflicts, producing globally coherent plans.
- The BTC provides robust execution through preconditions, recovery/retry, core action execution, and post-validation, transforming linear plans into fault-tolerant behavior trees.
- A shared blackboard mechanism is crucial for synchronization and collision avoidance in temporal-dependent tasks and dynamic environments.
- Ablation studies show removing PFG or HP disrupts planning, removing BTC reduces execution reliability, and full integration yields optimal performance.
- The MACE-THOR dataset includes 42 tasks across 8 environments, categorized into Parallel-Independent and Temporal-Dependent tasks, enabling evaluation of decomposition, allocation, and collaboration.
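The execution pattern attributed to the BTC above (precondition check, retry of the core action, post-validation, and coordination through a shared blackboard) can be sketched as a small behavior tree. This is a minimal illustration under stated assumptions, not the paper's implementation: the node classes, the plan format, the fact-set blackboard, and the simulated failure are all hypothetical, and `Parallel` here ticks its children in turn on one thread rather than running them concurrently. Each step uses a common behavior-tree idiom: skip the action if its effect already holds, otherwise check preconditions, act, and validate.

```python
from typing import Callable

Blackboard = dict  # shared state, assumed shape: {"facts": set[str], "log": list[str]}


class Node:
    def tick(self, bb: Blackboard) -> bool:
        raise NotImplementedError


class Condition(Node):
    def __init__(self, check: Callable[[Blackboard], bool]):
        self.check = check

    def tick(self, bb):
        return self.check(bb)


class Action(Node):
    def __init__(self, run: Callable[[Blackboard], bool]):
        self.run = run

    def tick(self, bb):
        return self.run(bb)


class Sequence(Node):
    """Succeeds only if every child succeeds, in order; stops at the first failure."""
    def __init__(self, children):
        self.children = children

    def tick(self, bb):
        return all(child.tick(bb) for child in self.children)


class Fallback(Node):
    """Succeeds as soon as one child succeeds."""
    def __init__(self, children):
        self.children = children

    def tick(self, bb):
        return any(child.tick(bb) for child in self.children)


class Retry(Node):
    """Ticks the child up to `attempts` times until it succeeds."""
    def __init__(self, child, attempts):
        self.child, self.attempts = child, attempts

    def tick(self, bb):
        return any(self.child.tick(bb) for _ in range(self.attempts))


class Parallel(Node):
    """Ticks every child each round; succeeds when all succeed in the same round."""
    def __init__(self, children):
        self.children = children

    def tick(self, bb):
        return all([child.tick(bb) for child in self.children])


def compile_step(robot, name, requires, adds, fail_first=0):
    remaining_failures = [fail_first]  # simulated flakiness, for illustration only

    def execute(bb):
        if remaining_failures[0] > 0:
            remaining_failures[0] -= 1
            bb["log"].append(f"{robot}:{name}:failed")
            return False
        bb["facts"] |= adds  # publish the effect on the shared blackboard
        bb["log"].append(f"{robot}:{name}:ok")
        return True

    done = Condition(lambda bb: adds <= bb["facts"])       # post-validation
    ready = Condition(lambda bb: requires <= bb["facts"])  # precondition check
    return Fallback([done, Sequence([ready, Retry(Action(execute), attempts=3), done])])


def compile_plan(plan):
    """Group a linear plan by robot and run the per-robot sequences side by side."""
    per_robot = {}
    for step in plan:
        per_robot.setdefault(step["robot"], []).append(compile_step(**step))
    return Parallel([Sequence(steps) for steps in per_robot.values()])


plan = [
    {"robot": "robot2", "name": "slice_apple", "requires": {"apple_washed"}, "adds": {"apple_sliced"}},
    {"robot": "robot1", "name": "wash_apple", "requires": set(), "adds": {"apple_washed"}, "fail_first": 1},
]
blackboard = {"facts": set(), "log": []}
tree = compile_plan(plan)
rounds = 0
while not tree.tick(blackboard) and rounds < 10:
    rounds += 1
print(blackboard["log"])
```

In this run, `robot2` cannot slice the apple in the first round because the fact `apple_washed` is not yet on the blackboard; `robot1` fails once, retries, and publishes it, after which `robot2` proceeds in the next round. Completed steps are not re-executed because their effect check succeeds first.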
This review was generated by AI and reviewed by a human editor.