QUICK REVIEW

[Paper Review] EmboTeam: Grounding LLM Reasoning into Reactive Behavior Trees via PDDL for Embodied Multi-Robot Collaboration

Haishan Zeng, Wang, Mengna|arXiv (Cornell University)|Jan 16, 2026
Multimodal Machine Learning Applications|Citations: 0
One-line summary

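Proposes EmboTeam, a three-stage pipeline that links LLMs, PDDL planning, and behavior trees for dynamic, heterogeneous multi-robot collaboration. On the MACE-THOR benchmark it substantially outperforms a strong baseline.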

ABSTRACT

In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose EmboTeam, a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experiments show EmboTeam improves the task success rate from 12% to 55% and goal condition recall from 32% to 72% over the LaMMA-P baseline.

Motivation and objectives

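  • Handle long-horizon, heterogeneous multi-robot task planning from high-level natural-language instructions.
  • Integrate semantic LLM reasoning with formal PDDL planning and reactive behavior trees for robust execution.
  • Support dynamically scalable robot teams through a shared blackboard communication mechanism.
  • Provide a new benchmark (MACE-THOR) for evaluating complex household tasks in simulated environments.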

Proposed method

  • Stage 1 of the three-stage cascaded architecture, the PDDL File Generator (PFG), converts instructions into PDDL planning problems via LLM-based semantic parsing and co-optimized task decomposition and allocation.
  • Stage 2, the Hybrid Planner (HP), couples LLM-assisted semantic validation with a classical planner (Fast Downward) to generate sub-plans and merge them into a globally coherent plan using LLM-based coordination.
  • Stage 3, the Behavior Tree Compiler (BTC), translates the global plan into a parallel behavior tree with precondition checks, fallbacks, and synchronization via a shared blackboard for reactive multi-robot control.
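
The BTC idea of wrapping each plan step in precondition checks, fallbacks, and blackboard synchronization can be illustrated with a minimal sketch. This is not the authors' implementation: the node classes, `compile_step`, `make_flaky_skill`, and the fact names such as `fridge_open` are hypothetical, and the tree is ticked synchronously in a single thread with no RUNNING status, whereas the review describes a parallel tree.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


# Assumption: the shared blackboard is a plain dict of boolean world facts
# that every robot's subtree reads and writes.
Blackboard = Dict[str, bool]


@dataclass
class Condition:
    """Leaf that succeeds when a blackboard fact is true."""
    key: str

    def tick(self, bb: Blackboard) -> Status:
        return Status.SUCCESS if bb.get(self.key, False) else Status.FAILURE


@dataclass
class Action:
    """Leaf that runs a (hypothetical) robot skill returning True on success."""
    name: str
    run: Callable[[Blackboard], bool]

    def tick(self, bb: Blackboard) -> Status:
        return Status.SUCCESS if self.run(bb) else Status.FAILURE


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, in order."""
    children: list

    def tick(self, bb: Blackboard) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Succeeds as soon as one child succeeds."""
    children: list

    def tick(self, bb: Blackboard) -> Status:
        for child in self.children:
            if child.tick(bb) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def compile_step(name, precondition, effect, skill, retries=2):
    """Wrap one plan step: precondition check, skill with retries, post-validation."""
    attempts = [Action(name, skill) for _ in range(retries + 1)]
    return Sequence([Condition(precondition), Fallback(attempts), Condition(effect)])


def make_flaky_skill(effect, fail_first):
    """Hypothetical skill: fails `fail_first` times, then writes its effect."""
    calls = {"n": 0}

    def skill(bb: Blackboard) -> bool:
        calls["n"] += 1
        if calls["n"] <= fail_first:
            return False
        bb[effect] = True
        return True

    return skill


bb: Blackboard = {"robot_a_ready": True}
open_fridge = compile_step("open_fridge", "robot_a_ready", "fridge_open",
                           make_flaky_skill("fridge_open", fail_first=1))
take_apple = compile_step("take_apple", "fridge_open", "apple_held",
                          make_flaky_skill("apple_held", fail_first=0))

# Robot B cannot act before robot A has published "fridge_open".
first = take_apple.tick(bb)
# Running the steps in plan order succeeds despite one failed attempt.
result = Sequence([open_fridge, take_apple]).tick(bb)
print(first, result)  # Status.FAILURE Status.SUCCESS
```

In this sketch the temporal dependency between the two robots is enforced only through the blackboard fact `fridge_open`, and the retry is expressed as a fallback over repeated attempts of the same action.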

Experimental results

Research questions

  • RQ1: How can LLMs be integrated with formal PDDL planning to handle long-horizon, heterogeneous multi-robot tasks?
  • RQ2: Can a three-stage pipeline (PFG, HP, BTC) provide robust, fault-tolerant execution in dynamic environments?
  • RQ3: Does a shared blackboard communication mechanism improve synchronization and coordination among dynamically sized robot teams?
  • RQ4: What gains can be achieved on a challenging multi-robot benchmark in terms of success rate and goal recall compared to strong baselines?

Key findings

  • EmboTeam substantially improves task success rate and goal condition recall over the strongest baseline (LaMMA-P) on the MACE-THOR benchmark, with a notable rise in performance when using higher-capability LLMs (e.g., GPT-4o).
  • The PFG enables rational task decomposition and skill assignment that maximize parallelism and atomic task execution, while ensuring compatibility with robot capabilities.
  • The HP merges sub-plans using semantic reasoning to resolve temporal and resource conflicts, producing globally coherent plans.
  • The BTC provides robust execution through preconditions, recovery/retry, core action execution, and post-validation, transforming linear plans into fault-tolerant behavior trees.
  • A shared blackboard mechanism is crucial for synchronization and collision avoidance in temporal-dependent tasks and dynamic environments.
  • Ablation studies show that removing the PFG or HP disrupts planning and removing the BTC reduces execution reliability, while the full pipeline yields the best performance.
  • The MACE-THOR dataset includes 42 tasks across 8 environments, categorized into Parallel-Independent and Temporal-Dependent tasks, enabling evaluation of decomposition, allocation, and collaboration.
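
The two reported metrics can be made concrete with a small sketch. It assumes the common definitions, which this review does not spell out: a task succeeds only when every required goal condition holds at the end, and goal condition recall is the fraction of required conditions achieved, averaged per task. The function names and the toy episodes are hypothetical and are not MACE-THOR data.

```python
from typing import List, Set, Tuple

Episode = Tuple[Set[str], Set[str]]  # (achieved conditions, required conditions)


def goal_condition_recall(achieved: Set[str], required: Set[str]) -> float:
    """Fraction of required goal conditions that were achieved."""
    if not required:
        return 1.0
    return len(achieved & required) / len(required)


def evaluate(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (success rate, mean goal condition recall) over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    # Assumption: success means every required condition was achieved.
    successes = [required <= achieved for achieved, required in episodes]
    recalls = [goal_condition_recall(a, r) for a, r in episodes]
    n = len(episodes)
    return sum(successes) / n, sum(recalls) / n


# Toy episodes with two required conditions each.
episodes: List[Episode] = [
    ({"a", "b"}, {"a", "b"}),       # full success
    ({"a"}, {"a", "b"}),            # half the goals met, task fails
    (set(), {"a", "b"}),            # nothing achieved
    ({"a", "b", "c"}, {"a", "b"}),  # extra facts do not hurt
]
print(evaluate(episodes))  # (0.5, 0.625)
```

Under these assumed definitions, partially completed tasks raise recall without counting as successes, which is consistent with the reported recall (72%) sitting above the reported success rate (55%).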

This review was generated by AI and checked by a human editor.