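[Paper Review] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
Summary: The Hierarchical Language Agent (HLA) is implemented with a Slow Mind (proficient LLM), a Fast Mind (lightweight LLM), and an Executor (script policy) to enable real-time human-AI coordination in Overcooked. It responds faster than the baselines, reasons about commands more reliably, and is rated favorably in human evaluation.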
AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while keeping real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.
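The division of labor among the three modules can be pictured with a toy timeline. The sketch below is not the authors' implementation: the latencies (counted in abstract ticks), the intention string, and the macro-action labels `Idle`, `DefaultPrep`, and `FollowIntention` are all hypothetical. It only illustrates the idea that a slow reasoner and a faster planner can each run at their own pace while an executor keeps acting on every tick with the most recent macro action.

```python
def simulate(ticks, slow_latency=6, fast_latency=2):
    """Toy timeline of a slow reasoner, a faster planner, and a per-tick executor.

    All latencies are made-up tick counts, not measurements from the paper.
    """
    intention = None                 # latest output of the slow reasoner
    macro = "Idle"                   # latest output of the fast planner
    slow_done = slow_latency         # tick at which the pending slow call returns
    fast_done = fast_latency         # tick at which the pending fast call returns
    fast_input = intention           # intention the pending fast call was given
    trace = []

    for tick in range(ticks):
        # The slow call returns rarely; it then starts over immediately.
        if tick == slow_done:
            intention = "make salad"  # hypothetical inferred intention
            slow_done = tick + slow_latency

        # The fast call returns more often and uses the intention it was
        # started with, not one that arrived while it was running.
        if tick == fast_done:
            macro = "FollowIntention" if fast_input else "DefaultPrep"
            fast_done = tick + fast_latency
            fast_input = intention

        # The executor never waits: it acts every tick on the latest macro action.
        trace.append(macro)

    return trace


if __name__ == "__main__":
    for tick, action in enumerate(simulate(10)):
        print(tick, action)
```

In this toy run the executor is idle only until the first fast call returns, and the inferred intention starts to shape behavior one fast-call cycle after the slow call finishes.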
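Motivation and Objectives
- Motivate real-time, language-based human-AI coordination that goes beyond slow, API-driven LLM agents.
- Design a hierarchical agent that combines robust reasoning with fast, real-time action generation.
- Show that the hierarchical approach shortens response time and improves human-AI coordination in Overcooked.
- Strengthen command reasoning for complex, ambiguous, and quantity-based commands.
- Validate the approach through latency measurements, command-performance tests, and a human study.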
Proposed Method
- Three-module Hierarchical Language Agent (HLA): Slow Mind (proficient LLM) for intention reasoning and chat interaction, Fast Mind (lightweight LLM) for generating macro actions, and Executor (script policy) for converting macro actions to atomic actions.
- Slow Mind operates in two stages: Intention Reasoning Stage to infer human intention from history and command, followed by Chat & Assessment Stage to communicate with humans and track completion progress.
- Fast Mind uses a conditional prompt mechanism and an action-filtering scheme to generate macro actions at a medium frequency, guided by Slow Mind’s inferred intention and a quantitative utility term: log U(a|s) ∝ log P_LLM(a|s) + α V(a|s).
- Executor transforms macro actions (e.g., Chop, Cook, Serve) into atomic actions and performs path planning; there are 21 macro actions in total, with target-specific variants.
- The system runs Slow Mind and Fast Mind asynchronously to balance reasoning and real-time action, with the Electron prompt structure and two-stage Slow Mind prompts (Intention Reasoning and Chat & Assessment) illustrated in prompts.
- Equation referenced (action selection in Fast Mind): log U(a|s) ∝ log P_LLM(a|s) + α V(a|s).
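The Fast Mind scoring rule and the Executor step in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the probability and value tables, the grid encoding (`.` for walkable cells), and the use of breadth-first search for path planning are all hypothetical choices made for the example. The first function filters out infeasible macro actions and picks the one with the highest log P_LLM(a|s) + α V(a|s); the second turns a chosen macro action's target cell into atomic moves followed by an interaction.

```python
import math
from collections import deque


def select_macro_action(llm_probs, values, valid_actions, alpha=1.0):
    """Return the valid macro action maximizing log P_LLM(a|s) + alpha * V(a|s).

    llm_probs and values are hypothetical dicts keyed by macro-action name.
    """
    best_action, best_score = None, -math.inf
    for action in valid_actions:              # action filtering: only feasible actions
        prob = llm_probs.get(action, 0.0)
        if prob <= 0.0:                       # log is undefined for zero probability
            continue
        score = math.log(prob) + alpha * values.get(action, 0.0)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def plan_atomic_actions(grid, start, goal):
    """Breadth-first search from start to goal on a grid of '.' (free) cells.

    Returns a list of atomic moves ending with 'interact', or None if the
    goal cannot be reached. The goal is assumed to be a walkable cell next
    to the station the macro action targets.
    """
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for name, (d_row, d_col) in moves.items():
            nxt = (cell[0] + d_row, cell[1] + d_col)
            in_bounds = 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
            if in_bounds and grid[nxt[0]][nxt[1]] == "." and nxt not in came_from:
                came_from[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in came_from:
        return None

    # Walk back from the goal to recover the sequence of moves.
    path = []
    cell = goal
    while came_from[cell] is not None:
        cell, name = came_from[cell]
        path.append(name)
    path.reverse()
    return path + ["interact"]
```

For example, with made-up probabilities `{"Chop": 0.5, "Cook": 0.3, "Serve": 0.2}`, values `{"Chop": 0.0, "Cook": 1.0, "Serve": 5.0}`, and `Serve` filtered out as infeasible, the rule prefers `Cook` at `alpha=1.0` and falls back to the LLM's own top choice `Chop` at `alpha=0.0`.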
Experimental Results
Research Questions
- RQ1: How can we design an LLM-powered agent that maintains real-time responsiveness while preserving robust intention reasoning for human commands?
- RQ2: Does a hierarchical combination of a proficient LLM, a lightweight LLM, and a reactive script policy improve latency, command understanding, and cooperative performance in a fast-paced domain?
- RQ3: Can the Slow Mind’s two-stage reasoning and assessment framework improve handling of complex commands (quantity, semantics, ambiguity) compared to baselines?
- RQ4: Do human users prefer and perform better with an HLA-based partner in real-time cooperation tasks?
Key Findings
- HLA achieves macro-action latency that is 74.3% lower than the Slow-Mind-Only Agent (SMOA) and 53.5% lower than the Fast-Mind-Only Agent (FMOA).
- HLA achieves an atomic-action latency of 0.08 s, about 3.5 times faster than the best baseline (0.28 s).
- On average across maps, HLA yields higher game scores than baselines, e.g., Ring: 114.4 (HLA) vs 80.9 (SMOA) and 92.5 (FMOA); Partition: 100.3 vs 33.0 and 57.7; Bottleneck: 130.3 vs 102.4 and 103.8; Quick: 117.2 vs 60.8 and 71.2.
- Human studies show HLA achieving approximately 50% higher game scores than baselines and the highest human preference for communication accuracy and overall experience.
- Ablation studies indicate the two-stage Slow Mind design and intention reasoning significantly improve performance, especially on ambiguous and semantics-driven commands.
- HLA demonstrates a higher ratio of valuable macro actions and fewer fire accidents in human trials, indicating effective coordination and safer execution.
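The latency and score figures listed above are easier to compare as ratios. The short script below is only a convenience calculation on the numbers quoted in this review; the helper names are hypothetical, and the derived percentages are not values reported by the paper.

```python
# Scores per map as quoted in the findings above.
SCORES = {
    "Ring": {"HLA": 114.4, "SMOA": 80.9, "FMOA": 92.5},
    "Partition": {"HLA": 100.3, "SMOA": 33.0, "FMOA": 57.7},
    "Bottleneck": {"HLA": 130.3, "SMOA": 102.4, "FMOA": 103.8},
    "Quick": {"HLA": 117.2, "SMOA": 60.8, "FMOA": 71.2},
}


def speedup(baseline_latency, latency):
    """How many times shorter `latency` is than `baseline_latency`."""
    return baseline_latency / latency


def gain_over_best_baseline(scores):
    """Percentage by which HLA's score exceeds the stronger baseline on each map."""
    gains = {}
    for map_name, by_agent in scores.items():
        best = max(v for agent, v in by_agent.items() if agent != "HLA")
        gains[map_name] = (by_agent["HLA"] - best) / best * 100.0
    return gains


if __name__ == "__main__":
    print(f"atomic-action speedup: {speedup(0.28, 0.08):.1f}x")
    for map_name, gain in gain_over_best_baseline(SCORES).items():
        print(f"{map_name}: +{gain:.1f}% over the best baseline")
```

On these figures the gain over the stronger baseline ranges from roughly a quarter on Ring and Bottleneck to well over half on Partition and Quick.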
This review was generated by AI and checked by a human editor.