QUICK REVIEW

[Paper Review] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

Jijia Liu, Chao Yu | arXiv (Cornell University) | Dec 23, 2023

Topic Modeling · Cited by 9

One-Sentence Summary

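Introduces the Hierarchical Language Agent (HLA), which combines a Slow Mind (a proficient LLM), a Fast Mind (a lightweight LLM), and an Executor (a scripted policy) to enable real-time human-AI coordination in Overcooked. Compared with baselines, HLA responds faster and reasons better about human commands, and it received positive evaluations in human studies.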

ABSTRACT

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.
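To make the three-tier division of labour concrete, here is a minimal discrete-time sketch, assuming the Slow Mind and Fast Mind calls are non-blocking and each simply returns after a fixed number of ticks. The latencies `SLOW_LATENCY` and `FAST_LATENCY`, the stubs `slow_mind_stub` and `fast_mind_stub`, and the class `HierarchicalAgentSketch` are hypothetical illustrations, not the authors' code. What it shows: the Executor still emits an atomic action on every tick, while a human command only changes behaviour once the slow reasoning result has arrived and been passed down through a fresh Fast Mind decision.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical latencies in simulation ticks (illustrative assumptions, not from the paper).
SLOW_LATENCY = 10  # proficient LLM (Slow Mind): strong reasoning, slow
FAST_LATENCY = 3   # lightweight LLM (Fast Mind): quicker macro-action decisions


def slow_mind_stub(command: str) -> str:
    """Hypothetical stand-in for the proficient LLM: command -> intention."""
    return f"intention({command})"


def fast_mind_stub(intention: str) -> str:
    """Hypothetical stand-in for the lightweight LLM: intention -> macro action."""
    return f"macro({intention})"


@dataclass
class PendingCall:
    """A non-blocking LLM request whose result is available from ready_tick on."""
    ready_tick: int
    result: str


@dataclass
class HierarchicalAgentSketch:
    intention: str = "none"       # latest intention adopted from Slow Mind
    macro_action: str = "idle"    # latest macro action adopted from Fast Mind
    slow_call: Optional[PendingCall] = None
    fast_call: Optional[PendingCall] = None
    atomic_log: List[str] = field(default_factory=list)

    def on_human_command(self, tick: int, command: str) -> None:
        # Slow Mind is queried asynchronously: issuing the request does not block acting.
        self.slow_call = PendingCall(tick + SLOW_LATENCY, slow_mind_stub(command))

    def step(self, tick: int) -> str:
        # 1. Adopt the Slow Mind's intention once its simulated latency has elapsed.
        if self.slow_call is not None and tick >= self.slow_call.ready_tick:
            self.intention = self.slow_call.result
            self.slow_call = None
        # 2. Fast Mind: keep one request in flight, conditioned on the current intention.
        if self.fast_call is None:
            self.fast_call = PendingCall(tick + FAST_LATENCY, fast_mind_stub(self.intention))
        elif tick >= self.fast_call.ready_tick:
            self.macro_action = self.fast_call.result
            self.fast_call = None
        # 3. Executor: a scripted policy emits one atomic action on every tick.
        atomic = f"atomic[{self.macro_action}]"
        self.atomic_log.append(atomic)
        return atomic


if __name__ == "__main__":
    agent = HierarchicalAgentSketch()
    agent.on_human_command(0, "chop onions")
    for t in range(16):
        print(t, agent.step(t))
```

In this toy schedule the intention is adopted at tick 10, but the first macro action conditioned on it only reaches the Executor at tick 15; meanwhile the agent never stalls. Real latencies would come from measured LLM response times rather than fixed constants.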

Research Motivation and Goals

Motivate real-time, language-based human-AI coordination beyond slow, API-driven LLM agents.
Design a hierarchical agent that combines strong reasoning with fast, real-time action generation.
Show that the hierarchical approach yields faster response times and improved human-AI cooperation in Overcooked.
Demonstrate enhanced command reasoning for complex, ambiguous, and quantity-based instructions.
Validate through latency measurements, command-performance tests, and human studies.

Proposed Method

Three-module Hierarchical Language Agent (HLA): Slow Mind (proficient LLM) for intention reasoning and chat interaction, Fast Mind (lightweight LLM) for generating macro actions, and Executor (script policy) for converting macro actions to atomic actions.
Slow Mind operates in two stages: an Intention Reasoning Stage, which infers the human's intention from the interaction history and the current command, followed by a Chat & Assessment Stage, which communicates with the human and tracks progress toward completing the command.
Fast Mind uses a conditional prompt mechanism and an action-filtering scheme to generate macro actions at a medium frequency, guided by Slow Mind’s inferred intention and a quantitative utility term ($\log U(a \mid s) \propto \log P_{\text{LLM}}(a \mid s) + \alpha V(a \mid s)$).
Executor transforms macro actions (e.g., Chop, Cook, Serve) into atomic actions and performs path planning; there are 21 macro actions in total, including target-specific variants.
The system runs Slow Mind and Fast Mind asynchronously to balance reasoning and real-time action; the paper's prompt figures illustrate the prompt structure, including the two-stage Slow Mind prompts (Intention Reasoning and Chat & Assessment).
Equation referenced (action selection in Fast Mind): $\log U(a \mid s) \propto \log P_{\text{LLM}}(a \mid s) + \alpha V(a \mid s)$.
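A sketch of how the Fast Mind's scoring rule could be applied, assuming the lightweight LLM exposes a log-probability for each candidate macro action and that V(a|s) is a hand-designed value term. Action filtering is modelled here as dropping infeasible actions before normalization, and the proportionality is resolved by a softmax over the surviving actions. The function `fast_mind_select`, the macro-action names, and the value of `alpha` are hypothetical; the paper's exact filtering rules and value function may differ.

```python
import math
from typing import Dict, Set


def fast_mind_select(
    llm_logprobs: Dict[str, float],  # log P_LLM(a|s) for each candidate macro action
    values: Dict[str, float],        # V(a|s): hypothetical hand-designed value term
    feasible: Set[str],              # action filtering: macro actions valid in state s
    alpha: float = 1.0,              # weight of the value term (assumed hyperparameter)
) -> Dict[str, float]:
    """Return the normalized utility U(a|s) over feasible macro actions.

    log U(a|s) = log P_LLM(a|s) + alpha * V(a|s) + const, so U is a softmax of
    the combined scores restricted to the actions that survive filtering.
    """
    scores = {
        a: lp + alpha * values.get(a, 0.0)
        for a, lp in llm_logprobs.items()
        if a in feasible
    }
    if not scores:
        raise ValueError("no feasible macro action")
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(exp_scores.values())
    return {a: e / z for a, e in exp_scores.items()}


if __name__ == "__main__":
    logp = {"Chop Onion": math.log(0.5), "Cook Soup": math.log(0.3), "Serve Soup": math.log(0.2)}
    feasible = {"Chop Onion", "Serve Soup"}  # e.g. no pot ready, so "Cook Soup" is filtered
    for alpha in (0.0, 2.0):
        u = fast_mind_select(logp, {"Serve Soup": 1.0}, feasible, alpha)
        print(alpha, max(u, key=u.get), u)
```

With `alpha = 0` the choice follows the LLM alone ("Chop Onion"); with `alpha = 2` the value term outweighs the LLM's preference and "Serve Soup" wins. This illustrates why a quantitative term can correct a lightweight model's weaker judgments, under the assumptions stated above.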

Experimental Results

Research Questions

RQ1: How can we design an LLM-powered agent that maintains real-time responsiveness while preserving robust intention reasoning for human commands?
RQ2: Does a hierarchical combination of a proficient LLM, a lightweight LLM, and a reactive script policy improve latency, command understanding, and cooperative performance in a fast-paced domain?
RQ3: Can the Slow Mind’s two-stage reasoning and assessment framework improve handling of complex commands (quantity, semantics, ambiguity) compared to baselines?
RQ4: Do human users prefer and perform better with an HLA-based partner in real-time cooperation tasks?

Key Findings

HLA achieves a macro-action latency that is 74.3% lower than that of the Slow-Mind-Only Agent (SMOA) and 53.5% lower than that of the Fast-Mind-Only Agent (FMOA).
HLA achieves an atomic-action latency of 0.08s, about 3.5 times lower than the best baseline (0.28s).
HLA achieves higher average game scores than both baselines on every map (HLA vs. SMOA vs. FMOA): Ring 114.4 vs. 80.9 vs. 92.5; Partition 100.3 vs. 33.0 vs. 57.7; Bottleneck 130.3 vs. 102.4 vs. 103.8; Quick 117.2 vs. 60.8 vs. 71.2.
Human studies show that HLA achieves game scores approximately 50% higher than the baselines and receives the highest human preference ratings for communication accuracy and overall experience.
Ablation studies indicate the two-stage Slow Mind design and intention reasoning significantly improve performance, especially on ambiguous and semantics-driven commands.
In human trials, HLA shows a higher ratio of valuable macro actions and fewer fire accidents, indicating effective coordination and safer execution.
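The ratios behind these findings can be checked with a few lines of arithmetic. The sketch below uses only the latency and score figures reported in the findings; the helper names `speedup_from_reduction` and `relative_gain` are illustrative, and note that a latency "p% lower" corresponds to a speedup of 1 / (1 - p/100), not p%.

```python
# Figures copied from the Key Findings; helper names are illustrative.
SCORES = {  # map: (HLA, SMOA, FMOA) average game scores
    "Ring": (114.4, 80.9, 92.5),
    "Partition": (100.3, 33.0, 57.7),
    "Bottleneck": (130.3, 102.4, 103.8),
    "Quick": (117.2, 60.8, 71.2),
}


def speedup_from_reduction(percent_lower: float) -> float:
    """A latency that is p% lower corresponds to a 1 / (1 - p/100) speedup."""
    return 1.0 / (1.0 - percent_lower / 100.0)


def relative_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


# Atomic-action latency in seconds: best baseline 0.28 vs. HLA 0.08.
ATOMIC_SPEEDUP = 0.28 / 0.08

if __name__ == "__main__":
    print(f"macro-action speedup vs SMOA: {speedup_from_reduction(74.3):.2f}x")  # 3.89x
    print(f"macro-action speedup vs FMOA: {speedup_from_reduction(53.5):.2f}x")  # 2.15x
    print(f"atomic-action speedup vs best baseline: {ATOMIC_SPEEDUP:.1f}x")      # 3.5x
    for name, (hla, smoa, fmoa) in SCORES.items():
        print(f"{name}: +{relative_gain(hla, smoa):.1f}% vs SMOA, "
              f"+{relative_gain(hla, fmoa):.1f}% vs FMOA")
```

The per-map gains vary widely: about 204% over SMOA on Partition, but about 27% on Bottleneck.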


This review was generated by AI and checked by human editors.