QUICK REVIEW

[Paper Review] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

Yaoxiang Wang, Zhiyong Wu|arXiv (Cornell University)|Feb 15, 2024

Multi-Agent Systems and Negotiation|Cited by 5

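ONE-SENTENCE SUMMARY

TDAG introduces dynamic task decomposition and subagent generation to improve the performance of LLM-based agents, and is evaluated on the ItineraryBench travel-planning benchmark with fine-grained metrics.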

ABSTRACT

The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.

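RESEARCH MOTIVATION AND GOALS

Motivate the need for more adaptable LLM-based agents in real-world, multi-step tasks.
Propose a multi-agent framework that dynamically decomposes tasks and generates a tailored subagent for each subtask.
Introduce ItineraryBench to enable fine-grained evaluation of incremental progress on complex tasks.
Demonstrate TDAG's advantage over baselines and analyze the impact of dynamic decomposition and agent generation.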

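PROPOSED METHOD

A MainAgent decomposes a complex task into subtasks and assigns each subtask to a dynamically generated SubAgent.
Remaining subtasks are updated dynamically based on the result of the previous subtask, so the plan can react to failures and new information.
Subagents are generated through LLM prompting, which includes tool-documentation refinement and an incrementally built skill library.
The evolving skill library is queried with a small SentenceBERT model to match subtasks with stored skills.
TDAG is compared against baselines (ReAct, P&S, P&E, ADAPT), with ablations that remove either agent generation or dynamic decomposition.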
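
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_dynamic_decomposition`, `retrieve_skills`, `SubtaskResult`, `SKILLS`, and the `demo_*` stubs are hypothetical names, the LLM calls are replaced by plain Python callables, and token overlap (Jaccard similarity) stands in for the SentenceBERT retrieval so that the example runs without external models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubtaskResult:
    subtask: str
    success: bool
    output: str


def retrieve_skills(subtask: str, library: List[str], top_k: int = 1) -> List[str]:
    """Rank stored skills by Jaccard token overlap with the subtask.

    Stand-in only: the paper retrieves skills with SentenceBERT embeddings.
    """
    query = set(subtask.lower().split())

    def score(skill: str) -> float:
        tokens = set(skill.lower().split())
        union = query | tokens
        return len(query & tokens) / len(union) if union else 0.0

    return sorted(library, key=score, reverse=True)[:top_k]


def run_dynamic_decomposition(
    task: str,
    decompose: Callable[[str], List[str]],
    make_subagent: Callable[[str], Callable[[str], SubtaskResult]],
    update: Callable[[List[str], SubtaskResult], List[str]],
    max_steps: int = 20,
) -> List[SubtaskResult]:
    """Execute subtasks one at a time, revising the remaining plan after each."""
    pending = decompose(task)          # initial plan (MainAgent role)
    history: List[SubtaskResult] = []
    while pending and len(history) < max_steps:
        subtask = pending.pop(0)
        subagent = make_subagent(subtask)  # one generated subagent per subtask
        result = subagent(subtask)
        history.append(result)
        pending = update(pending, result)  # re-plan using the latest result
    return history


# --- Hypothetical stubs replacing LLM calls, for demonstration only ---
SKILLS = [
    "book a train ticket between two cities",
    "book a bus ticket when trains are sold out",
    "reserve a hotel room",
]


def demo_decompose(task: str) -> List[str]:
    return ["book train ticket", "reserve hotel room"]


def demo_make_subagent(subtask: str) -> Callable[[str], SubtaskResult]:
    skills = retrieve_skills(subtask, SKILLS)

    def subagent(s: str) -> SubtaskResult:
        ok = "train" not in s  # pretend every train is sold out
        return SubtaskResult(s, ok, f"used skill: {skills[0]}")

    return subagent


def demo_update(pending: List[str], result: SubtaskResult) -> List[str]:
    # On failure, put a replacement subtask ahead of the remaining plan.
    if not result.success:
        return ["book bus ticket"] + pending
    return pending


history = run_dynamic_decomposition(
    "plan a two-city trip", demo_decompose, demo_make_subagent, demo_update
)
for r in history:
    print(r.subtask, r.success, r.output)
```

In this toy run the failed train booking causes a bus subtask to be inserted before the hotel step. A fixed plan would instead continue with the original subtask list regardless of the failure.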

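EXPERIMENTAL RESULTS

Research Questions

RQ1: Does dynamic task decomposition improve adaptability and context awareness on multi-step tasks?
RQ2: Does automatically generating subagents reduce manual effort and improve performance across diverse tasks?
RQ3: How does fine-grained evaluation (partial progress) relate to traditional success metrics in complex planning tasks?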

Main Findings

Method	Type 1	Type 2	Type 3	Avg.
ReAct	43.84	42.69	42.54	43.02
P&S	41.28	46.48	43.27	43.68
P&E	39.09	47.44	42.03	42.85
ADAPT	42.73	48.58	42.92	44.74
TDAG (Ours)	49.78	50.96	46.51	49.08
w/o Agent Generation	47.2	47.1	45.78	46.69
w/o Dynamic Decomposition	44.7	50.04	43.94	46.23
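
The Avg. column is consistent with an unweighted mean of the three task-type scores, and the ablation gaps can be read off the same numbers. The short script below recomputes them from the table as a sanity check; the variable names are illustrative, and the assumption that Avg. is a plain mean is inferred from the numbers rather than stated in the review.

```python
# Scores copied from the table above: (Type 1, Type 2, Type 3).
scores = {
    "ReAct": (43.84, 42.69, 42.54),
    "P&S": (41.28, 46.48, 43.27),
    "P&E": (39.09, 47.44, 42.03),
    "ADAPT": (42.73, 48.58, 42.92),
    "TDAG (Ours)": (49.78, 50.96, 46.51),
    "w/o Agent Generation": (47.2, 47.1, 45.78),
    "w/o Dynamic Decomposition": (44.7, 50.04, 43.94),
}

# Unweighted mean per method, rounded to two decimals like the table.
averages = {m: round(sum(v) / len(v), 2) for m, v in scores.items()}

baselines = ["ReAct", "P&S", "P&E", "ADAPT"]
best_baseline = max(baselines, key=lambda m: averages[m])
full = averages["TDAG (Ours)"]

# Differences between rounded averages, in points.
gain_over_best = round(full - averages[best_baseline], 2)
drop_no_generation = round(full - averages["w/o Agent Generation"], 2)
drop_no_decomposition = round(full - averages["w/o Dynamic Decomposition"], 2)

for method, avg in averages.items():
    print(f"{method:28s}{avg:6.2f}")
print("best baseline:", best_baseline, "| TDAG gain:", gain_over_best)
print("drop w/o agent generation:", drop_no_generation)
print("drop w/o dynamic decomposition:", drop_no_decomposition)
```

On these figures TDAG leads the strongest baseline (ADAPT) by 4.34 points on average, and removing agent generation or dynamic decomposition costs 2.39 and 2.85 points respectively.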

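TDAG outperforms the established baselines on every ItineraryBench task type.
Ablations show that both dynamic decomposition and agent generation are needed for peak performance: removing either one lowers the average score (46.23 and 46.69 versus 49.08).
Fixed-plan methods such as P&E score slightly below ReAct on average (42.85 vs. 43.02), which is attributed to their lack of plan adaptability.
TDAG maintains lower error rates than the baselines on commonsense, external-information, and constraint errors.
On additional benchmarks (WebShop, TextCraft), TDAG still outperforms the baselines in reward score and success rate.
Fine-grained evaluation reveals progress even when a task is not fully completed, which a binary success score would miss.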
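
The contrast between fine-grained and binary scoring can be illustrated with a small sketch. It is not ItineraryBench's actual scoring procedure, which this review does not specify: the `binary_score` and `partial_score` functions, the plan fields, and the four constraint checks are all hypothetical, chosen only to show how partial credit exposes progress that an all-or-nothing metric hides.

```python
from typing import Callable, Dict, List

Plan = Dict[str, float]
Check = Callable[[Plan], bool]


def binary_score(plan: Plan, checks: List[Check]) -> float:
    """All-or-nothing: 1.0 only if every check passes."""
    return 1.0 if all(check(plan) for check in checks) else 0.0


def partial_score(plan: Plan, checks: List[Check]) -> float:
    """Fraction of checks that pass (0.0 when there are no checks)."""
    if not checks:
        return 0.0
    return sum(check(plan) for check in checks) / len(checks)


# Hypothetical constraints for a travel plan.
checks: List[Check] = [
    lambda p: p["cost"] <= 500,           # budget limit
    lambda p: p["days"] <= 3,             # trip length limit
    lambda p: p["cities_visited"] >= 2,   # required coverage
    lambda p: p["arrival_hour"] <= 22,    # arrive before 22:00
]

# A plan that satisfies everything except the budget.
plan: Plan = {"cost": 540, "days": 3, "cities_visited": 2, "arrival_hour": 21}

binary = binary_score(plan, checks)
partial = partial_score(plan, checks)
print(binary, partial)
```

Here the plan misses only the budget constraint, so the binary score is 0.0 while the partial score is 0.75.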

This review was generated by AI and reviewed by a human editor.