QUICK REVIEW

[Paper Review] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

Yaoxiang Wang, Zhiyong Wu | arXiv (Cornell University) | 2024-02-15

Multi-Agent Systems and Negotiation | Citations: 5

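One-Line Summary

TDAG improves LLM-based agents by decomposing tasks dynamically and generating a subagent for each subtask; it is evaluated on ItineraryBench, a travel-planning benchmark with fine-grained metrics.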

ABSTRACT

The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.

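Motivation and Objectives

Argues that real-world, multi-step tasks call for more adaptive LLM-based agents.
Proposes a multi-agent framework that decomposes tasks dynamically and generates a subagent tailored to each subtask.
Introduces ItineraryBench so that incremental progress on complex tasks can be evaluated at a fine granularity.
Demonstrates TDAG's advantage over the baselines and analyzes the contribution of dynamic decomposition and agent generation.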

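Proposed Method

A main agent decomposes the complex task into subtasks and assigns each one to a dynamically generated subagent.
Subtasks are updated dynamically based on the results of preceding subtasks, which lets the system handle failures and new information.
Subagents are generated through LLM prompting, drawing on refined tool documentation and an incremental skill library.
An evolving skill library is maintained, and a small SentenceBERT model retrieves the skills that match each subtask.
TDAG is compared against the baselines ReAct, P&S, P&E, and ADAPT, with ablations that remove either agent generation or dynamic decomposition.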
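
The control flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: every name in it (`Subtask`, `decompose`, `retrieve_skills`, `generate_subagent_prompt`, `execute`, `update_remaining`, `run`) is hypothetical, the LLM calls are replaced by simple stubs, and word-overlap (Jaccard) similarity stands in for the SentenceBERT retrieval step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Subtask:
    description: str
    prompt: str = ""               # prompt of the subagent generated for this subtask
    ok: bool = False
    result: Optional[str] = None


def decompose(task: str) -> List[Subtask]:
    """Stub for the main agent's LLM call: split the task on ';'."""
    return [Subtask(part.strip()) for part in task.split(";") if part.strip()]


def retrieve_skills(query: str, library: Dict[str, str], top_k: int = 1) -> List[str]:
    """Rank skills by word overlap with the query (stand-in for embedding search)."""
    q = set(query.lower().split())

    def score(name: str) -> float:
        words = set(library[name].lower().split())
        union = q | words
        return len(q & words) / len(union) if union else 0.0

    ranked = sorted(library, key=score, reverse=True)
    return [name for name in ranked[:top_k] if score(name) > 0]


def generate_subagent_prompt(subtask: Subtask, skills: List[str]) -> str:
    """Stub for subagent generation: build a subtask-specific prompt."""
    return f"Subtask: {subtask.description}\nRelevant skills: {', '.join(skills) or 'none'}"


def execute(subtask: Subtask, unavailable: Set[str]) -> None:
    """Stub executor: a subtask fails if its description is listed as unavailable."""
    subtask.ok = subtask.description not in unavailable
    subtask.result = "done" if subtask.ok else "failed"


def update_remaining(remaining: List[Subtask], finished: Subtask) -> List[Subtask]:
    """Stub for dynamic re-decomposition: on failure, schedule a fallback subtask first."""
    if finished.ok:
        return remaining
    return [Subtask(f"fallback for: {finished.description}")] + remaining


def run(task: str, library: Dict[str, str], unavailable: Set[str],
        max_steps: int = 10) -> List[Subtask]:
    queue = decompose(task)
    done: List[Subtask] = []
    while queue and len(done) < max_steps:
        current = queue.pop(0)
        skills = retrieve_skills(current.description, library)
        current.prompt = generate_subagent_prompt(current, skills)
        execute(current, unavailable)
        done.append(current)
        # The remaining plan is revised after every subtask, not fixed up front.
        queue = update_remaining(queue, current)
    return done


library = {
    "train_booking": "book train ticket between cities",
    "hotel_booking": "book hotel room",
}
trace = run("book train; book hotel", library, unavailable={"book train"})
print([(s.description, s.ok) for s in trace])
```

In this toy run the first subtask fails, so a fallback subtask is inserted ahead of the hotel booking. The point of the sketch is that the remaining subtask list is recomputed after each result, which is what separates dynamic decomposition from a plan fixed in advance.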

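Experimental Results

Research Questions

RQ1: Can dynamic task decomposition improve adaptability and context awareness in multi-step tasks?
RQ2: Does automatic subagent generation reduce manual effort and improve performance across diverse tasks?
RQ3: How does fine-grained evaluation (partial progress) relate to traditional success metrics in complex planning tasks?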

Main Results

Method	Type 1	Type 2	Type 3	Avg.
ReAct	43.84	42.69	42.54	43.02
P&S	41.28	46.48	43.27	43.68
P&E	39.09	47.44	42.03	42.85
ADAPT	42.73	48.58	42.92	44.74
TDAG (Ours)	49.78	50.96	46.51	49.08
w/o Agent Generation	47.20	47.10	45.78	46.69
w/o Dynamic Decomposition	44.70	50.04	43.94	46.23
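
The Avg. column matches the unweighted mean of the three type scores, and the ablation rows can be read as drops relative to full TDAG. A short check, using only the numbers from the table (the variable and function names are illustrative):

```python
# Scores copied from the results table: (Type 1, Type 2, Type 3).
scores = {
    "ReAct": (43.84, 42.69, 42.54),
    "P&S": (41.28, 46.48, 43.27),
    "P&E": (39.09, 47.44, 42.03),
    "ADAPT": (42.73, 48.58, 42.92),
    "TDAG": (49.78, 50.96, 46.51),
    "w/o Agent Generation": (47.20, 47.10, 45.78),
    "w/o Dynamic Decomposition": (44.70, 50.04, 43.94),
}


def average(method: str) -> float:
    """Unweighted mean over the three task types, rounded to two decimals."""
    values = scores[method]
    return round(sum(values) / len(values), 2)


def drop_from_tdag(method: str) -> float:
    """Average-score gap between full TDAG and another row."""
    return round(average("TDAG") - average(method), 2)


for name in scores:
    print(f"{name}: avg={average(name):.2f}, gap to TDAG={drop_from_tdag(name):.2f}")
```

By this reading, removing agent generation costs 2.39 points on average and removing dynamic decomposition costs 2.85, while full TDAG leads the strongest baseline (ADAPT) by 4.34.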

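TDAG outperforms every baseline on all three ItineraryBench task types, averaging 49.08 against 44.74 for the strongest baseline, ADAPT.
The ablations show that both components matter: removing agent generation lowers the average to 46.69, and removing dynamic decomposition lowers it to 46.23.
Fixed-plan methods can fall behind ReAct for lack of plan adaptability: P&E averages 42.85 against ReAct's 43.02.
TDAG has lower rates of commonsense, external-information, and constraint errors than the baselines.
On the additional benchmarks WebShop and TextCraft, TDAG again exceeds the baselines in reward score and success rate.
Unlike a binary score, the fine-grained evaluation reveals progress even when the overall task is not completed.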
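
The contrast between binary and fine-grained scoring can be illustrated with a toy example. The scoring rule here (fraction of satisfied checkpoints) is an assumption chosen for illustration and is not ItineraryBench's actual metric; `binary_score` and `partial_score` are hypothetical names.

```python
from typing import Sequence


def binary_score(checks: Sequence[bool]) -> float:
    """All-or-nothing success: 1.0 only if every checkpoint is satisfied."""
    return 1.0 if checks and all(checks) else 0.0


def partial_score(checks: Sequence[bool]) -> float:
    """Hypothetical fine-grained score: fraction of checkpoints satisfied."""
    return sum(checks) / len(checks) if checks else 0.0


# An itinerary that satisfies three of four checkpoints.
checks = [True, True, True, False]
print(binary_score(checks), partial_score(checks))
```

Under the binary rule this itinerary scores the same as one that satisfies nothing, whereas the partial score separates the two cases.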


This review was generated by AI and reviewed by a human editor.