QUICK REVIEW

[Paper Review] AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

Zhi‐Wei Liu, Weiran Yao | arXiv (Cornell University) | 2024-02-23

Multi-Agent Systems and Negotiation | Citations: 10

One-Line Summary

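AgentLite provides a lightweight, open-source framework for prototyping and evaluating task-oriented LLM agents and multi-agent systems, making it easy to customize prompts, memory, actions, and architectures. Benchmarks and a range of applications demonstrate its flexibility and performance.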

ABSTRACT

The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from single agent generation to multi-agent conversation, as well as multi-LLM multi-agent group chat. However, with the existing intricate frameworks and libraries, creating and evaluating new reasoning strategies and agent architectures has become a complex challenge, which hinders research investigation into LLM agents. Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease. AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks and facilitate the development of multi-agent systems. Furthermore, we introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility. Get started now at: https://github.com/SalesforceAIResearch/AgentLite.

Motivation and Goals

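Motivates the need for a lightweight, research-friendly library for prototyping LLM agent reasoning strategies and architectures.
Provides a simple task-oriented framework that facilitates multi-agent orchestration and experimentation.
Demonstrates practical applicability through benchmarks and a range of applications.
Shows that AgentLite supports easy integration and evaluation across different LLM backbones and scenarios.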

Proposed Method

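Introduces an individual agent built from four modules (PromptGen, Actions, LLM, Memory) and a manager agent for hierarchical multi-agent orchestration.
Defines the TaskPackage (TP) as the unit of communication between the manager and its team agents, and describes its attributes.
Explains how to add new reasoning types, such as Think, by extending the Action module, and provides a code sketch of the Think action.
Explains how to implement new agent architectures, such as a Copilot agent, a Copilot multi-agent system, and a multi-LLM multi-agent system, by configuring the task setup, team composition, and LLM backend.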
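
The pieces described above can be illustrated with a small, self-contained sketch. It is not AgentLite's actual API: the class names, fields, and the `scripted_llm` stand-in are hypothetical, chosen only to show how a four-module agent, a task package, a Think action, and a manager might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskPackage:
    """Hypothetical unit of communication between a manager and a team agent."""
    instruction: str
    creator: str
    executor: str = ""
    completion: str = "active"  # stays "active" until an agent finishes the task
    answer: str = ""


class ThinkAction:
    """Reasoning treated as an action: it consumes a thought and returns a fixed observation."""
    name = "Think"

    def __call__(self, argument: str) -> str:
        return "OK"


class FinishAction:
    """Ends the task; the argument becomes the final answer."""
    name = "Finish"

    def __call__(self, argument: str) -> str:
        return argument


@dataclass
class Agent:
    name: str
    role: str
    llm: Callable[[str], str]                  # LLM module: prompt -> "ActionName: argument"
    actions: Dict[str, Callable[[str], str]]   # Actions module
    memory: List[str] = field(default_factory=list)  # Memory module
    max_steps: int = 5

    def prompt(self, task: TaskPackage) -> str:
        """PromptGen module: combine role, task, and action history into one prompt."""
        history = "\n".join(self.memory)
        return f"Role: {self.role}\nTask: {task.instruction}\nHistory:\n{history}\nNext action:"

    def run(self, task: TaskPackage) -> TaskPackage:
        task.executor = self.name
        for _ in range(self.max_steps):
            raw = self.llm(self.prompt(task))
            action_name, _, argument = raw.partition(":")
            action_name, argument = action_name.strip(), argument.strip()
            action = self.actions.get(action_name)
            observation = action(argument) if action else f"Unknown action {action_name}"
            self.memory.append(f"{action_name}[{argument}] -> {observation}")
            if action_name == "Finish":
                task.answer = observation
                task.completion = "completed"
                break
        return task


@dataclass
class Manager:
    """Hypothetical manager: wraps an instruction in a TaskPackage and hands it to a team agent."""
    name: str
    team: Dict[str, Agent]

    def dispatch(self, instruction: str, agent_name: str) -> TaskPackage:
        task = TaskPackage(instruction=instruction, creator=self.name)
        return self.team[agent_name].run(task)


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a real LLM backend: returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


searcher = Agent(
    name="searcher",
    role="Answer factual questions",
    llm=scripted_llm(["Think: I should recall the capital of France.", "Finish: Paris"]),
    actions={"Think": ThinkAction(), "Finish": FinishAction()},
)
manager = Manager(name="manager", team={"searcher": searcher})
result = manager.dispatch("What is the capital of France?", "searcher")
print(result.completion, result.answer)
```

Under these assumptions, adding a new reasoning type means registering one more callable in `actions`, and changing the LLM backend means passing a different `llm` callable to each agent, which is one way a multi-LLM team could be assembled.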

Experimental Results

Research Questions

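RQ1: How can a lightweight framework accelerate the development and evaluation of new LLM agent reasoning strategies and architectures?
RQ2: Can a task-oriented, hierarchical multi-agent design improve the modularity and experimental flexibility of LLM agents?
RQ3: How does AgentLite perform with different LLM backbones on established benchmarks such as HotPotQA and Webshop?
RQ4: What application packages can be built to readily demonstrate AgentLite's versatility across domains?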

Key Results

LLM	Easy F1	Easy Accuracy	Medium F1	Medium Accuracy	Hard F1	Hard Accuracy
GPT-3.5-Turbo-16k-0613	0.410	0.35	0.330	0.25	0.283	0.20
GPT-4-0613	0.611	0.47	0.610	0.48	0.527	0.38
GPT-4-32k-0613	0.625	0.46	0.644	0.54	0.520	0.37
xLAM-v0.1	0.532	0.45	0.547	0.46	0.455	0.36
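
The review does not define the F1 and accuracy columns. As a point of reference, the sketch below shows the token-level F1 and exact-match check commonly used for question-answering benchmarks. That the paper computes its scores this way is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, gold = normalize(prediction), normalize(ground_truth)
    if not pred or not gold:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(ground_truth)


print(f1_score("Paris France", "Paris"))
print(exact_match("The Eiffel Tower", "eiffel tower"))
```

With this definition a partially correct answer earns partial F1 credit but no exact-match credit, which would be consistent with the F1 values in the table being higher than the accuracy values in every row.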

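AgentLite enables multi-agent orchestration through a hierarchical manager-agent structure.
It supports adding new reasoning types by extending actions, which unifies tool use and reasoning (for example, treating Think as an action).
AgentLite demonstrates competitive benchmark performance and supports multiple LLM backbones, including the GPT-4 family and xLAM-v0.1.
On HotPotQA, the GPT-4 family outperforms GPT-3.5 at every difficulty level, and GPT-4-32k-0613 achieves the highest medium-level scores (F1 0.644, accuracy 0.54); xLAM-v0.1 also improves on GPT-3.5 in this setting.
On Webshop, GPT-4-32k achieves a higher average reward, which suggests a benefit from the longer context length; xLAM-v0.1 remains competitive with GPT-3.5 in this environment.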


This review was generated by AI and reviewed by a human editor.