QUICK REVIEW

[Paper Review] AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

Zhi-Wei Liu, Weiran Yao | arXiv (Cornell University) | Feb 23, 2024

Multi-Agent Systems and Negotiation | Cited by 10

One-Sentence Summary

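AgentLite provides a lightweight, open-source framework for prototyping and evaluating task-oriented LLM agents and multi-agent systems, making it easy to customize prompts, memory, actions, and architectures. The paper demonstrates this flexibility and the resulting performance through benchmark evaluations and a range of applications.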

ABSTRACT

The booming success of LLMs initiates rapid development in LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise the optimal reasoning strategies and agent architectures. Accordingly, LLM agent research advances from the simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategy; agent architecture also evolves from single agent generation to multi-agent conversation, as well as multi-LLM multi-agent group chat. However, with the existing intricate frameworks and libraries, creating and evaluating new reasoning strategies and agent architectures has become a complex challenge, which hinders research investigation into LLM agents. Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease. AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks and facilitate the development of multi-agent systems. Furthermore, we introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility. Get started now at: https://github.com/SalesforceAIResearch/AgentLite.

Research Motivation and Goals

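Motivate the need for a lightweight, research-oriented library for prototyping LLM agent reasoning strategies and architectures.
Provide a simple, task-oriented framework that facilitates multi-agent orchestration and experimentation.
Demonstrate practical usability through benchmarks and diverse applications.
Show that AgentLite integrates easily with, and can be evaluated across, different LLM backends and scenarios.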

Proposed Method

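Introduces an individual agent built from four modules (PromptGen, Actions, LLM, Memory), plus a manager agent for hierarchical multi-agent orchestration.
Defines the TaskPackage (TP) as the unit of communication between the manager and the team agents, and describes its attributes.
Describes how to add new reasoning types (e.g., Think, ReAct-style steps) by extending the Action module, and gives a code sketch of the Think action.
Explains how new agent architectures (Copilot Agent, Copilot Multi-Agent, Multi-LLM Multi-Agent) are realized by configuring Actions, team composition, and LLM backends.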
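To make the four-module structure concrete, here is a minimal Python sketch of an individual agent. The class names mirror the module names listed above, but every interface in it (`PromptGen.build`, the `name[argument]` reply format, the special `Finish` action, and the scripted stand-in LLM) is a hypothetical assumption for illustration, not AgentLite's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch, not AgentLite's API.
# An "LLM" here is any function mapping prompt text to reply text.
LLMFn = Callable[[str], str]


@dataclass
class Memory:
    """Stores (action, observation) pairs for the current task."""
    steps: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, action: str, observation: str) -> None:
        self.steps.append((action, observation))


@dataclass
class PromptGen:
    """Builds the prompt from the role, the available actions, and the history."""
    role: str

    def build(self, task: str, actions: Dict[str, Callable[[str], str]], memory: Memory) -> str:
        lines = [f"Role: {self.role}", f"Task: {task}", "Actions: " + ", ".join(sorted(actions))]
        for act, obs in memory.steps:
            lines.append(f"Action: {act}\nObservation: {obs}")
        lines.append("Next action (name[argument]):")
        return "\n".join(lines)


@dataclass
class IndividualAgent:
    prompt_gen: PromptGen
    actions: Dict[str, Callable[[str], str]]
    llm: LLMFn
    memory: Memory = field(default_factory=Memory)
    max_steps: int = 5

    def run(self, task: str) -> str:
        for _ in range(self.max_steps):
            reply = self.llm(self.prompt_gen.build(task, self.actions, self.memory))
            name, _, arg = reply.partition("[")   # parse "Name[argument]"
            arg = arg.rstrip("]")
            if name == "Finish":                  # assumed terminal action
                return arg
            action = self.actions.get(name)
            obs = action(arg) if action else f"Unknown action: {name}"
            self.memory.add(reply, obs)
        return "No answer within step limit"


def scripted_llm(prompt: str) -> str:
    # Stand-in LLM: search first; once an observation exists, finish.
    return "Finish[Paris]" if "Observation:" in prompt else "Search[capital of France]"


agent = IndividualAgent(
    prompt_gen=PromptGen(role="QA assistant"),
    actions={"Search": lambda q: "Paris is the capital of France."},
    llm=scripted_llm,
)
print(agent.run("What is the capital of France?"))  # Paris
```

Keeping the four modules as separate objects means a different prompt template, toolset, or LLM backend can be swapped in without touching the loop in `run`.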
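The manager/team split and the TaskPackage can be pictured with the following hedged sketch. The summary does not list the TaskPackage attributes, so the fields used here (`instruction`, `executor`, `creator`, `answer`, `completed`, `task_id`) are assumptions, as are the `planner` callable and the sequential dispatch loop. Giving each team member its own `llm` callable is one way to picture the Multi-LLM Multi-Agent configuration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch, not AgentLite's API; all TaskPackage fields are assumed.
_task_ids = itertools.count(1)


@dataclass
class TaskPackage:
    """Assumed unit of work exchanged between the manager and team agents."""
    instruction: str
    executor: str                  # name of the team agent expected to run it
    creator: str = "manager"
    answer: str = ""
    completed: bool = False
    task_id: int = field(default_factory=lambda: next(_task_ids))


class TeamAgent:
    def __init__(self, name: str, llm: Callable[[str], str]):
        self.name = name
        self.llm = llm             # each member may wrap a different LLM backend

    def __call__(self, tp: TaskPackage) -> TaskPackage:
        tp.answer = self.llm(tp.instruction)
        tp.completed = True
        return tp


class ManagerAgent:
    def __init__(self, team: List[TeamAgent], planner: Callable[[str], List[Tuple[str, str]]]):
        self.team: Dict[str, TeamAgent] = {agent.name: agent for agent in team}
        self.planner = planner     # splits a task into (agent_name, subtask) pairs

    def run(self, task: str) -> List[TaskPackage]:
        packages = [TaskPackage(instruction=sub, executor=name) for name, sub in self.planner(task)]
        for tp in packages:        # this sketch dispatches sequentially and never re-plans
            self.team[tp.executor](tp)
        return packages


team = [
    TeamAgent("researcher", llm=lambda p: f"notes on {p}"),
    TeamAgent("writer", llm=lambda p: p.upper()),
]
manager = ManagerAgent(team, planner=lambda t: [("researcher", t), ("writer", f"summary of {t}")])
for tp in manager.run("AgentLite"):
    print(tp.task_id, tp.executor, tp.answer, tp.completed)
```

Because the manager only exchanges TaskPackages with its team, a team member can itself be another manager, which is how a hierarchy of agents could be nested under these assumptions.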
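The idea of adding a reasoning type by extending Actions can be sketched as follows. The `Action` base class, its `params_doc` field, the toy `SearchAction`, and the fixed `"OK"` observation returned by `ThinkAction` are hypothetical choices for this illustration; the point is only that Think and a tool share one callable interface, so an agent loop can dispatch both the same way.

```python
from abc import ABC, abstractmethod
from typing import Dict

# Hypothetical sketch, not AgentLite's API.


class Action(ABC):
    """Assumed action interface: a name, a description, and documented parameters."""

    def __init__(self, name: str, desc: str, params_doc: Dict[str, str]):
        self.name = name
        self.desc = desc
        self.params_doc = params_doc

    def describe(self) -> str:
        # Text a prompt generator could list so the LLM knows how to call this action.
        return f"{self.name}: {self.desc} params={self.params_doc}"

    @abstractmethod
    def __call__(self, **kwargs) -> str:
        ...


class ThinkAction(Action):
    """Reasoning expressed as an action: it calls no tool, it only carries a thought."""

    def __init__(self):
        super().__init__(
            name="Think",
            desc="Reason about the task before choosing the next step.",
            params_doc={"response": "the thought, stated concretely"},
        )

    def __call__(self, response: str) -> str:
        # Fixed acknowledgement; the thought lives in the action arguments,
        # which an agent would record in its history.
        return "OK"


class SearchAction(Action):
    """A toy tool action that looks a keyword up in a small dictionary."""

    def __init__(self, corpus: Dict[str, str]):
        super().__init__("Search", "Look up a keyword.", {"query": "keyword to look up"})
        self.corpus = corpus

    def __call__(self, query: str) -> str:
        return self.corpus.get(query, "No result.")


actions = [ThinkAction(), SearchAction({"AgentLite": "A lightweight LLM agent library."})]
registry = {a.name: a for a in actions}
trajectory = [
    ("Think", {"response": "I should search for AgentLite."}),
    ("Search", {"query": "AgentLite"}),
]
for name, kwargs in trajectory:
    print(name, "->", registry[name](**kwargs))
```

In this sketch a ReAct-style trajectory is just an alternation of Think and tool actions in the same registry, so adding a new reasoning type means registering one more `Action` subclass rather than changing the agent loop.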

Experimental Results

Research Questions

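RQ1: How can a lightweight framework speed up the development and evaluation of new LLM agent reasoning strategies and architectures?
RQ2: Can a task-oriented, hierarchical multi-agent design improve the modularity and experimental flexibility of LLM agents?
RQ3: How does AgentLite perform with different LLM backends on established benchmarks such as HotPotQA and Webshop?
RQ4: What suite of applications can be built easily to demonstrate AgentLite's versatility across domains?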

Key Findings

HotPotQA results by question difficulty (F1 and accuracy):
LLM	Easy F1	Easy accuracy	Medium F1	Medium accuracy	Hard F1	Hard accuracy
GPT-3.5-Turbo-16k-0613	0.410	0.35	0.330	0.25	0.283	0.20
GPT-4-0613	0.611	0.47	0.610	0.48	0.527	0.38
GPT-4-32k-0613	0.625	0.46	0.644	0.54	0.520	0.37
xLAM-v0.1	0.532	0.45	0.547	0.46	0.455	0.36
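The summary does not say exactly how these F1 and accuracy scores are computed. Assuming the common SQuAD/HotPotQA convention, where F1 is token overlap after answer normalization and accuracy is exact match, the two metrics can be sketched in Python as below. The example predictions are made up for illustration; HotPotQA's official script additionally special-cases yes/no answers, which this sketch omits.

```python
import re
import string
from collections import Counter

# Assumed metric definitions (SQuAD-style); not confirmed as AgentLite's scoring.
PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    same = sum((Counter(p) & Counter(g)).values())  # shared tokens, with multiplicity
    if same == 0:
        return 0.0
    precision, recall = same / len(p), same / len(g)
    return 2 * precision * recall / (precision + recall)


preds = ["The Eiffel Tower", "Paris, France"]   # illustrative predictions
golds = ["Eiffel Tower", "Paris"]
f1 = sum(token_f1(p, g) for p, g in zip(preds, golds)) / len(preds)
acc = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(preds)
print(round(f1, 3), acc)  # 0.833 0.5
```

The second pair shows why F1 is usually higher than accuracy in the table: a partially correct answer earns partial F1 credit (2/3 here) but no exact-match credit.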

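AgentLite achieves multi-agent orchestration through a hierarchical manager-agent setup.
The framework supports new reasoning types by extending Actions, unifying reasoning and tool use (e.g., Think is treated as an action).
AgentLite agents perform competitively on the benchmarks and support multiple LLM backends, including GPT-3.5, GPT-4 variants, and xLAM-v0.1.
On HotPotQA, both GPT-4 variants outperform GPT-3.5 at every difficulty level. GPT-4-32k-0613 scores highest on medium-difficulty questions (F1 0.644, accuracy 0.54), while GPT-4-0613 is marginally ahead on hard questions (F1 0.527, accuracy 0.38). xLAM-v0.1 also improves on GPT-3.5 in every column.
On Webshop, GPT-4-32k obtains a higher average reward, which suggests a benefit from the longer context; in this environment xLAM-v0.1 remains competitive with GPT-3.5.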
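The HotPotQA comparisons can be checked mechanically against the results table. This short Python snippet uses only the scores copied from that table (no new data), finds the top model in each column, and confirms that xLAM-v0.1 exceeds GPT-3.5 everywhere.

```python
# HotPotQA scores copied from the table: easy/medium/hard F1 and accuracy.
results = {
    "GPT-3.5-Turbo-16k-0613": [0.410, 0.35, 0.330, 0.25, 0.283, 0.20],
    "GPT-4-0613": [0.611, 0.47, 0.610, 0.48, 0.527, 0.38],
    "GPT-4-32k-0613": [0.625, 0.46, 0.644, 0.54, 0.520, 0.37],
    "xLAM-v0.1": [0.532, 0.45, 0.547, 0.46, 0.455, 0.36],
}
columns = ["Easy F1", "Easy accuracy", "Medium F1", "Medium accuracy", "Hard F1", "Hard accuracy"]

# For each column index, pick the model with the largest score in that column.
best_by_column = {
    col: max(results, key=lambda model: results[model][i]) for i, col in enumerate(columns)
}
for col, model in best_by_column.items():
    print(f"{col}: {model}")

xlam_beats_gpt35 = all(
    x > y for x, y in zip(results["xLAM-v0.1"], results["GPT-3.5-Turbo-16k-0613"])
)
print("xLAM-v0.1 above GPT-3.5 in every column:", xlam_beats_gpt35)
```

Running it shows that GPT-4-32k-0613 leads on easy F1 and both medium-difficulty metrics, while GPT-4-0613 leads on easy accuracy and both hard-difficulty metrics, so neither GPT-4 variant dominates across the board.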

This review was generated by AI and checked by human editors.