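[Paper Explained] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
TaskMatrix.AI proposes an ecosystem in which a multimodal conversational foundation model acts as the brain that orchestrates millions of APIs and off-the-shelf models to complete digital and physical tasks, with learnable alignment and API-driven execution.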
Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain-specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well. However, due to the different implementation or working mechanisms, they are not easily accessible or compatible with foundation models. Therefore, there is a clear and pressing need for a mechanism that can leverage foundation models to propose task solution outlines and then automatically match some of the sub-tasks in the outlines to the off-the-shelf models and systems with special functionalities to complete them. Inspired by this, we introduce TaskMatrix.AI as a new AI ecosystem that connects foundation models with millions of APIs for task completion. Unlike most previous work that aimed to improve a single AI model, TaskMatrix.AI focuses more on using existing foundation models (as a brain-like central system) and APIs of other AI models and systems (as sub-task solvers) to achieve diversified tasks in both digital and physical domains. As a position paper, we will present our vision of how to build such an ecosystem, explain each key component, and use study cases to illustrate both the feasibility of this vision and the main challenges we need to address next.
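Research Motivation and Goals
- Motivate the need to connect foundation models with diverse APIs so they can handle domain-specific tasks that pre-training data does not cover.
- Propose an architecture in which a core multimodal foundation model generates executable, API-driven plans.
- Define an API platform with a unified documentation schema to enable scalable API integration.
- Introduce a mechanism for learning from feedback that keeps the foundation model and the API selector aligned with the available APIs.
- Illustrate potential applications across multimodal content creation, office automation, robotics, and cloud services.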
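Proposed Method
- Define a four-component architecture: Multimodal Conversational Foundation Model (MCFM), API Platform, API Selector, and API Executor.
- The MCFM generates a solution outline and action code from the user instruction, the context, and the available APIs.
- The API Platform provides a unified API documentation schema that makes APIs easier to use and compose.
- The API Selector retrieves relevant APIs by semantic similarity and supports modular, domain-specific packages.
- The API Executor runs the generated action code and includes a verification step to check that the task has been completed satisfactorily.
- Incorporate reinforcement learning from human feedback (RLHF) to improve API understanding and task planning, and return feedback to API developers so they can improve their API documentation.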
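The selector and executor roles described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the `ApiDoc` fields, the three toy APIs, and the function names are hypothetical, word overlap stands in for real semantic retrieval, and the hard-coded plan stands in for action code that the MCFM would generate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ApiDoc:
    """One record in a hypothetical unified API documentation schema."""
    name: str
    description: str
    parameters: List[str]
    func: Callable[..., Any]


# A toy API platform; every entry is invented for illustration.
PLATFORM = [
    ApiDoc("resize_image", "resize an image to a given width", ["width"],
           lambda width: f"image@{width}px"),
    ApiDoc("send_email", "send an email message to a recipient", ["to", "body"],
           lambda to, body: f"sent to {to}"),
    ApiDoc("move_robot", "move a robot arm to a position", ["x", "y"],
           lambda x, y: (x, y)),
]


def select_apis(instruction: str, platform: List[ApiDoc], top_k: int = 2) -> List[ApiDoc]:
    """Rank APIs by word overlap with the instruction.

    This is a simple stand-in for semantic retrieval, not an embedding search.
    """
    words = set(instruction.lower().split())

    def score(doc: ApiDoc) -> int:
        return len(words & set(doc.description.lower().split()))

    ranked = sorted(platform, key=score, reverse=True)
    return [doc for doc in ranked[:top_k] if score(doc) > 0]


def execute(plan: List[Tuple[str, Dict[str, Any]]], apis: List[ApiDoc]) -> List[Any]:
    """Run each (api_name, kwargs) step in order after checking its parameters."""
    registry = {doc.name: doc for doc in apis}
    results = []
    for name, kwargs in plan:
        doc = registry[name]  # raises KeyError if the plan uses an unselected API
        missing = [p for p in doc.parameters if p not in kwargs]
        if missing:
            raise ValueError(f"{name} is missing parameters: {missing}")
        results.append(doc.func(**kwargs))
    return results


if __name__ == "__main__":
    selected = select_apis("resize the image and send it by email", PLATFORM)
    # Hard-coded plan; in the described system the MCFM would generate it.
    plan = [("resize_image", {"width": 256}),
            ("send_email", {"to": "a@example.com", "body": "done"})]
    print([doc.name for doc in selected], execute(plan, selected))
```

Restricting the executor to the selected APIs mirrors the idea that the model only plans over what the selector retrieved; a real system would also need the verification and feedback steps, which this sketch omits.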
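Experimental Results
Research Questions
- RQ1: How can foundation models be used to generate executable task outlines that map onto a large number of APIs?
- RQ2: Which architectural mechanisms enable scalable API selection, execution, and verification for multimodal tasks?
- RQ3: How do RLHF and developer feedback improve the alignment between the core model and API documentation over time?
- RQ4: Which practical applications demonstrate the feasibility of connecting foundation models with millions of APIs in the digital and physical domains?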
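Key Findings
- The architecture solves tasks step by step: it generates a solution outline, selects APIs, and executes action code.
- A unified API documentation schema and the API Platform make it easier for the foundation model to integrate and reuse APIs.
- RLHF and feedback to API developers promote faster learning, better API usage, and documentation that improves over time.
- The approach supports both digital and physical tasks, including content generation, office automation, robotics, and IoT/computing tasks.
- The system emphasizes interpretability through explicit action code and API results, which makes task execution traceable.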
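To make the traceability point concrete, here is a small Python sketch of an execution trace. It assumes a trace is simply a list of records pairing each explicit call with its result; the `run_with_trace` helper, the record fields, and the two arithmetic "APIs" are hypothetical and are not taken from the paper.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_with_trace(
    steps: List[Tuple[str, Dict[str, Any]]],
    apis: Dict[str, Callable[..., Any]],
) -> List[Dict[str, Any]]:
    """Execute steps in order and record each call with its outcome.

    Execution stops at the first failing step so the trace shows where
    the plan broke.
    """
    trace = []
    for name, kwargs in steps:
        args = ", ".join(f"{k}={v!r}" for k, v in kwargs.items())
        call = f"{name}({args})"
        try:
            result = apis[name](**kwargs)
            trace.append({"call": call, "ok": True, "result": result})
        except Exception as exc:  # record the failure instead of hiding it
            trace.append({"call": call, "ok": False, "result": repr(exc)})
            break
    return trace


# Toy APIs, invented for illustration only.
TOY_APIS = {
    "add": lambda a, b: a + b,
    "divide": lambda a, b: a / b,
}

if __name__ == "__main__":
    steps = [("add", {"a": 2, "b": 3}),
             ("divide", {"a": 1, "b": 0}),
             ("add", {"a": 1, "b": 1})]
    for record in run_with_trace(steps, TOY_APIS):
        print(record)
```

Because every step is stored as a readable call string next to its result, a user or developer can see which API call produced which output and where a plan failed.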
This explainer was generated by AI and reviewed by a human editor.