QUICK REVIEW

[Paper Review] The AI Agent Index

Stephen Casper, Luke Bailey | ArXiv.org | Feb 3, 2025

Computability, Logic, AI Algorithms | Cited by 3

One-Sentence Summary

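Introduces the AI Agent Index, the first public database documenting deployed agentic AI systems. Drawing on public sources and correspondence with developers, it details each system's components, applications, and safety practices.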

ABSTRACT

Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic AI systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning implementation, tool use), application domains (e.g., computer use, software engineering), and risk management practices (e.g., evaluation results, guardrails), based on publicly available information and correspondence with developers. We find that while developers generally provide ample information regarding the capabilities and applications of agentic systems, they currently provide limited information regarding safety and risk management practices. The AI Agent Index is available online at https://aiagentindex.mit.edu/

Research Motivation and Goals

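Provide a structured framework for documenting the technical, safety, and policy-relevant features of agentic AI systems.
Identify currently deployed agentic systems that meet the inclusion criteria and document them publicly using this framework.
Analyze and report high-level trends in geographic distribution, industry vs. academia, openness, and risk management.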

Proposed Method

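Define inclusion criteria based on the agency characteristics of Chan et al. (2023): underspecification, directness of impact, goal-directedness, and long-term planning.
Assemble a sample of agentic systems deployed as of December 31, 2024 (n = 67), drawing on public sources and correspondence with developers.
Compile a 33-field "agent card" for each system, grouped into six categories: basic information, developer, system components, guardrails, evaluation, and ecosystem.
Record whether code and documentation are openly available; where available, collect safety policies and external evaluations.
Analyze demographics (country, academia vs. industry) and the distribution of application domains; discuss limitations and implications for governance.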
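The screening step and the agent card described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class `AgentCard`, the helper `meets_inclusion_criteria`, the per-category field names, and the rule that a system must exhibit all four characteristics are hypothetical choices for this sketch, not the index's actual schema or screening procedure. Only the six category names and the four Chan et al. (2023) characteristics come from the summary.

```python
from dataclasses import dataclass, field

# The four agency characteristics from Chan et al. (2023), as listed in the summary.
AGENCY_CRITERIA = (
    "underspecification",
    "directness_of_impact",
    "goal_directedness",
    "long_term_planning",
)

# The six agent-card categories named in the summary. Field names inside each
# category are hypothetical placeholders, not the index's actual 33 fields.
CATEGORIES = (
    "basic_info",
    "developer",
    "system_components",
    "guardrails",
    "evaluation",
    "ecosystem",
)


def meets_inclusion_criteria(assessment: dict[str, bool]) -> bool:
    """Hypothetical screen: include a system only if it shows all four characteristics."""
    return all(assessment.get(c, False) for c in AGENCY_CRITERIA)


@dataclass
class AgentCard:
    name: str
    # One dict of free-text fields per category; a missing key means "not disclosed".
    sections: dict[str, dict[str, str]] = field(
        default_factory=lambda: {c: {} for c in CATEGORIES}
    )

    def set_field(self, category: str, key: str, value: str) -> None:
        """Record one documented field under a known category."""
        if category not in self.sections:
            raise KeyError(f"unknown category: {category}")
        self.sections[category][key] = value

    def filled_fields(self) -> int:
        """Count fields that contain non-empty information."""
        return sum(1 for fields in self.sections.values() for v in fields.values() if v)

    def discloses(self, category: str, key: str) -> bool:
        """True if the field exists in the category and is non-empty."""
        return bool(self.sections[category].get(key))


# Usage with a made-up system that passes the hypothetical screen.
assessment = {c: True for c in AGENCY_CRITERIA}
if meets_inclusion_criteria(assessment):
    card = AgentCard("ExampleAgent")
    card.set_field("system_components", "base_model", "some LLM")
    print(card.filled_fields(), card.discloses("guardrails", "safety_policy"))  # prints: 1 False
```

Treating an absent field as "not disclosed" mirrors how the findings count disclosure: a card with an empty guardrails or evaluation section is exactly the pattern the summary reports for most systems.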

Figure 1: Most AI agent developers in the index provide some public documentation (70.1%), while about half (49.3%) release their underlying code.

Results

Research Questions

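RQ1: Which organizations are developing agentic systems, and in which domains are these systems deployed?
RQ2: What infrastructure do agentic systems require, and how are their performance and safety evaluated?
RQ3: Which guardrails, safety policies, and risk-management practices are publicly disclosed?
RQ4: How open are agentic systems with respect to code, documentation, and safety information?
RQ5: What do the patterns observed in the index imply for governance?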

Key Findings

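The index catalogs 67 agentic systems; deployments date back to 2023, and the rate of deployment increased in 2024.
45 of the 67 agents (67.2%) come from US-based developers; most are developed in industry (49/67) rather than academia (18/67).
50 of the 67 agents (74.6%) focus on software engineering or computer use. Many customer-service agents also exist, but not all of them meet the inclusion criteria.
33 agents (49.3%) release their code, and 47 agents (70.1%) release documentation.
Public information on safety policies and evaluations is limited: 19.4% (13/67) of systems disclose a safety policy, 7.5% (5/67) disclose external safety evaluations, and 9.0% (6/67) make safety evaluation information publicly available.
Most safety-related disclosures come from a handful of large companies (e.g., Anthropic, Google DeepMind, OpenAI).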
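The percentages in the findings are simple shares of the 67 indexed systems. The sketch below recomputes them from the raw counts reported above to make the arithmetic explicit; the dictionary labels and the `share` helper are illustrative names chosen here, and rounding to one decimal place is an assumption about how the figures were reported.

```python
TOTAL = 67  # number of indexed agentic systems

# Counts reported in the key findings (each out of the 67 indexed systems).
counts = {
    "US-based developer": 45,
    "industry": 49,
    "academia": 18,
    "software engineering or computer use": 50,
    "code released": 33,
    "documentation released": 47,
    "safety policy disclosed": 13,
    "external safety evaluation disclosed": 5,
    "safety evaluation info public": 6,
}


def share(n: int, total: int = TOTAL) -> float:
    """Percentage of indexed systems, rounded to one decimal place."""
    return round(100 * n / total, 1)


for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {share(n)}%")

# The industry and academic counts together cover every indexed system.
assert counts["industry"] + counts["academia"] == TOTAL
```

Each printed line matches the corresponding percentage in the findings and figure captions; for example, 13/67 rounds to 19.4% and 5/67 to 7.5%, the latter consistent with "fewer than 10%" in Figure 2.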

Figure 2: Only 19.4% of indexed agentic systems disclose a formal safety policy, and fewer than 10% report external safety evaluations.

This summary was generated by AI and reviewed by human editors.