QUICK REVIEW

[Paper Review] The AI Agent Index

Stephen Casper, Luke Bailey|ArXiv.org|Feb 3, 2025

Computability, Logic, AI Algorithms|Cited by 3

One-line summary

Introduces the AI Agent Index, the first public database documenting deployed agentic AI systems, and details their components, applications, and safety practices based on public sources and developer correspondence.

ABSTRACT

Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic AI systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning implementation, tool use), application domains (e.g., computer use, software engineering), and risk management practices (e.g., evaluation results, guardrails), based on publicly available information and correspondence with developers. We find that while developers generally provide ample information regarding the capabilities and applications of agentic systems, they currently provide limited information regarding safety and risk management practices. The AI Agent Index is available online at https://aiagentindex.mit.edu/

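Motivation and objectives

Provide a structured framework for documenting the technical, safety, and policy-relevant characteristics of agentic AI systems.
Identify currently deployed agentic systems that meet the inclusion criteria and document them publicly using the framework.
Analyze and report high-level trends in the agentic AI ecosystem with respect to geography, industry vs. academia, openness, and risk management.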

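Proposed method

Define inclusion criteria following Chan et al., 2023, based on underspecification, directness of impact, goal-directedness, and long-term planning.
Compile a sample (n = 67) of agentic systems deployed as of December 31, 2024, from public sources and developer correspondence.
Collect agent cards with 33 fields across six categories (basic information, developer, system components, guardrails, evaluation, ecosystem).
Record the openness of code and documentation, and catalog safety policies and external evaluations where available.
Analyze demographics (country, academia vs. industry) and domain distribution, and discuss limitations and governance implications.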
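
As a rough illustration of the agent card described above, the following Python sketch groups free-form fields under the six categories named in this review. Only the category names come from the review; the `AgentCard` class, its methods, and the example field `base_model` are hypothetical and do not reflect the index's actual schema or its 33 field names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six category names are taken from the review; the identifiers
# themselves are an assumption made for this sketch.
CATEGORIES = (
    "basic_information",
    "developer",
    "system_components",
    "guardrails",
    "evaluation",
    "ecosystem",
)


@dataclass
class AgentCard:
    """Hypothetical container for one indexed system's documentation."""

    name: str
    # One dict of field-name -> value per category, all empty at creation.
    sections: dict[str, dict[str, str]] = field(
        default_factory=lambda: {c: {} for c in CATEGORIES}
    )

    def set_field(self, category: str, key: str, value: str) -> None:
        """Store a value, rejecting categories outside the six listed."""
        if category not in self.sections:
            raise KeyError(f"unknown category: {category}")
        self.sections[category][key] = value

    def filled_fields(self) -> int:
        """Count how many fields have been recorded across all categories."""
        return sum(len(fields) for fields in self.sections.values())


card = AgentCard("ExampleAgent")  # placeholder name, not a real index entry
card.set_field("system_components", "base_model", "unspecified")
print(card.filled_fields())
```

Keeping the categories fixed while leaving field names open mirrors the review's description without guessing at the individual fields.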

Figure 1: Most AI agent developers in the index provide some public documentation (70.1%), while about half (49.3%) release their underlying code.

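Experimental results

Research questions

RQ1: Which organizations are developing agentic systems, and in which domains are they deployed?
RQ2: What infrastructure do agentic systems require, and how are performance and safety evaluated?
RQ3: To what extent are guardrails, safety policies, and risk management practices publicly disclosed?
RQ4: How open are code, documentation, and safety information?
RQ5: What governance implications arise from the patterns observed in the index?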

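Key findings

67 agentic systems are indexed; deployments date back to 2023, and the pace of deployment increased throughout 2024.
Of the 67 agents, 45 were built by US-based developers; most are industry-led (49/67), with 18/67 from academia.
50 of the 67 agents (74.6%) specialize in software engineering or computer use; many are customer-service oriented, but not all meet the inclusion criteria.
33 agents (49.3%) release their code, and 47 agents (70.1%) release documentation.
Public information on safety policies and evaluations is limited: 19.4% (13/67) disclose a safety policy, 7.5% (5/67) report external safety evaluations, and 9% (6/67) have publicly available safety evaluation information.
Most safety-related disclosures come from a handful of large companies such as Anthropic, Google DeepMind, and OpenAI.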
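
The percentages in these findings follow directly from the reported counts out of 67 systems. The short Python sketch below recomputes them as a sanity check; the counts are the ones quoted in this review, while the dictionary labels and the `share` helper are names chosen for illustration.

```python
TOTAL = 67  # number of indexed systems reported in the review

# Counts as quoted in the findings; the labels are illustrative.
counts = {
    "software engineering or computer use": 50,
    "code released": 33,
    "documentation released": 47,
    "safety policy disclosed": 13,
    "external safety evaluation reported": 5,
    "public safety evaluation information": 6,
}


def share(count: int, total: int = TOTAL) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL} = {pct}%")
```

The last entry comes out at 9.0% to one decimal place, which the review reports as 9%.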

Figure 2: Only 19.4% of indexed agentic systems disclose a formal safety policy, and fewer than 10% report external safety evaluations.

This review was generated by AI and checked by a human editor.