QUICK REVIEW

[Paper Review] Cognitive Architectures for Language Agents

Theodore R. Sumers, Shunyu Yao|arXiv (Cornell University)|Sep 5, 2023

Topic Modeling | Citations: 56

One-Line Summary

CoALA organizes language agents by memory, action space, and decision-making, presenting a framework to unify and guide the development of LLM-based agents.

ABSTRACT

Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In this paper, we draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA). CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents. Taken together, CoALA contextualizes today's language agents within the broader history of AI and outlines a path towards language-based general intelligence.

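Motivation and Objectives

Motivate the need for a unified framework for comparing diverse language agents.
Introduce CoALA, a blueprint inspired by cognitive architectures that organizes memory, actions, and decision-making in language agents.
Show that CoALA can express existing agents, and highlight future directions toward more capable agents.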

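Proposed Method

Map language agents onto three CoALA dimensions: memory (working and long-term), action space (external and internal), and decision-making (planning and execution).
Define the memory subtypes as episodic, semantic, procedural, and working memory, and explain their roles in agent reasoning and learning.
Describe grounding as external actions that interact with physical, human, and digital environments, and classify internal actions into retrieval, reasoning, and learning.
Outline an iterative decision cycle in which reasoning and retrieval plan actions, the best grounding or learning action is selected, and that action is executed.
Position LLMs as the core component of a system inspired by broader cognitive architectures, enabling text-based internal representations and flexible reasoning.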
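
The memory modules, action space, and decision cycle summarized above can be illustrated with a minimal Python sketch. Every name here (`AgentMemory`, `retrieve`, `reason`, `learn`, `ground`, `decision_cycle`, `stub_llm`, `stub_env`) is a hypothetical illustration, not an identifier from the paper. The sketch also simplifies the cycle by always executing one grounding action followed by one learning action, rather than selecting among candidates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentMemory:
    """Hypothetical container for the four memory types named in the review."""
    working: Dict[str, object] = field(default_factory=dict)   # current goal, observations
    episodic: List[str] = field(default_factory=list)          # past experience
    semantic: List[str] = field(default_factory=list)          # facts about the world
    procedural: List[str] = field(default_factory=list)        # skills or rules (unused here)


def retrieve(memory: AgentMemory, query: str) -> List[str]:
    """Internal action: read matching long-term facts into working memory."""
    hits = [fact for fact in memory.semantic if query.lower() in fact.lower()]
    memory.working["retrieved"] = hits
    return hits


def reason(memory: AgentMemory, llm: Callable[[str], str]) -> str:
    """Internal action: ask the LLM to propose an action from working memory."""
    prompt = f"Goal: {memory.working['goal']}\nFacts: {memory.working.get('retrieved', [])}"
    proposal = llm(prompt)
    memory.working["proposal"] = proposal
    return proposal


def learn(memory: AgentMemory, event: str) -> None:
    """Internal action: write new experience into episodic memory."""
    memory.episodic.append(event)


def ground(environment: Callable[[str], str], action: str) -> str:
    """External action: act on the environment and return its observation."""
    return environment(action)


def decision_cycle(memory: AgentMemory, llm: Callable[[str], str],
                   environment: Callable[[str], str], steps: int = 2) -> AgentMemory:
    for _ in range(steps):
        # Planning: retrieval and reasoning propose an action.
        retrieve(memory, str(memory.working["goal"]))
        action = reason(memory, llm)
        # Execution: run the grounding action, then record what happened.
        observation = ground(environment, action)
        memory.working["observation"] = observation
        learn(memory, f"{action} -> {observation}")
    return memory


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption for this sketch)."""
    return "open door" if "door" in prompt else "wait"


def stub_env(action: str) -> str:
    """Stand-in for a real environment (assumption for this sketch)."""
    return f"done: {action}"


memory = AgentMemory(semantic=["The door is locked", "Keys are on the table"])
memory.working["goal"] = "door"
decision_cycle(memory, stub_llm, stub_env, steps=2)
```

The stubs keep the example runnable without a model or an environment. In a real agent, `stub_llm` would be replaced by an actual LLM call and `stub_env` by a physical, human, or digital interface.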

Figure 1: Different uses of large language models (LLMs). A : In natural language processing (NLP), an LLM takes text as input and outputs text. B : Language agents (Ahn et al., 2022 ; Huang et al., 2022c ) place the LLM in a direct feedback loop with the external environment by transforming observa

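Experimental Results

Research Questions

RQ1: How can a cognitive architecture framework organize the diverse body of research on language agents?
RQ2: What are the core memory, action, and decision-making components that reliably support holistic intelligence in language agents?
RQ3: How can CoALA express existing language agents and guide the development of future agents?
RQ4: What actionable steps does CoALA propose for building more capable language agents?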

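Key Findings

CoALA provides a three-dimensional structure of memory, action, and decision-making that can express a broad range of existing work.
Agents can be understood as interacting memory modules, grounding interfaces, and a looped decision procedure that plans and executes.
The framework connects production-system concepts with modern LLM-based agents, clarifying how internal reasoning, retrieval, and learning structure agent behavior.
Managing grounding and memory through LLMs enables more flexible, end-to-end designs of language agents.
The survey identifies unexplored directions and practical steps for advancing language agent capabilities toward broader intelligence.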
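
For readers unfamiliar with the production-system concept mentioned in the findings, the following is a minimal sketch of condition-action rules firing over a working memory. The rule names, the `run_productions` helper, and the first-match conflict resolution are all assumptions made for illustration. Real cognitive architectures use richer decision procedures, and nothing here is taken from the paper itself.

```python
from typing import Callable, Dict, List, Tuple

WorkingMemory = Dict[str, bool]
# A production is (name, condition, action): if the condition holds, the action may fire.
Production = Tuple[str, Callable[[WorkingMemory], bool], Callable[[WorkingMemory], None]]


def run_productions(wm: WorkingMemory, productions: List[Production],
                    max_cycles: int = 10) -> List[str]:
    """Repeatedly match rules against working memory and fire the first match."""
    fired: List[str] = []
    for _ in range(max_cycles):
        matching = [p for p in productions if p[1](wm)]
        if not matching:
            break  # no rule applies, so the system halts
        name, _, action = matching[0]  # assumption: first matching rule wins
        action(wm)
        fired.append(name)
    return fired


productions: List[Production] = [
    ("get_key", lambda wm: not wm["has_key"], lambda wm: wm.update(has_key=True)),
    ("open_door", lambda wm: wm["has_key"] and not wm["door_open"],
     lambda wm: wm.update(door_open=True)),
]

wm: WorkingMemory = {"has_key": False, "door_open": False}
fired = run_productions(wm, productions)
```

In an LLM-based agent, the hand-written conditions and actions would be replaced or supplemented by model calls. That substitution is the connection the review describes between production systems and modern language agents.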

Figure 2: Cognitive architectures augment a production system with sensory groundings, long-term memory, and a decision procedure for selecting actions. A : The Soar architecture, reproduced with permission from Laird ( 2022 ) . B : Soar’s decision procedure uses productions to select and implement

This review was generated by AI and checked by a human editor.