QUICK REVIEW

[Paper Review] Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science

Jingru Fan, Dewen Liu|arXiv (Cornell University)|Feb 5, 2026

Topic Modeling|Cited by 0

One-Sentence Summary

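The paper advocates a scientific framework for multi-agent systems (MAS) built on a collaboration gain metric (Gamma) and a factor attribution paradigm, and uses a structured MAS factor library to shift MAS design from blind trial-and-error to rigorous design science.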

ABSTRACT

Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. However, despite this rapid progress, the field still relies heavily on empirical trial-and-error. It lacks a unified and principled scientific framework necessary for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence of a structured taxonomy of factors leaves researchers restricted to unguided adjustments; second, the lack of a unified metric fails to distinguish genuine collaboration gain from mere resource accumulation. In this paper, we advocate for a transition to design science through an integrated framework. We advocate establishing the collaboration gain metric ($\Gamma$) as the scientific standard to isolate intrinsic gains from increased budgets. Leveraging $\Gamma$, we propose a factor attribution paradigm to systematically identify collaboration-driving factors. To support this, we construct a systematic MAS factor library, structuring the design space into control-level presets and information-level dynamics. Ultimately, this framework facilitates the transition from blind experimentation to rigorous science, paving the way towards a true science of Collective AI.

Research Motivation and Objectives

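Identify the attribution ambiguity in MAS design and the resulting need for principled guidance.
Introduce the collaboration gain metric (Gamma) to separate genuine collaboration from mere resource scaling.
Establish a factor attribution paradigm to systematically identify the factors that drive collaboration.
Build a systematic MAS factor library that structures the internal design space into a control level and an information level.
Chart a path for moving MAS construction from empirical practice to a scientific methodology.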

Proposed Method

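Define Gamma as the ratio of MAS performance to single-agent performance under an equivalent resource budget, so that collaboration gain is separated from the effect of simply adding resources.
Propose a two-step factor attribution process: (a) test whether a factor yields an improvement, and (b) use Gamma to verify that the improvement is a genuine collaboration gain (Gamma > 1).
Create a structured MAS factor library that divides factors into task context (external) and MAS construction (internal), with internal factors further split into a control level and an information level.
Describe the metrics and diagnostic procedures for attributing gains to factors, including stability filtering to ensure that Gamma > 1 conclusions are robust.
Organize the framework as a taxonomy and a guide for systematic experimentation and evaluation.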
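The Gamma ratio and the two-step attribution check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: scores are scalar and higher is better, the single-agent baseline has already been run at the same resource budget as the MAS, and "stability filtering" is interpreted here as requiring Gamma > 1 in every repeated run rather than only on average. The function names and the `min_runs` parameter are hypothetical.

```python
from statistics import mean


def collaboration_gain(mas_score: float, single_agent_score: float) -> float:
    """Gamma: MAS score divided by a budget-matched single-agent score.

    Assumes higher scores are better and the baseline score is positive.
    """
    if single_agent_score <= 0:
        raise ValueError("baseline score must be positive for a ratio metric")
    return mas_score / single_agent_score


def attribute_factor(
    with_factor: list[float],
    without_factor: list[float],
    single_agent: list[float],
    min_runs: int = 3,
) -> str:
    """Hypothetical two-step attribution check for one design factor.

    Each list holds scores from repeated runs (e.g., different seeds), all at
    the SAME resource budget; position i in each list comes from run i.
    """
    if not (len(with_factor) == len(without_factor) == len(single_agent)):
        raise ValueError("score lists must have equal length")
    if len(with_factor) < min_runs:
        raise ValueError(f"need at least {min_runs} runs for stability filtering")

    # Step (a): does adding the factor improve the MAS on average?
    if mean(with_factor) <= mean(without_factor):
        return "rejected: no improvement"

    # Step (b): compute Gamma for each run of the improved MAS.
    gammas = [collaboration_gain(m, s) for m, s in zip(with_factor, single_agent)]

    # Stability filter (assumed form): Gamma must exceed 1 in every run.
    if all(g > 1.0 for g in gammas):
        return "Class I"  # positive: stable genuine collaboration gain
    return "Class II"     # some run has Gamma <= 1: gain not credited


print(attribute_factor([0.72, 0.70, 0.74], [0.65, 0.66, 0.64], [0.60, 0.62, 0.61]))
# → Class I
```

Requiring every run to clear Gamma > 1 is a deliberately conservative reading of stability filtering; a confidence interval on Gamma across seeds would be a reasonable alternative under the same assumptions.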

Experimental Results

Research Questions

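RQ1: How can Gamma be defined and operationalized to separate genuine collaboration from resource scaling?
RQ2: What is the process for using Gamma to attribute MAS performance gains to specific design factors?
RQ3: How does a structured MAS factor library support principled design and the systematic study of factors?
RQ4: What components make up a robust, reproducible workflow that moves from blind trial-and-error to scientific inquiry?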

Key Findings

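Gamma is defined as the ratio of MAS performance to a single-agent baseline under the same resource budget; Gamma > 1 indicates genuine collaboration gain.
A binary attribution framework classifies factors as Class I (positive, Gamma > 1) or Class II (negative, Gamma ≤ 1), which serves to screen out ineffective designs.
The two-stage factor attribution process first runs precondition experiments under a fixed budget and then validates each factor with Gamma and stability filtering.
The proposed MAS factor library separates the external task context from the internal MAS construction, and further divides internal factors into control (static presets) and information (dynamic mechanisms).
Information-level metrics (content entropy and evolution distance) are introduced to track the dynamic collaboration process and to distinguish meaningful convergence from noise.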
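The summary names content entropy and evolution distance but does not define them. The sketch below assumes one simple instantiation for illustration only: content entropy as the Shannon entropy (in bits) of the word distribution within one round of agent messages, and evolution distance as the Jaccard distance between the vocabularies of consecutive rounds. Under these assumptions, entropy that drops and a distance that settles near zero would be read as convergence, while persistently high values would look more like noise. The function names and the toy transcript are hypothetical.

```python
import math
from collections import Counter


def content_entropy(messages: list[str]) -> float:
    """Shannon entropy (bits) of the word distribution in one round of messages.

    Hypothetical stand-in for the paper's 'content entropy'.
    """
    words = [w.lower() for msg in messages for w in msg.split()]
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def evolution_distance(prev_round: list[str], curr_round: list[str]) -> float:
    """Jaccard distance between the vocabularies of two consecutive rounds.

    Hypothetical stand-in for the paper's 'evolution distance':
    0.0 means identical vocabularies, 1.0 means no shared words.
    """
    prev_vocab = {w.lower() for msg in prev_round for w in msg.split()}
    curr_vocab = {w.lower() for msg in curr_round for w in msg.split()}
    union = prev_vocab | curr_vocab
    if not union:
        return 0.0
    return 1.0 - len(prev_vocab & curr_vocab) / len(union)


# Toy transcript (hypothetical): two agents gradually agree on one answer.
rounds = [
    ["the answer is 12", "i think it is 15"],
    ["answer 12 seems right", "maybe 12 not 15"],
    ["answer is 12", "answer is 12"],
    ["answer is 12", "answer is 12"],
]
for i, msgs in enumerate(rounds):
    # Distance is undefined for the first round, so report NaN there.
    drift = evolution_distance(rounds[i - 1], msgs) if i > 0 else float("nan")
    print(f"round {i}: entropy={content_entropy(msgs):.2f} bits, distance={drift:.2f}")
```

Word-level tokens keep the sketch dependency-free; an embedding-based distance would capture paraphrases better but would require choosing a model, which the summary does not specify.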


This review was generated by AI and checked by human editors.