QUICK REVIEW

[Paper Review] DOVA: Deliberation-First Multi-Agent Orchestration for Autonomous Research Automation

Aaron Shen, Alfred Shen | arXiv (Cornell University) | Mar 4, 2026

Topic Modeling | Cited by 0

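One-Sentence Summary

DOVA introduces deliberation-first hybrid multi-agent orchestration for autonomous research tasks, combining three collaboration modes with adaptive thinking to improve source coverage, confidence, and token efficiency while reducing unnecessary tool calls.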

ABSTRACT

Large language model (LLM) agents have demonstrated remarkable capabilities in tool use, reasoning, and code generation, yet single-agent systems exhibit fundamental limitations when confronted with complex research tasks demanding multi-source synthesis, adversarial verification, and personalized delivery. We present DOVA (Deep Orchestrated Versatile Agent), a multi-agent platform introducing three key innovations: (1) deliberation-first orchestration, where explicit meta-reasoning precedes tool invocation, informed by a persistent user model and entity-aware conversation context; (2) hybrid collaborative reasoning, a composable three-phase pipeline unifying ensemble diversity, blackboard transparency, and iterative refinement; and (3) adaptive multi-tiered thinking, a six-level token-budget allocation scheme that reduces inference cost by 40-60% on simple tasks while preserving deep reasoning capacity. We formalize the core algorithms, present an architectural ablation study across seven system configurations, and analyze the contribution of each component to answer confidence, source coverage, and token efficiency.
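The "composable three-phase pipeline" named in the abstract can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the `Candidate` and `Blackboard` types, the `hybrid_reason` function, the confidence threshold, and the round limit are hypothetical rather than the authors' code, and the agents are stand-in callables instead of LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Candidate:
    agent: str
    answer: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class Blackboard:
    """Shared record of every contribution, visible to all agents (hypothetical)."""
    entries: list[Candidate] = field(default_factory=list)

    def post(self, candidate: Candidate) -> None:
        self.entries.append(candidate)

    def best(self) -> Candidate:
        return max(self.entries, key=lambda c: c.confidence)


Agent = Callable[[str], Candidate]
Refiner = Callable[[Candidate], Candidate]


def hybrid_reason(query: str, agents: list[Agent], refine: Refiner,
                  target_conf: float = 0.8,
                  max_rounds: int = 3) -> tuple[Candidate, Blackboard]:
    board = Blackboard()
    # Phase 1 (ensemble): every agent answers independently, for diversity.
    for agent in agents:
        board.post(agent(query))
    # Phase 2 (blackboard): all candidates sit on the shared board; start from the best.
    current = board.best()
    # Phase 3 (iterative refinement): refine until confident enough or out of rounds.
    for _ in range(max_rounds):
        if current.confidence >= target_conf:
            break
        current = refine(current)
        board.post(current)  # each refinement stays inspectable on the board
    return current, board
```

With two toy agents at confidence 0.55 and 0.65 and a refiner that adds 0.1 per pass, the sketch starts from the 0.65 answer and refines it twice before crossing the 0.8 threshold; the board keeps all four contributions, which is the "transparency" the blackboard phase is meant to provide.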

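Motivation and Objectives

Address the limitations of single-agent LLM systems on complex research tasks that demand multi-source synthesis and verification.
Propose a multi-agent platform that uses deliberation-first orchestration to decide when to invoke tools.
Develop a hybrid collaborative reasoning pipeline that balances breadth, transparency, and depth.
Introduce adaptive multi-tiered thinking to save tokens while preserving reasoning quality.
Present the architecture and measure, through an ablation study, its effect on confidence, source coverage, and efficiency.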
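Adaptive multi-tiered thinking can be sketched as a lookup from task type and a complexity score to one of six tiers, each with its own token budget. The paper specifies six levels, but this summary gives neither their names nor their sizes, so the `ThinkingTier` names, the `TIER_BUDGET` token counts, the `TASK_BASE` mapping, and the "up to two extra tiers" rule below are all hypothetical.

```python
from enum import IntEnum


class ThinkingTier(IntEnum):
    # Six tiers, matching the paper's "six-level" scheme; the names are hypothetical.
    NONE = 0
    MINIMAL = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    MAXIMUM = 5


# Hypothetical reasoning-token budget for each tier.
TIER_BUDGET = {
    ThinkingTier.NONE: 0,
    ThinkingTier.MINIMAL: 256,
    ThinkingTier.LOW: 1024,
    ThinkingTier.MEDIUM: 4096,
    ThinkingTier.HIGH: 8192,
    ThinkingTier.MAXIMUM: 16384,
}

# Hypothetical starting tier per task type: simple tasks start low.
TASK_BASE = {
    "classification": ThinkingTier.MINIMAL,
    "summarization": ThinkingTier.LOW,
    "analysis": ThinkingTier.MEDIUM,
    "research": ThinkingTier.HIGH,
}


def select_budget(task_type: str, complexity: float) -> tuple[ThinkingTier, int]:
    """Map a task type and a complexity score in [0, 1] to a tier and token budget."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    base = TASK_BASE.get(task_type, ThinkingTier.MEDIUM)  # unknown tasks default to MEDIUM
    bump = round(complexity * 2)  # harder tasks move up by 0, 1 or 2 tiers
    tier = ThinkingTier(min(base + bump, ThinkingTier.MAXIMUM))  # clamp at the top tier
    return tier, TIER_BUDGET[tier]
```

Under these made-up numbers, a low-complexity classification request gets 256 thinking tokens while a maximally complex research request gets 16,384. The 40-60% savings reported in the abstract come from the paper's own tiers, not from this sketch.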

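Proposed Method

Deliberation-first orchestration that consults a persistent user model and entity-aware context to decide on tool calls before taking any action.
Hybrid collaborative reasoning that combines ensemble, blackboard, and iterative-refinement phases.
An adaptive six-level thinking budget that maps task type and complexity to a token budget.
Diversity-aware memory retrieval using MMR (Maximal Marginal Relevance) within a hierarchical memory architecture.
A unified multi-modal interface exposing REST, CLI, and a browser UI, plus an MCP server that integrates with Claude Code through dynamic plugins.
Multi-round adversarial debate (Bull-vs-Bear) with structured synthesis for evaluative queries.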
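The MMR retrieval step can be illustrated with the standard greedy Maximal Marginal Relevance selection (Carbonell and Goldstein), which repeatedly picks the candidate d maximizing λ·sim(d, q) − (1 − λ)·max over already-selected s of sim(d, s). This sketch uses cosine similarity over plain vectors and does not model DOVA's hierarchical memory tiers; the function names and the default λ = 0.5 are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: list[float], docs: list[list[float]], k: int, lam: float = 0.5) -> list[int]:
    """Greedily select k document indices by Maximal Marginal Relevance."""
    selected: list[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Redundancy = similarity to the closest item already selected.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With the query `[1, 1]` and memories `[1, 0.9]`, `[1, 0.85]`, `[0.7, 1]`, λ = 0.5 picks the first memory and then skips its near-duplicate in favor of the more distinct third one, while λ = 1.0 reduces to plain relevance ranking and returns both near-duplicates.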

Figure 1: Layered architecture of DOVA. Queries enter through the Interface Layer, pass through Orchestration (with deliberation), and are dispatched to specialized agents, which leverage collaborative reasoning and intelligence services.
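The deliberation step in the figure, where orchestration decides on tools before dispatching to an agent, might look like the following Python sketch. The keyword cues, the `UserModel` field, the agent names (`synthesizer`, `research`, `generalist`), and the `web_search` tool are hypothetical placeholders; a real deliberation step would presumably use an LLM call rather than string matching.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class UserModel:
    """Hypothetical persistent user profile consulted during deliberation."""
    preferred_sources: list[str] = field(default_factory=list)


@dataclass
class Plan:
    needs_tools: bool
    tools: list[str]
    agent: str
    rationale: str


# Hypothetical cues suggesting that fresh external evidence is required.
EXTERNAL_CUES = ("latest", "recent", "paper", "cite", "source")


def deliberate(query: str, entities: dict[str, str], user: UserModel) -> Plan:
    """Decide whether tools are needed before any tool runs."""
    q = query.lower()
    mentioned = [name for name in entities if name.lower() in q]
    wants_external = any(cue in q for cue in EXTERNAL_CUES)
    if mentioned and not wants_external:
        # Entity-aware context already covers the query: answer without tools.
        return Plan(False, [], "synthesizer", f"answer from context: {mentioned}")
    tools = ["web_search"] + user.preferred_sources
    agent = "research" if wants_external else "generalist"
    return Plan(True, tools, agent, "external evidence required")


def orchestrate(query: str, entities: dict[str, str], user: UserModel,
                agents: dict[str, Callable], tools: dict[str, Callable]):
    plan = deliberate(query, entities, user)  # meta-reasoning comes first
    # Only tools chosen during deliberation are invoked; unknown names are skipped.
    evidence = [tools[name](query) for name in plan.tools if name in tools]
    return agents[plan.agent](query, evidence)
```

For "What is DOVA?" with DOVA already in the entity context, no tool is called; asking for "recent papers on DOVA" flips the plan to tool use and routes to the research agent.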

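Experimental Results

Research Questions

RQ1: Compared with immediate, intuition-driven tool calling, how does deliberation-first orchestration affect tool use and response quality?
RQ2: How does hybrid collaborative reasoning affect confidence, source coverage, and token efficiency across tasks of varying complexity?
RQ3: Can adaptive multi-tiered thinking reduce token usage on both simple and complex research tasks without degrading answer quality?
RQ4: How do memory diversity and adversarial debate affect the reliability and verification of synthesized outputs?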

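Key Findings

Hybrid collaborative reasoning yields the largest gains: in the ablation study, removing collaboration lowers confidence by 0.14 and coverage by 0.25.
Adaptive thinking substantially reduces token usage on simple tasks (e.g., classification and summarization) with minimal impact on confidence.
Deliberation reduces unnecessary tool calls and lowers latency, improving cost efficiency.
Single-pass ReAct-style reasoning yields markedly lower confidence than the full DOVA pipeline.
Self-assessment and memory context improve the refinement rate and evaluation accuracy, mitigating low-quality outputs.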

This review was generated by AI and reviewed by human editors.