
[Paper Review] Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems

Xi Shi, Mengxin Zheng | arXiv (Cornell University) | Jan 15, 2026

AI-based Problem Solving and Planning · Cited by 2

One-Sentence Summary

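LAMaS learns latency-aware orchestration for parallel multi-agent systems, shortening the critical execution path by up to about 46% while maintaining, and on some benchmarks improving, task performance.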

ABSTRACT

Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them less optimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under parallel execution. Our experiments show that our approach reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS

Motivation and Objectives

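Identify the limitations of accuracy- and cost-oriented MAS orchestration under parallel execution.
Propose a latency-aware framework (LAMaS) that optimizes the critical execution path.
Enable layer-wise parallel execution by removing intra-layer dependencies.
Learn execution topologies with a probabilistic supernet trained on latency-guided rewards.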

Proposed Method

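Model the MAS search space as an agentic supernet (a probabilistic DAG).
Enable layer-wise parallel execution by removing unnecessary intra-layer dependencies.
Use a threshold-based, query-dependent controller to sample a subset of parallel operators for each layer.
Define latency as the sum over layers of the maximum operator latency within each layer (the critical path).
Introduce a latency-aware reward with critical-path credit assignment, which updates only the bottleneck operators.
Train with policy gradients, normalizing rewards with an exponential moving average (EMA) to stabilize learning.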
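The method steps above can be sketched in plain Python under several assumptions. The cumulative-probability form of the threshold rule, the per-operator penalty `lam_t * latency`, the EMA mean/variance normalizer, and every function and operator name (`select_operators`, `credit_assign`, `cot`, `debate`, and so on) are hypothetical illustrations, not the LAMaS code. The query encoder that makes the controller query-dependent and the policy-gradient update itself are omitted.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Operator name -> measured latency for one layer (names and units are illustrative).
Layer = Dict[str, float]


def select_operators(scores: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """Assumed threshold rule: softmax one layer's operator scores, then keep the
    most probable operators until their cumulative probability reaches
    `threshold`. The selected operators run in parallel within the layer."""
    top = max(scores.values())
    weights = {op: math.exp(s - top) for op, s in scores.items()}  # stable softmax
    total = sum(weights.values())
    chosen: List[str] = []
    mass = 0.0
    for op in sorted(weights, key=weights.get, reverse=True):
        chosen.append(op)
        mass += weights[op] / total
        if mass >= threshold:
            break
    return chosen


def critical_path(layers: Sequence[Layer]) -> Tuple[float, List[str]]:
    """Critical-path latency: sum over layers of the slowest operator in each
    layer. Also returns each layer's bottleneck (slowest) operator."""
    length = 0.0
    bottlenecks: List[str] = []
    for layer in layers:
        slowest = max(layer, key=layer.get)
        length += layer[slowest]
        bottlenecks.append(slowest)
    return length, bottlenecks


def credit_assign(score: float, layers: Sequence[Layer], lam_t: float) -> Dict[Tuple[int, str], float]:
    """Hypothetical credit assignment: every operator receives the task score,
    but only each layer's bottleneck is charged lam_t times its own latency,
    i.e. its contribution to the critical path."""
    _, bottlenecks = critical_path(layers)
    rewards: Dict[Tuple[int, str], float] = {}
    for i, layer in enumerate(layers):
        for op, latency in layer.items():
            penalty = lam_t * latency if op == bottlenecks[i] else 0.0
            rewards[(i, op)] = score - penalty
    return rewards


class EMANormalizer:
    """Normalize scalar rewards with exponential moving averages of mean and variance."""

    def __init__(self, beta: float = 0.9, eps: float = 1e-6) -> None:
        self.beta, self.eps = beta, eps
        self.mean: Optional[float] = None
        self.var = 1.0

    def __call__(self, reward: float) -> float:
        if self.mean is None:
            self.mean = reward  # the first reward initializes the running mean
        delta = reward - self.mean
        self.mean += (1.0 - self.beta) * delta
        self.var = self.beta * self.var + (1.0 - self.beta) * delta * delta
        return (reward - self.mean) / math.sqrt(self.var + self.eps)


if __name__ == "__main__":
    layers = [{"cot": 3.0, "debate": 5.0}, {"react": 2.0}, {"refine": 4.0, "ensemble": 1.0}]
    print(critical_path(layers))  # (11.0, ['debate', 'react', 'refine'])
    print(credit_assign(1.0, layers, lam_t=0.1))
```

Charging only the bottleneck for latency means a fast operator that shares a layer with a slow one is not penalized, which is one plausible reading of "updates only the bottleneck operators".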

Figure 1: (Left): Building blocks for LAMaS; (Right): Workflow illustration of LAMaS. The orchestrator generates a layer-wise execution graph, where operators within the same layer execute in parallel. Red arrows indicate the critical execution path.
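The critical-path definition from the method list, traced by the red arrows in Figure 1, can be written as a worked formula. The symbols $\mathcal{S}_\ell$ (operators selected in layer $\ell$) and $\tau(o)$ (latency of operator $o$) are illustrative notation, not necessarily the paper's, and the sequential total $T_{\mathrm{seq}}$ is shown only for contrast with execution that does not exploit intra-layer parallelism.

```latex
\mathrm{CP} = \sum_{\ell=1}^{L} \max_{o \in \mathcal{S}_\ell} \tau(o),
\qquad
T_{\mathrm{seq}} = \sum_{\ell=1}^{L} \sum_{o \in \mathcal{S}_\ell} \tau(o).

% Worked example: three layers with operator latencies {2, 6}, {3}, {4, 1, 2}
\mathrm{CP} = 6 + 3 + 4 = 13,
\qquad
T_{\mathrm{seq}} = (2 + 6) + 3 + (4 + 1 + 2) = 18.
```

Only the slowest operator in each layer contributes to $\mathrm{CP}$, so adding a fast parallel operator to a layer leaves the critical path unchanged.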

Experimental Results

Research Questions

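RQ1: Under parallel MAS execution, can latency-aware supervision shorten the critical execution path without sacrificing accuracy?
RQ2: Compared with the MaAS baseline, how does explicitly optimizing the critical path affect latency, cost, and task performance?
RQ3: Does enabling intra-layer parallelism together with latency-aware credit assignment yield better latency efficiency than fixed-topology baselines?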

Key Findings

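Compared with MaAS, LAMaS reduces the average critical path length (CP len) by 38.0% on GSM8K, 42.4% on HumanEval, and 46.1% on MATH.
On GSM8K, LAMaS scores 93.37 with a CP len of 913.5, versus 93.13 with a CP len of 1474.6 for MaAS.
On HumanEval, LAMaS scores 92.11 with a CP len of 1042.7, versus 93.00 with a CP len of 1810.8 for MaAS.
On MATH, LAMaS scores 52.26 with a CP len of 1195.8, versus 51.23 with a CP len of 2218.5 for MaAS.
LAMaS generally matches or exceeds task performance while substantially reducing CP len (in the three comparisons above it scores higher on GSM8K and MATH and 0.89 points lower on HumanEval), and it keeps cost under control.
Ablations show that removing latency optimization ($\lambda_t = 0$) increases CP len and, depending on the benchmark, sometimes lowers accuracy or raises cost.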
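The percentages in the first finding can be rechecked from the per-benchmark numbers. The short script below recomputes the relative CP len reduction and the score difference for each benchmark. GSM8K comes out at about 38.05%, consistent with the reported 38.0% given that the listed CP len values are themselves rounded; `REPORTED` and `compare` are names introduced here for illustration only.

```python
# Reported (score, CP len) pairs copied from the findings above.
REPORTED = {
    "GSM8K": {"LAMaS": (93.37, 913.5), "MaAS": (93.13, 1474.6)},
    "HumanEval": {"LAMaS": (92.11, 1042.7), "MaAS": (93.00, 1810.8)},
    "MATH": {"LAMaS": (52.26, 1195.8), "MaAS": (51.23, 2218.5)},
}


def compare(reported):
    """Return {benchmark: (score delta in points, CP len reduction in %)}."""
    out = {}
    for bench, runs in reported.items():
        score_new, cp_new = runs["LAMaS"]
        score_old, cp_old = runs["MaAS"]
        out[bench] = (score_new - score_old, 100.0 * (1.0 - cp_new / cp_old))
    return out


if __name__ == "__main__":
    for bench, (d_score, cp_cut) in compare(REPORTED).items():
        print(f"{bench}: score {d_score:+.2f} pts, CP len -{cp_cut:.1f}%")
```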

Figure 2: Accuracy–latency trade-off on HumanEval. Marker size indicates average cost. Blue points correspond to LAMaS under different latency penalty coefficients $\lambda_{t}$.


This review was generated by AI and reviewed by human editors.