QUICK REVIEW

[Paper Review] Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems

Xi Shi, Mengxin Zheng|arXiv (Cornell University)|Jan 15, 2026

AI-based Problem Solving and Planning | Cited by 2

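One-Line Summary

LAMaS proposes latency-aware orchestration for parallel multi-agent systems. On GSM8K, HumanEval, and MATH it shortens the critical execution path by up to about 46% while keeping task performance competitive with the baseline, and on some benchmarks improving it.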

ABSTRACT

Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them less optimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under parallel execution. Our experiments show that our approach reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS

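Motivation and Objectives

Identifies a limitation of existing MAS orchestration methods: they optimize accuracy and cost but do not explicitly control latency under parallel execution.
Proposes a latency-aware framework (LAMaS) that optimizes the critical execution path.
Enables layer-wise parallel execution by removing dependencies between operators within each layer.
Introduces a latency-based reward and learns the execution topology through a probabilistic supernet, updating only the bottleneck operators.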

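Proposed Method

Models the MAS search space as an agentic supernet (a probabilistic directed acyclic graph).
Enables layer-wise parallel execution by removing unnecessary intra-layer dependencies.
Uses a threshold-based, query-aware controller to sample the subset of operators that run in parallel at each layer.
Defines latency as the sum over layers of the maximum operator latency in each layer, i.e., the critical path.
Introduces a latency-aware reward with critical-path credit assignment, so that only the bottleneck operators are updated.
Optimizes the reward with a policy-gradient method and normalizes it with an exponential moving average (EMA) to stabilize training.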
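The training loop described in the list above can be illustrated with a small, self-contained toy. This is a minimal sketch under stated assumptions, not the authors' implementation. The operator names, their latencies, and `toy_score` are hypothetical. The paper's threshold-based, query-aware controller is replaced by independent Bernoulli gates with one logit per (layer, operator). "EMA normalization" is interpreted here as subtracting EMA baselines from the score and from the critical path. The effective reward is score minus $\lambda_t$ times the critical path. The task part is credited to every sampling decision, while the latency part is credited only to each layer's bottleneck operator.

```python
import math
import random

# Hypothetical operators and latencies (made-up numbers, not from the paper).
OP_LATENCY = {"fast": 1.0, "medium": 2.0, "slow": 8.0}
OPS = list(OP_LATENCY)
NUM_LAYERS = 2


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def toy_score(selected: list[list[str]]) -> float:
    """Hypothetical task score: 1.0 if every layer runs at least one operator."""
    return 1.0 if all(selected) else 0.0


def train(lambda_t: float = 0.1, lr: float = 0.1, beta: float = 0.9,
          steps: int = 2000, seed: int = 0) -> list[dict[str, float]]:
    rng = random.Random(seed)
    # One logit per (layer, operator): a query-independent stand-in for the
    # paper's query-aware controller (assumption).
    logits = [{op: 0.0 for op in OPS} for _ in range(NUM_LAYERS)]
    ema_score = ema_cp = 0.0

    for step in range(steps):
        # 1) Sample the operators that run in parallel in each layer
        #    (independent Bernoulli gates instead of the paper's threshold rule).
        probs = [{op: sigmoid(z) for op, z in layer.items()} for layer in logits]
        actions = [{op: int(rng.random() < p) for op, p in layer.items()}
                   for layer in probs]
        selected = [[op for op, a in layer.items() if a] for layer in actions]

        # 2) Critical path = sum over layers of the slowest selected operator.
        bottlenecks = [max(ops, key=OP_LATENCY.__getitem__) if ops else None
                       for ops in selected]
        cp = sum(OP_LATENCY[b] for b in bottlenecks if b is not None)
        score = toy_score(selected)

        # 3) Advantages relative to EMA baselines (the first sample seeds the EMAs).
        if step == 0:
            ema_score, ema_cp = score, cp
        adv_task = score - ema_score
        adv_lat = -lambda_t * (cp - ema_cp)
        ema_score = beta * ema_score + (1 - beta) * score
        ema_cp = beta * ema_cp + (1 - beta) * cp

        # 4) REINFORCE ascent: d/dz log Bernoulli(a; sigmoid(z)) = a - p.
        for l in range(NUM_LAYERS):
            for op in OPS:
                g = actions[l][op] - probs[l][op]
                grad = adv_task * g
                # Critical-path credit: only the layer's bottleneck sees the latency term.
                if op == bottlenecks[l]:
                    grad += adv_lat * g
                logits[l][op] += lr * grad

    return [{op: sigmoid(z) for op, z in layer.items()} for layer in logits]


if __name__ == "__main__":
    for l, layer in enumerate(train()):
        print(l, {op: round(p, 3) for op, p in layer.items()})
```

`train()` returns each layer's final selection probabilities. With $\lambda_t > 0$, the slow operator absorbs the latency penalty because it is the bottleneck whenever it is selected. Passing `lambda_t=0.0` removes the latency term entirely, so nothing in the update distinguishes the slow operator from the fast one. This loosely mirrors the ablation reported in the findings.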

Figure 1: (Left): Building blocks for LAMaS; (Right): Workflow illustration of LAMaS. The orchestrator generates a layer-wise execution graph, where operators within the same layer execute in parallel. Red arrows indicate the critical execution path.
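Written out, the critical-path latency of a layer-wise graph is $\sum_{l} \max_{o \in S_l} t(o)$. Here $S_l$ is the set of operators run in layer $l$ and $t(o)$ is an operator's latency; this notation is chosen for illustration and is not taken from the paper. The sketch contrasts this quantity with the total latency of running the same operators one after another. All latencies are made-up numbers.

```python
def critical_path_length(layers: list[list[float]]) -> float:
    """Parallel execution: each layer costs as much as its slowest operator."""
    return sum(max(layer) for layer in layers if layer)


def sequential_length(layers: list[list[float]]) -> float:
    """Sequential execution: every operator's latency adds up."""
    return sum(sum(layer) for layer in layers)


# Hypothetical per-operator latencies, one list per layer.
graph = [[3.0, 5.0], [4.0], [2.0, 6.0, 1.0]]
print(critical_path_length(graph))  # 15.0 (5 + 4 + 6)
print(sequential_length(graph))     # 21.0 (8 + 4 + 9)
```

This example also shows why credit is assigned only to bottlenecks. Dropping the 2.0 operator from the last layer leaves the critical path at 15.0, whereas dropping the 6.0 bottleneck shortens it to 11.0.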

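Experimental Results

Research Questions

RQ1: Can latency-aware supervision shorten the critical execution path under parallel MAS execution without sacrificing accuracy?
RQ2: How does explicitly optimizing the critical path compare with the MaAS baseline in terms of latency, cost, and task performance?
RQ3: Does enabling intra-layer parallelism and latency-aware credit assignment improve latency efficiency over fixed-topology baselines?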

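Key Findings

Compared with MaAS, LAMaS reduced the average critical path length (CP len) by 38.0% on GSM8K, 42.4% on HumanEval, and 46.1% on MATH.
GSM8K: LAMaS scored 93.37 with a CP len of 913.5; MaAS scored 93.13 with a CP len of 1474.6.
HumanEval: LAMaS scored 92.11 with a CP len of 1042.7; MaAS scored 93.00 with a CP len of 1810.8.
MATH: LAMaS scored 52.26 with a CP len of 1195.8; MaAS scored 51.23 with a CP len of 2218.5.
LAMaS often improves task performance and latency together (on GSM8K and MATH it beats MaAS on both), and it also helps keep cost under control.
In the ablation, removing latency optimization ($\lambda_t = 0$) lengthens CP len and, depending on the benchmark, can also lower accuracy or raise cost.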
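The percentage reductions in the first finding follow directly from the per-benchmark CP len values. This short script recomputes them, together with the score differences, using only the numbers listed in the findings.

```python
# Average critical-path length (CP len) and score, as listed in the findings.
RESULTS = {
    # benchmark: (MaAS CP len, LAMaS CP len, MaAS score, LAMaS score)
    "GSM8K": (1474.6, 913.5, 93.13, 93.37),
    "HumanEval": (1810.8, 1042.7, 93.00, 92.11),
    "MATH": (2218.5, 1195.8, 51.23, 52.26),
}


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is shorter than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


def summarize() -> dict[str, tuple[float, float]]:
    """Map each benchmark to (CP len reduction in %, LAMaS score minus MaAS score)."""
    out = {}
    for bench, (cp_maas, cp_lamas, s_maas, s_lamas) in RESULTS.items():
        out[bench] = (relative_reduction(cp_maas, cp_lamas), s_lamas - s_maas)
    return out


for bench, (cut, delta) in summarize().items():
    print(f"{bench}: CP len -{cut:.2f}%, score {delta:+.2f}")
```

The recomputed reductions agree with the reported 38.0%, 42.4%, and 46.1% to within 0.1 percentage points; the CP len values are themselves rounded. The score differences confirm that HumanEval is the only one of the three benchmarks where LAMaS scores below MaAS.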

Figure 2: Accuracy–latency trade-off on HumanEval. Marker size indicates average cost. Blue points correspond to LAMaS under different latency penalty coefficients $\lambda_{t}$.

This review was generated by AI and checked by a human editor.