QUICK REVIEW

[Paper Review] Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems

Xi Shi, Mengxin Zheng|arXiv (Cornell University)|2026. 01. 15.

AI-based Problem Solving and Planning | Citations: 2

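One-line summary

LAMaS introduces latency-aware learning into parallel multi-agent orchestration, shortening the critical execution path by up to about 46% while maintaining or improving task performance.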

ABSTRACT

Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them less optimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under parallel execution. Our experiments show that our approach reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS

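Motivation and objectives

Identify the limitations of accuracy- and cost-focused MAS orchestration under parallel execution.
Propose a latency-aware framework (LAMaS) that optimizes the critical execution path.
Remove intra-layer dependencies to enable layer-wise parallel execution.
Learn execution topologies through a probabilistic supernet trained with a latency-guided reward.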

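Proposed method

Model the MAS search space as an agentic supernet (a probabilistic DAG).
Remove unnecessary intra-layer dependencies to enable layer-wise parallel execution.
Use a threshold-based, query-aware controller to sample a subset of parallel operators at each layer.
Define latency as the sum, over layers, of the maximum operator latency in each layer (the critical path).
Introduce a latency-aware reward with critical-path credit assignment, which updates only the bottleneck operators.
Train with policy gradients and normalize rewards with an exponential moving average (EMA) to stabilize training.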
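
The latency definition and the bottleneck-only credit assignment above can be illustrated with a minimal sketch. It assumes each layer is given as a mapping from operator name to a measured latency. The operator names, the latency values, and the helper names `critical_path_latency` and `bottleneck_operators` are hypothetical and are not taken from the LAMaS code base.

```python
from typing import Dict, List

# One layer: hypothetical operator name -> measured latency (arbitrary units).
Layer = Dict[str, float]


def critical_path_latency(layers: List[Layer]) -> float:
    """Sum, over layers, of the slowest operator in each layer.

    Operators inside a layer run in parallel, so a layer takes as long as
    its slowest operator; layers themselves run one after another.
    """
    return sum(max(layer.values()) for layer in layers if layer)


def bottleneck_operators(layers: List[Layer]) -> List[str]:
    """Return the operator that determines the latency of each non-empty layer.

    Under critical-path credit assignment, only these operators would
    receive the latency part of the learning signal.
    """
    return [max(layer, key=layer.get) for layer in layers if layer]


# Illustrative values only (assumed, not from the paper).
layers = [
    {"cot": 120.0, "debate": 310.0},
    {"self_refine": 200.0},
    {"ensemble": 150.0, "review": 90.0},
]

sequential_latency = sum(sum(layer.values()) for layer in layers)

print(critical_path_latency(layers))  # 660.0
print(bottleneck_operators(layers))   # ['debate', 'self_refine', 'ensemble']
print(sequential_latency)             # 870.0
```

In this toy case, speeding up `cot` or `review` would not change the end-to-end latency at all, which is the intuition behind restricting latency credit to the bottleneck operator of each layer.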

Figure 1: (Left): Building blocks for LAMaS; (Right): Workflow illustration of LAMaS. The orchestrator generates a layer-wise execution graph, where operators within the same layer execute in parallel. Red arrows indicate the critical execution path.

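Experimental results

Research questions

RQ1: Can latency-aware supervision shorten the critical execution path in parallel MAS execution without sacrificing accuracy?
RQ2: How does explicit optimization of the critical path compare with the baseline MaAS in latency, cost, and task performance?
RQ3: Does enabling intra-layer parallelism and latency-aware credit assignment improve latency efficiency over fixed-topology baselines?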

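Key results

Compared with MaAS, LAMaS reduces the average critical path length (CP len) by 38.0% on GSM8K, 42.4% on HumanEval, and 46.1% on MATH.
On GSM8K, LAMaS scores 93.37 with a CP len of 913.5, versus a score of 93.13 and a CP len of 1474.6 for MaAS.
On HumanEval, LAMaS scores 92.11 with a CP len of 1042.7, versus a score of 93.00 and a CP len of 1810.8 for MaAS.
On MATH, LAMaS scores 52.26 with a CP len of 1195.8, versus a score of 51.23 and a CP len of 2218.5 for MaAS.
LAMaS often maintains or exceeds task performance while substantially reducing CP len and keeping cost under control.
Ablation experiments show that removing latency optimization ($\lambda_{t} = 0$) lengthens CP len and, across benchmarks, sometimes lowers accuracy or raises cost.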
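
As a quick sanity check, the reported reductions can be recomputed from the CP len values listed above. The sketch below uses only those figures; the dictionary layout and the helper name `reduction_pct` are illustrative assumptions. Because the listed CP len values are themselves rounded, the recomputed percentages agree with the reported ones only to within about 0.1 percentage points.

```python
# CP len values as listed in the key results (MaAS baseline vs. LAMaS).
cp_len = {
    "GSM8K": {"maas": 1474.6, "lamas": 913.5},
    "HumanEval": {"maas": 1810.8, "lamas": 1042.7},
    "MATH": {"maas": 2218.5, "lamas": 1195.8},
}

# Reductions as reported in the review, in percent.
reported = {"GSM8K": 38.0, "HumanEval": 42.4, "MATH": 46.1}


def reduction_pct(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


recomputed = {
    name: reduction_pct(values["maas"], values["lamas"])
    for name, values in cp_len.items()
}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}%, reported {reported[name]:.1f}%")
```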

Figure 2: Accuracy–latency trade-off on HumanEval. Marker size indicates average cost. Blue points correspond to LAMaS under different latency penalty coefficient $\lambda_{t}$


This review was generated by AI and checked by a human editor.