QUICK REVIEW

[Paper Review] HieraMAS: Optimizing Intra-Node LLM Mixtures and Inter-Node Topology for Multi-Agent Systems

Tianjun Yao, Zhaoyi Li | arXiv (Cornell University) | 2026-02-23

Advanced Graph Neural Networks | Citations: 0

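One-Line Summary

HieraMAS proposes a hierarchical MAS built from supernodes, each containing an intra-node mixture of LLMs, and optimizes the inter-node topology via graph classification, achieving state-of-the-art performance with better cost efficiency. It uses multi-level rewards for node-level credit assignment and scores each candidate topology as a whole graph.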

ABSTRACT

Multi-agent systems (MAS) built on large language models (LLMs) have shown strong performance across many tasks. Most existing approaches improve only one aspect at a time, such as the communication topology, role assignment, or LLM routing, while treating each agent as a single, indivisible unit. This misses the opportunity to use mixtures of LLMs within an agent to strengthen role-specific abilities. We propose HieraMAS, a hierarchical collaboration framework that combines intra-node LLM mixtures with an inter-node communication topology. HieraMAS introduces supernodes, where each functional role is implemented by multiple heterogeneous LLMs using a propose-synthesis structure. Optimizing HieraMAS creates unique credit-assignment challenges: final task performance depends heavily on the underlying LLMs' capabilities, which can lead reinforcement methods to incorrectly reward suboptimal configurations. To address this, we use a two-stage algorithm: (1) multi-level reward attribution, which provides fine-grained feedback at both the node level and the overall system level; (2) graph classification for topology selection, which treats choosing the communication structure as a holistic decision rather than optimizing edges one by one. Experiments on reasoning and coding benchmarks show that HieraMAS substantially outperforms existing methods while also delivering better cost-performance trade-offs.

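Research Motivation and Goals

Motivate the use of LLM mixtures inside supernodes to improve MAS performance.
Jointly optimize the intra-node configuration (LLM selection and role retention) and the inter-node communication topology.
Address the credit-assignment problem with multi-level rewards and whole-graph topology selection.
Demonstrate strong performance and cost efficiency on reasoning and coding benchmarks.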

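Proposed Method

Introduce supernodes, each implemented as a mixture of LLMs with a propose-synthesis structure.
Formulate the MAS as an MDP with states, actions (role selection, LLM selection, edge selection), and a reward that balances accuracy against cost.
Stage 1: Learn intra-node LLM selection with multi-level rewards under random graph topologies.
Stage 2: Train a graph classifier to select the inter-node topology from a pool of candidate DAGs, avoiding edge-level credit assignment.
Use a cost-sensitive reward function to encourage cheaper but still correct solutions.
Provide a theoretical justification for the two-stage approach and its credit-assignment benefits.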
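
The propose-synthesis supernode and the cost-sensitive reward can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Supernode` class, the `majority_synthesizer` stub, the lambda "models", and the linear penalty `lam * cost` are hypothetical and are not taken from the paper, which may define these components differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an "LLM" is any callable mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class Supernode:
    """One functional role backed by several proposer LLMs and one synthesizer."""
    role: str
    proposers: List[LLM]
    synthesizer: LLM

    def run(self, task: str) -> str:
        # Propose: every model drafts an answer independently.
        drafts = [llm(f"[{self.role}] {task}") for llm in self.proposers]
        # Synthesize: one model merges the drafts into a single output.
        listing = "\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
        return self.synthesizer(f"[{self.role}] Combine these drafts:\n{listing}")


def cost_sensitive_reward(correct: bool, cost: float, lam: float = 0.1) -> float:
    """Hypothetical reward: 1 for a correct answer, minus a linear cost penalty."""
    return (1.0 if correct else 0.0) - lam * cost


def majority_synthesizer(prompt: str) -> str:
    """Stub synthesizer: returns the most frequent draft listed in the prompt."""
    drafts = [ln.split(": ", 1)[1] for ln in prompt.splitlines() if ln.startswith("Draft ")]
    return Counter(drafts).most_common(1)[0][0]


solver = Supernode(
    role="solver",
    proposers=[lambda p: "4", lambda p: "4", lambda p: "5"],  # stand-ins for real models
    synthesizer=majority_synthesizer,
)
answer = solver.run("What is 2 + 2?")

# Two correct runs: the cheaper one earns the higher reward.
cheap = cost_sensitive_reward(correct=True, cost=1.0)
pricey = cost_sensitive_reward(correct=True, cost=5.0)
```

With a penalty of this shape, two configurations that both answer correctly are ranked by cost, which is the behavior the cost-sensitive reward is meant to encourage.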

Figure 1: Illustration of two credit assignment challenges in joint optimization and our solutions. Challenge 1: Final task rewards mask individual node errors; Node 2 produces incorrect output but receives high reward $R_{2}=0.92$. HieraMAS addresses this via multi-level rewards that provide effe
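
The masking problem in Challenge 1 can be made concrete with a small sketch. It assumes, purely for illustration, that each node has its own correctness score and that the per-node reward is a convex blend of that score with the system-level reward; the function name `multi_level_rewards`, the weight `alpha`, and the node scores are hypothetical, and the paper's actual attribution rule may differ.

```python
from typing import Dict


def multi_level_rewards(
    node_scores: Dict[str, float],
    system_reward: float,
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend a node-level score with the shared system-level reward.

    alpha = 0 reproduces the masking problem (every node gets the system
    reward); alpha = 1 ignores the final task outcome entirely.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        node: alpha * score + (1.0 - alpha) * system_reward
        for node, score in node_scores.items()
    }


# Node 2 is wrong locally even though the final task reward is high (0.92).
node_scores = {"node_1": 1.0, "node_2": 0.0}
rewards = multi_level_rewards(node_scores, system_reward=0.92)
masked = multi_level_rewards(node_scores, system_reward=0.92, alpha=0.0)
```

With `alpha = 0.0` both nodes receive the same reward, whereas any positive `alpha` gives the incorrect node a lower reward than the correct one.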

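Experimental Results

Research Questions

RQ1: Can intra-node LLM mixtures within supernodes improve both individual role capability and overall MAS performance?
RQ2: Does learning the inter-node topology through whole-graph classification outperform edge-level optimization?
RQ3: Do multi-level rewards enable correct credit assignment for each role in the joint-optimization setting?
RQ4: What is the cost-performance trade-off when LLM selection, role retention, and topology are optimized jointly?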

Main Results

Method	Multi	Topology	Role	Node	GPT-5-Mini	Qwen3-80B	GPT-5-Mini	Qwen3-80B	GPT-5-Mini	Qwen3-80B	Avg.
Base	✗	✗	✗	✗	89.06	84.14	77.78	74.44	92.00	82.40	83.14
CoT	✗	✗	✗	✗	87.50	85.94	92.22	90.00	93.60	89.60	89.81
Self-Consistency	✓	✗	✗	✗	89.06	87.50	93.33	91.11	94.40	83.20	89.77
Self-Consistency+CoT	✓	✗	✗	✗	90.62	85.94	94.44	92.22	93.60	92.80	91.60
LLM-Debate	✓	✗	✗	✗	87.50	87.50	94.44	94.44	92.80	92.00	91.45
Full-Graph	✓	✗	✗	✗	89.06	92.19	95.56	96.67	94.40	88.80	92.78
Random-Graph	✓	✗	✗	✗	85.94	92.19	93.33	94.44	91.20	88.00	90.85
AFlow	✓	✓	✗	✗	95.31	98.44	95.56	84.44	91.20	91.20	92.69
GDesigner	✓	✓	✗	✗	90.62	93.75	91.11	87.77	92.00	88.80	90.68
MASRouter	✓	✓	✓	✗	96.88	98.44	91.11	88.88	88.33	81.67	90.89
Ours	✓	✓	✓	✓	93.75	96.88	96.67	95.56	95.20	89.60	94.61

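HieraMAS achieves the highest average accuracy (94.61%) across the three benchmarks.
Full-Graph (fully connected) reaches high accuracy but at a much higher cost than HieraMAS.
Ablation studies show that graph-level topology scoring and LLM selection both contribute to performance and cost efficiency.
The learned topologies are sparse and irregular, with consistent sink/source roles across the top-ranked graphs.
Intra-node LLM mixtures balance capability against cost and outperform configurations that use the strongest model uniformly.
The method has the best average over HumanEval++, MATH, and MMLU-Redux, although some baselines (for example AFlow and MASRouter) score higher in individual columns of the table.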
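
The reported averages can be checked directly from the table. The sketch below copies the six per-column scores for three rows (Ours, Full-Graph, AFlow) and recomputes the mean of each; the dictionary name `rows` and the choice of rows are ours, made only for illustration.

```python
# Six per-column accuracies copied from the results table.
rows = {
    "Ours": [93.75, 96.88, 96.67, 95.56, 95.20, 89.60],
    "Full-Graph": [89.06, 92.19, 95.56, 96.67, 94.40, 88.80],
    "AFlow": [95.31, 98.44, 95.56, 84.44, 91.20, 91.20],
}

# Mean over the six columns, rounded to two decimals as in the table.
averages = {name: round(sum(scores) / len(scores), 2) for name, scores in rows.items()}
best = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

The recomputed means match the table's last column for these rows (94.61, 92.78, and 92.69), and `best` is `"Ours"`.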

Figure 2: The overall framework of HieraMAS. By optimizing a policy learner $\pi_{m}$ with multi-level rewards (Stage 1) and a graph classifier $f_{G}(\cdot)$ with contrastive rewards (Stage 2), HieraMAS learns to select optimal supernode configurations and communication topologies. During inferen
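
Treating topology choice as a whole-graph decision can be sketched as scoring every candidate DAG and taking the best one. In this sketch the learned classifier $f_{G}(\cdot)$ is replaced by a hand-written stand-in (`toy_score`), and the role names and candidate pool are hypothetical; only the select-from-a-pool structure follows the description above.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, List, Tuple

Edge = Tuple[str, str]  # (source role, target role)
Graph = List[Edge]


def is_dag(edges: Graph) -> bool:
    """Return True if the directed edge list contains no cycle."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    try:
        sorter.prepare()
    except CycleError:
        return False
    return True


def select_topology(candidates: List[Graph], score_graph: Callable[[Graph], float]) -> Graph:
    """Score each valid candidate as a whole graph and return the best one."""
    valid = [g for g in candidates if is_dag(g)]
    if not valid:
        raise ValueError("no acyclic candidate in the pool")
    return max(valid, key=score_graph)


def toy_score(graph: Graph) -> float:
    """Stand-in for a learned graph classifier: prefers exactly two edges."""
    return -abs(len(graph) - 2)


chain = [("planner", "coder"), ("coder", "reviewer")]
full = [("planner", "coder"), ("planner", "reviewer"), ("coder", "reviewer")]
cyclic = [("coder", "reviewer"), ("reviewer", "coder")]  # filtered out: not a DAG

chosen = select_topology([full, cyclic, chain], toy_score)
```

Because the scorer sees the entire edge list at once, no reward has to be attributed to individual edges, which is the property the two-stage design relies on.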

This review was generated by AI and reviewed by a human editor.