QUICK REVIEW

[Paper Review] Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

Matteo Salvatori, Filippo Vannella|arXiv (Cornell University)|2026. 03. 25.

Advanced MIMO Systems Optimization · Citations: 0

One-Line Summary

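This paper formulates CIO-based handover optimization as a decentralized multi-agent reinforcement learning problem on the network's dual graph and proposes TD3-D-MA, a discrete method with a shared GNN actor and region-wise critics that achieves scalable training and robust generalization.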

ABSTRACT

HandOver (HO) control in cellular networks is governed by a set of HO control parameters that are traditionally configured through rule-based heuristics. A key parameter for HO optimization is the Cell Individual Offset (CIO), defined for each pair of neighboring cells and used to bias HO triggering decisions. At network scale, tuning CIOs becomes a tightly coupled problem: small changes can redirect mobility flows across multiple neighbors, and static rules often degrade under non-stationary traffic and mobility. We exploit the pairwise structure of CIOs by formulating HO optimization as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) on the network's dual graph. In this representation, each agent controls a neighbor-pair CIO and observes Key Performance Indicators (KPIs) aggregated over its local dual-graph neighborhood, enabling scalable decentralized decisions while preserving graph locality. Building on this formulation, we propose TD3-D-MA, a discrete Multi-Agent Reinforcement Learning (MARL) variant of the TD3 algorithm with a shared-parameter Graph Neural Network (GNN) actor operating on the dual graph and region-wise double critics for training, improving credit assignment in dense deployments. We evaluate TD3-D-MA in an ns-3 system-level simulator configured with real-world network operator parameters across heterogeneous traffic regimes and network topologies. Results show that TD3-D-MA improves network throughput over standard HO heuristics and centralized RL baselines, and generalizes robustly under topology and traffic shifts.
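To make "used to bias HO triggering decisions" concrete, the sketch below shows how a pairwise offset shifts a handover trigger. It assumes a simplified form of the 3GPP A3 entering condition (Mn + Ofn + Ocn − Hys > Mp + Ofp + Ocp + Off) in which every offset except the neighbor CIO (Ocn) is zero and time-to-trigger is ignored. The function name `a3_triggered` and the 2 dB hysteresis default are hypothetical illustrations, not the paper's simulator settings.

```python
def a3_triggered(rsrp_serving_dbm: float, rsrp_neighbor_dbm: float,
                 cio_db: float, hysteresis_db: float = 2.0) -> bool:
    """Simplified A3-style check (assumption: other offsets = 0, no time-to-trigger).

    cio_db plays the role of CIO_ij for the (serving i, neighbor j) pair:
    a positive value makes handover toward j easier, a negative value harder.
    """
    return rsrp_neighbor_dbm + cio_db - hysteresis_db > rsrp_serving_dbm


if __name__ == "__main__":
    serving, neighbor = -90.0, -91.0  # neighbor is 1 dB weaker than serving
    print(a3_triggered(serving, neighbor, cio_db=0.0))  # False: -93 > -90 fails
    print(a3_triggered(serving, neighbor, cio_db=4.0))  # True: -89 > -90 holds
```

Because one CIO value moves the trigger point for every user on the i-to-j boundary, changing it redirects a whole mobility flow, which is why neighboring CIOs end up tightly coupled.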

Motivation and Objectives

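Motivate the need for adaptive handover (HO) control in dense, heterogeneous networks.
Formulate CIO-based HO tuning as a cooperative Dec-POMDP on the network's dual graph.
Develop a scalable MARL algorithm that uses a shared-parameter GNN actor and region-wise critics to enable credit assignment.
Demonstrate the robustness and generalization of the proposed method under topology and traffic shifts.
Provide an ns-3-based, CIO-centric HO evaluation environment that reflects real-world operator parameters.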
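The dual-graph formulation can be sketched as follows: every neighbor pair {i, j} of the cell-adjacency graph becomes one dual node (one CIO agent), and two agents are adjacent when their pairs share a cell, i.e. the dual graph here is the line graph of the cell graph. The M-hop neighborhood is a plain BFS. The mean-over-cells aggregation in `local_observation` is a hypothetical choice; the paper only states that KPIs are aggregated over the local dual-graph neighborhood. All function names are illustrative.

```python
from collections import deque
from itertools import combinations


def build_dual_graph(cell_edges):
    """Build the dual (line) graph of a cell-adjacency graph.

    Each neighbor pair {i, j} becomes one dual node, i.e. one CIO agent.
    Two dual nodes are adjacent when their cell pairs share a cell.
    """
    nodes = sorted({tuple(sorted(e)) for e in cell_edges})
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def m_hop_neighborhood(adj, node, m):
    """Return all dual nodes within m hops of `node` (itself included), via BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if dist[cur] == m:
            continue  # do not expand past the M-hop radius
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return set(dist)


def local_observation(adj, node, m, cell_kpi):
    """Hypothetical aggregation: mean cell KPI over cells touched by the M-hop neighborhood."""
    cells = sorted({c for pair in m_hop_neighborhood(adj, node, m) for c in pair})
    return sum(cell_kpi[c] for c in cells) / len(cells)


if __name__ == "__main__":
    adj = build_dual_graph([(0, 1), (1, 2), (2, 3)])
    print(sorted(adj))                                  # [(0, 1), (1, 2), (2, 3)]
    print(sorted(m_hop_neighborhood(adj, (0, 1), 1)))   # [(0, 1), (1, 2)]
    load = {0: 0.2, 1: 0.8, 2: 0.5, 3: 0.1}
    print(local_observation(adj, (0, 1), 1, load))      # 0.5
```

The observation size per agent depends only on M and local density, not on the total number of cells, which is what keeps decentralized decisions scalable.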

Proposed Method

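Models each CIO as an edge-based agent on the dual graph (one agent per inter-cell edge), capturing locally coupled effects.
Adopts the centralized training with decentralized execution (CTDE) paradigm, with a shared GNN actor operating on the dual graph.
Introduces discrete TD3-D-MA, which uses a differentiable relaxation of the discrete CIO actions.
Improves credit assignment with region-wise double critics trained on overlapping subnetworks.
Defines local observations over M-hop dual-graph neighborhoods and a global team reward based on cell throughput.
Evaluates the method in an ns-3 system-level simulator using real operator parameters across diverse topologies.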
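The region-wise double critics can be illustrated with the standard TD3 clipped double-Q target, y = r + γ(1 − d)·min(Q′₁, Q′₂), computed once per region. This is a minimal sketch under stated assumptions: the global team reward is taken to be total cell throughput (the paper only says it is throughput-based), each region's pair of target critics is assumed to be evaluated on that region's next state and next joint action, and TD3's target-policy smoothing noise is omitted because the actions are discrete. Function names are hypothetical.

```python
def team_reward(cell_throughputs_mbps):
    """Hypothetical global team reward: total throughput over all cells."""
    return sum(cell_throughputs_mbps)


def region_td_targets(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD3-style clipped double-Q targets, one per region.

    q1_next[k], q2_next[k] are the two target critics of region k evaluated
    at the next state and the target actor's next joint action for that region.
    Taking the minimum of the pair curbs Q-value overestimation.
    """
    not_done = 0.0 if done else 1.0
    return [reward + gamma * not_done * min(q1, q2) for q1, q2 in zip(q1_next, q2_next)]


def region_critic_loss(q_pred, targets):
    """Mean-squared TD error of one critic head, averaged over regions."""
    return sum((q - y) ** 2 for q, y in zip(q_pred, targets)) / len(targets)


if __name__ == "__main__":
    r = team_reward([10.0, 20.0, 30.0])                         # 60.0
    y = region_td_targets(r, [5.0, 8.0], [6.0, 7.0], gamma=0.5)
    print(y)                                                    # [62.5, 63.5]
    print(region_critic_loss([62.0, 64.5], y))                  # 0.625
```

Every region shares the same team reward, but each region's critics only need to explain the value of actions inside their own subnetwork, which is one way localized critics can sharpen credit assignment compared with a single network-wide critic.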

Figure 1 : Dual-graph MARL framework for CIO-based HO. CIO agents are placed on dual-graph nodes corresponding to inter-cell edges $e=\{i,j\}$ and tune the HO bias $\mathrm{CIO}_{ij}$ for each neighbor pair. A distributed GNN actor performs local message passing to produce edge actions, while CTDE t
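The caption's "local message passing to produce edge actions" can be sketched as a parameter-shared GNN layer over dual-graph nodes followed by a shared head that scores discrete CIO levels. This is a toy sketch, not the paper's architecture: the three CIO levels, scalar features, mean aggregation, and single layer are all assumptions, and the Gumbel-softmax sampler is a swapped-in example of a differentiable relaxation, since the summary does not say which relaxation TD3-D-MA uses. At execution each agent takes the greedy level; during training the relaxed probabilities give a differentiable surrogate action.

```python
import math
import random

# Hypothetical discrete CIO action set in dB; the paper's actual levels are not given here.
CIO_LEVELS_DB = [-3.0, 0.0, 3.0]


def message_pass(adj, feats, w_self, w_nbr):
    """One round of mean-aggregation message passing.

    The same (w_self, w_nbr) weights are used at every dual node, which is
    what makes the actor parameter-shared and independent of network size.
    """
    out = {}
    for v, nbrs in adj.items():
        agg = sum(feats[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        out[v] = max(0.0, w_self * feats[v] + w_nbr * agg)  # ReLU
    return out


def logits(h, head):
    """Shared linear head: one (weight, bias) pair per CIO level."""
    return [w * h + b for w, b in head]


def greedy_cio(h, head):
    """Decentralized execution: each agent picks its highest-scoring CIO level."""
    ls = logits(h, head)
    return CIO_LEVELS_DB[max(range(len(ls)), key=ls.__getitem__)]


def gumbel_softmax(ls, tau=1.0, rng=random):
    """Relaxed one-hot sample over CIO levels (a stand-in relaxation)."""
    noisy = []
    for logit in ls:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_cio(probs):
    """Differentiable surrogate action: probability-weighted CIO in dB."""
    return sum(p * c for p, c in zip(probs, CIO_LEVELS_DB))


if __name__ == "__main__":
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    h = message_pass(adj, {"a": 1.0, "b": 0.0, "c": 2.0}, w_self=1.0, w_nbr=0.5)
    print(h)  # {'a': 1.0, 'b': 0.75, 'c': 2.0}
    head = [(-1.0, 1.0), (0.0, 0.5), (1.0, -1.0)]
    print({v: greedy_cio(x, head) for v, x in h.items()})  # {'a': 0.0, 'b': 0.0, 'c': 3.0}
```

Because the weights are shared across all dual nodes, the same actor can be applied unchanged to a topology with a different number of cells or neighbor pairs.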

Experimental Results

Research Questions

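RQ1: Can CIO-based HO control be effectively modeled as a cooperative MARL problem on the dual graph?
RQ2: Does the TD3-D-MA framework, with its shared GNN actor and region-wise critics, improve training stability and scalability over the baselines?
RQ3: How well does the proposed approach generalize to topology and traffic shifts?
RQ4: How do dual-graph locality and CTDE affect credit assignment and performance?
RQ5: Is the proposed ns-3-based, CIO-centric environment suitable for reproducible HO/MLB experiments?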

Key Findings

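TD3-D-MA, with its dual-graph GNN actor and region-wise critics, improves throughput over heuristic HO and centralized RL baselines.
The edge-based dual-graph formulation captures local coupling between neighboring CIOs more effectively than node-based approaches.
CTDE and region-wise critics improve credit assignment in dense deployments.
The method generalizes robustly under changes in topology and mobility patterns.
The ns-3-based, CIO-centric environment with realistic operator parameters supports reproducible evaluation.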


This review was generated by AI and reviewed by a human editor.