QUICK REVIEW

[Paper Review] Large Language Model (LLM)-enabled Reinforcement Learning for Wireless Network Optimization

Jie Zheng, Ruichen Zhang | arXiv (Cornell University) | Jan 15, 2026
Software-Defined Networks and 5G | 0 citations
TL;DR

The paper surveys and develops a framework for integrating LLMs with reinforcement learning to optimize 6G wireless networks, and demonstrates a novel LLM-enabled MARL framework for service migration and graph generation in UAV–satellite networks.

ABSTRACT

Enhancing future wireless networks presents a significant challenge for networking systems due to diverse user demands and the emergence of 6G technology. While reinforcement learning (RL) is a powerful framework, it often encounters difficulties with high-dimensional state spaces and complex environments, leading to substantial computational demands, distributed intelligence, and potentially inconsistent outcomes. Large language models (LLMs), with their extensive pretrained knowledge and advanced reasoning capabilities, offer promising tools to enhance RL in optimizing 6G wireless networks. We explore RL models augmented by LLMs, emphasizing their roles and the potential benefits of their synergy in wireless network optimization. We then examine LLM-enabled RL across various protocol layers: physical, data link, network, transport, and application layers. Additionally, we propose an LLM-assisted state representation and semantic extraction to enhance the multi-agent reinforcement learning (MARL) framework. This approach is applied to service migration and request routing, as well as topology graph generation in unmanned aerial vehicle (UAV)-satellite networks. Through case studies, we demonstrate that our framework effectively performs optimization of wireless network. Finally, we outline prospective research directions for LLM-enabled RL in wireless network optimization.

Motivation & Objective

  • Assess how LLMs can augment RL for wireless network optimization across protocol layers.
  • Propose a systematic framework for integrating LLMs into the RL agent–environment paradigm (feature extractor, reward designer, policy interpreter, decision-maker).
  • Develop and validate an LLM-enabled multi-agent RL framework for service migration and request routing in UAV–satellite networks.
  • Identify cross-layer design challenges and outline future research directions for LLM-enabled RL in wireless networks.
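
The four LLM roles named above can be pictured as hooks around an ordinary agent–environment step. The following is a minimal sketch, not the paper's implementation: `LLMHooks`, `stub_hooks`, and `run_step` are hypothetical names, and each role is filled by a deterministic stand-in rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]  # e.g. per-node load; an assumption for this sketch


@dataclass
class LLMHooks:
    """Four places an LLM could plug into an RL loop (illustrative names)."""
    extract_features: Callable[[Observation], List[float]]
    design_reward: Callable[[Observation, int, float], float]
    explain_policy: Callable[[Observation, int], str]
    propose_action: Callable[[Observation, int], int]


def _features(obs: Observation) -> List[float]:
    # Stand-in for LLM feature extraction: append the load spread.
    return list(obs) + [max(obs) - min(obs)]


def _reward(obs: Observation, action: int, env_reward: float) -> float:
    # Stand-in for LLM reward design: small made-up penalty per action index.
    return env_reward - 0.1 * action


def _explain(obs: Observation, action: int) -> str:
    # Stand-in for LLM policy interpretation: a one-line rationale.
    return f"chose action {action}; peak load {max(obs):.2f}"


def _propose(obs: Observation, n_actions: int) -> int:
    # Stand-in for LLM decision-making: pick the least-loaded node.
    return min(range(len(obs)), key=lambda i: obs[i]) % n_actions


def stub_hooks() -> LLMHooks:
    return LLMHooks(_features, _reward, _explain, _propose)


def run_step(
    obs: Observation,
    env_reward: Callable[[int], float],
    hooks: LLMHooks,
    n_actions: int,
) -> Tuple[List[float], int, float, str]:
    features = hooks.extract_features(obs)                          # feature extractor
    action = hooks.propose_action(obs, n_actions)                   # decision-maker
    reward = hooks.design_reward(obs, action, env_reward(action))   # reward designer
    note = hooks.explain_policy(obs, action)                        # policy interpreter
    return features, action, reward, note
```

In practice only some of the hooks would be backed by an LLM at once; the remaining ones could stay as conventional RL components.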

Proposed method

  • Classify LLM roles within RL as feature extractor, reward designer, policy interpreter, and decision-maker.
  • Propose an end-to-end LLM-enabled MARL framework for UAV–satellite service migration using LESR (LLM-enabled state representation) and semantic extraction.
  • Use prompt templates, graph-based state representations, intrinsic rewards, and feedback loops to guide MARL in dynamic networks.
  • Evaluate the framework in a simulated LEO satellite network with a GNN-DQN agent setup and compare against baselines like greedy shortest-path and non-LLM models.
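
Two of the ingredients above, a prompt template built from a graph-based state and a shortest-path baseline, can be sketched as follows. This is an illustration under stated assumptions: the topology, latencies, and prompt wording are invented, and Dijkstra's algorithm is used as one plausible reading of the shortest-path baseline, since this summary does not say how the baseline is implemented.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link latency in ms}

# Toy UAV-satellite topology; every value here is made up for the example.
TOPOLOGY: Graph = {
    "uav1": {"sat1": 12.0, "sat2": 20.0},
    "sat1": {"uav1": 12.0, "sat2": 5.0, "gw": 30.0},
    "sat2": {"uav1": 20.0, "sat1": 5.0, "gw": 8.0},
    "gw": {"sat1": 30.0, "sat2": 8.0},
}

# Hypothetical prompt wording; the paper's actual templates are not shown here.
PROMPT_TEMPLATE = (
    "You assist a routing agent in a UAV-satellite network.\n"
    "Links (latency in ms):\n{links}\n"
    "Request: route from {src} to {dst}.\n"
    "Return extra state features as a JSON list of numbers."
)


def graph_to_prompt(graph: Graph, src: str, dst: str) -> str:
    """Serialize the graph state into text, listing each undirected link once."""
    seen = set()
    lines = []
    for u in sorted(graph):
        for v in sorted(graph[u]):
            if (v, u) in seen:
                continue
            seen.add((u, v))
            lines.append(f"- {u} <-> {v}: {graph[u][v]:g}")
    return PROMPT_TEMPLATE.format(links="\n".join(lines), src=src, dst=dst)


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: returns (total latency, node path)."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, src, [src])]
    settled: Dict[str, float] = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at lower or equal cost
        settled[node] = cost
        for nxt, latency in graph[node].items():
            heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    return float("inf"), []  # destination unreachable
```

A learned agent would be compared against `shortest_path(TOPOLOGY, "uav1", "gw")` on the same requests, while `graph_to_prompt` shows how the same state might be handed to an LLM as text.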

Experimental results

Research questions

  • RQ1: What can the LLM-enabled RL paradigm offer to wireless optimization design?
  • RQ2: How can LLM-enabled RL be applied effectively to wireless network optimization across protocol layers?
  • RQ3: How can LLMs assist in state representation, reward design, and decision-making to improve learning efficiency and performance?

Key findings

  • LLMs can enhance RL in wireless networks when used as feature extractors, reward designers, policy interpreters, and decision-makers.
  • LLM-enabled MARL for UAV–satellite networks achieves about a 25% improvement in service migration decision-making performance over baselines.
  • In physical-layer contexts, LLMs help interpret channel dynamics and interference for better beamforming and power control.
  • In data link, network, transport, and application layers, LLMs improve reward shaping, topology generation, and task scheduling efficiency across scenarios.
  • The proposed LESR-based MARL framework with semantic extraction converges faster and yields higher average rewards compared to reward-design-only MARL and non-LLM recurrent models.
  • A future-oriented discussion highlights robustness, security, world-model integration, federated learning, and low-overhead LLM techniques.


This review was created by AI and reviewed by human editors.