QUICK REVIEW

[Paper Review] Large Language Model (LLM)-enabled Reinforcement Learning for Wireless Network Optimization

Jie Zheng, Ruichen Zhang|arXiv (Cornell University)|Jan 15, 2026

Topic: Software-Defined Networks and 5G | Cited by: 0

One-Sentence Summary

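This paper surveys how large language models (LLMs) can be combined with reinforcement learning (RL) to optimize 6G wireless networks, builds a framework around that combination, and presents a novel LLM-driven multi-agent RL framework for service migration and graph generation in UAV-satellite networks.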

ABSTRACT

Enhancing future wireless networks presents a significant challenge for networking systems due to diverse user demands and the emergence of 6G technology. While reinforcement learning (RL) is a powerful framework, it often encounters difficulties with high-dimensional state spaces and complex environments, leading to substantial computational demands, distributed intelligence, and potentially inconsistent outcomes. Large language models (LLMs), with their extensive pretrained knowledge and advanced reasoning capabilities, offer promising tools to enhance RL in optimizing 6G wireless networks. We explore RL models augmented by LLMs, emphasizing their roles and the potential benefits of their synergy in wireless network optimization. We then examine LLM-enabled RL across various protocol layers: physical, data link, network, transport, and application layers. Additionally, we propose an LLM-assisted state representation and semantic extraction to enhance the multi-agent reinforcement learning (MARL) framework. This approach is applied to service migration and request routing, as well as topology graph generation in unmanned aerial vehicle (UAV)-satellite networks. Through case studies, we demonstrate that our framework effectively performs optimization of wireless network. Finally, we outline prospective research directions for LLM-enabled RL in wireless network optimization.

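Motivation and Objectives

Assess how LLMs can strengthen RL at each protocol layer of wireless network optimization.
Propose a systematic framework that integrates LLMs into the RL agent-environment paradigm (feature extraction, reward design, policy interpretation, decision execution).
Develop and validate an LLM-based multi-agent RL framework for service migration and request routing in UAV-satellite networks.
Identify cross-layer design challenges and outline future research directions for LLM-driven RL in wireless networks.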

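Proposed Method

Classify the roles of LLMs in RL as feature extraction, reward design, policy interpretation, and decision execution.
Propose an end-to-end LLM-driven MARL framework for UAV-satellite service migration, built on LESR (LLM-enabled state representation) and semantic extraction.
Use prompt templates, graph-based state representations, intrinsic rewards, and feedback loops to guide MARL learning in dynamic networks.
Evaluate the framework with a GNN-DQN agent setup in a simulated LEO satellite network, comparing it against baselines such as greedy shortest path and against non-LLM models.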
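
The interplay of prompt templates, graph-based state, and intrinsic rewards listed above can be illustrated with a minimal sketch. It is not the authors' implementation: `GraphState`, `build_prompt`, `stub_llm`, `augment_state`, `shaped_reward`, the 0.8 load threshold, and the weight `beta` are all hypothetical names and values chosen for illustration, and `stub_llm` is a rule-based stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GraphState:
    """Hypothetical network state: per-node load in [0, 1] plus adjacency."""
    loads: Dict[str, float]
    links: Dict[str, List[str]]


def build_prompt(state: GraphState) -> str:
    """Fill a plain-text prompt template from the graph state."""
    rows = [
        f"{node}: load={state.loads[node]:.2f}, "
        f"neighbors={','.join(state.links[node])}"
        for node in sorted(state.loads)
    ]
    return "Rate the congestion risk of each node from 0 to 1.\n" + "\n".join(rows)


def stub_llm(prompt: str) -> Dict[str, float]:
    """Stand-in for an LLM (assumption): flags nodes whose load exceeds 0.8."""
    scores = {}
    for row in prompt.splitlines()[1:]:
        node, rest = row.split(": load=")
        load = float(rest.split(",")[0])
        scores[node] = 1.0 if load > 0.8 else 0.0
    return scores


def augment_state(state: GraphState, scores: Dict[str, float]) -> Dict[str, List[float]]:
    """Append the LLM-derived score to each node's raw feature vector."""
    return {node: [state.loads[node], scores[node]] for node in state.loads}


def shaped_reward(extrinsic: float, scores: Dict[str, float],
                  target: str, beta: float = 0.1) -> float:
    """Add an intrinsic penalty for migrating a service to a risky node."""
    return extrinsic - beta * scores[target]


state = GraphState(
    loads={"uav1": 0.9, "sat1": 0.3},
    links={"uav1": ["sat1"], "sat1": ["uav1"]},
)
scores = stub_llm(build_prompt(state))
features = augment_state(state, scores)
reward = shaped_reward(1.0, scores, target="uav1")
```

In a full pipeline, the augmented per-node features would feed a graph neural network inside each agent, and the shaped reward would replace the raw environment reward during training; both steps are omitted here to keep the sketch self-contained.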

Experimental Results

Research Questions

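RQ1: What kind of support can the LLM-driven RL paradigm provide for wireless optimization design?
RQ2: How can LLM-driven RL be applied effectively to wireless network optimization at each protocol layer?
RQ3: How can LLMs assist with state representation, reward design, and decision-making to improve learning efficiency and performance?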

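Key Findings

LLMs can improve RL performance in wireless networks when they act as feature extractors, reward designers, policy interpreters, and decision executors.
The LLM-driven MARL framework for UAV-satellite networks improves service-migration decision performance by about 25% over the baselines.
At the physical layer, LLMs help interpret channel dynamics and interference, enabling better beamforming and power control.
At the data link, network, transport, and application layers, LLMs improve reward shaping, topology generation, and task-scheduling efficiency, and perform well across multiple scenarios.
The LESR-based MARL framework with semantic extraction converges faster and reaches a higher average reward than MARL with reward design only and than non-LLM recurrent models.
The forward-looking discussion highlights robustness, security, world-model integration, federated learning, and low-overhead LLM techniques.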


This review was generated by AI and checked by a human editor.