QUICK REVIEW

[Paper Review] Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks

Minrui Xu, Dusit Niyato|arXiv (Cornell University)|Mar 9, 2024

Satellite Communication Systems|Cited by 7

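One-Sentence Summary

This paper proposes a joint model caching and inference framework for provisioning LLM agent services in SAGINs, introducing cached models as a resource, the age of thought (AoT) metric, and a deep reinforcement learning-based modified second-bid (MSB) auction to improve allocation efficiency and mitigate adverse selection.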

ABSTRACT

Edge intelligence in space-air-ground integrated networks (SAGINs) can enable worldwide network coverage beyond geographical limitations for users to access ubiquitous and low-latency intelligence services. Facing global coverage and complex environments in SAGINs, edge intelligence can provision approximate large language models (LLMs) agents for users via edge servers at ground base stations (BSs) or cloud data centers relayed by satellites. As LLMs with billions of parameters are pre-trained on vast datasets, LLM agents have few-shot learning capabilities, e.g., chain-of-thought (CoT) prompting for complex tasks, which raises a new trade-off between resource consumption and performance in SAGINs. In this paper, we propose a joint caching and inference framework for edge intelligence to provision sustainable and ubiquitous LLM agents in SAGINs. We introduce "cached model-as-a-resource" for offering LLMs with limited context windows and propose a novel optimization framework, i.e., joint model caching and inference, to utilize cached model resources for provisioning LLM agent services along with communication, computing, and storage resources. We design "age of thought" (AoT) considering the CoT prompting of LLMs, and propose a least AoT cached model replacement algorithm for optimizing the provisioning cost. We propose a deep Q-network-based modified second-bid (DQMSB) auction to incentivize network operators, which can enhance allocation efficiency by 23% while guaranteeing strategy-proofness and free from adverse selection.

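Research Motivation and Objectives

Enable edge intelligence in SAGINs to provide ubiquitous LLM agent services under limited resources.
Introduce cached models as a new resource type alongside communication, computing, and storage resources.
Develop a joint model caching and inference framework that minimizes provisioning cost while satisfying coverage constraints.
Define the age of thought (AoT) metric and use it to manage CoT prompting and inform cache eviction decisions.
Design a DQMSB auction that incentivizes network operators while ensuring strategy-proofness and avoiding adverse selection.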

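Proposed Method

Formulate a joint optimization framework for model caching, request offloading, and resource allocation in SAGINs.
Introduce the age of thought (AoT) metric to quantify the freshness of the intermediate CoT thoughts of cached LLMs.
Propose a least-AoT cached model replacement algorithm that evicts the cached models that contribute least to AoT.
Model the CoT reasoning process and its relationship to context-window usage and few-shot learning in edge LLM agents.
Develop a deep Q-network-based modified second-bid (DQMSB) auction that optimizes pricing while guaranteeing strategy-proofness.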
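
The least-AoT replacement idea can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: this summary does not give the AoT formula, so each cached model simply carries an assumed numeric `aot` score, and the names `CachedModel`, `admit_model`, `size_gb`, and `capacity_gb` are hypothetical. The sketch only shows the eviction order, in which the model with the lowest score is removed first until the incoming model fits.

```python
from dataclasses import dataclass


@dataclass
class CachedModel:
    name: str        # hypothetical model identifier
    size_gb: float   # GPU memory the cached model occupies
    aot: float       # assumed per-model AoT score; its computation is not modeled here


def admit_model(cache, capacity_gb, incoming):
    """Admit `incoming` into `cache`, evicting lowest-AoT models until it fits.

    `cache` is a list of CachedModel and is modified in place.
    Returns the list of evicted models, in eviction order.
    """
    if incoming.size_gb > capacity_gb:
        raise ValueError("incoming model is larger than the whole cache")
    evicted = []
    # Keep evicting the model with the smallest AoT score while space is short.
    while sum(m.size_gb for m in cache) + incoming.size_gb > capacity_gb:
        victim = min(cache, key=lambda m: m.aot)
        cache.remove(victim)
        evicted.append(victim)
    cache.append(incoming)
    return evicted
```

For example, with a 40 GB cache holding models of 10, 20, and 10 GB, admitting a new 20 GB model evicts only the one resident model with the lowest assumed AoT score, provided freeing it leaves enough room.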

Figure 1: Joint caching and inference framework for provisioning large language model (LLM) agents in SAGINs.

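Experimental Results

Research Questions

RQ1: How can LLM agent services in SAGINs be provisioned efficiently given heterogeneous edge resources and limited context windows?
RQ2: How can cached LLMs be treated as a resource that reduces latency and energy consumption while supporting CoT prompting?
RQ3: Can an auction mechanism be designed that incentivizes operators to share resources while remaining strategy-proof and avoiding adverse selection?
RQ4: How do CoT prompting and AoT-aware caching affect provisioning cost and quality of service?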

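Key Findings

Introduces the concept of cached models as an edge-intelligence resource for SAGINs.
Defines AoT to capture the relevance and coherence of intermediate CoT thoughts, and uses it to guide cache eviction.
Proposes a least-AoT cached model replacement algorithm that minimizes provisioning cost under GPU, bandwidth, and coverage constraints.
Develops a DQMSB auction framework that uses DRL to select the pricing scale, improving allocation efficiency by 23% (as reported in the abstract) and mitigating adverse selection.
Outlines a framework integrating cloud data centers, satellites, and ground base stations to provide LLM agent services with lower latency and improved privacy.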
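
To make the role of the pricing scale concrete, here is a small sketch of one round of a modified second-bid rule. It rests on assumptions: this summary does not spell out the paper's allocation and payment rules, so the sketch uses one common formulation in which the highest bidder wins only if its bid is at least `scale` times the second-highest bid and then pays exactly that threshold. The function name `msb_auction` and its parameters are hypothetical, and the DRL agent that would choose `scale` is not modeled; the scale is just an input.

```python
def msb_auction(bids, scale):
    """Run one round of an assumed modified second-bid rule.

    bids:  dict mapping bidder name -> bid (at least two bidders)
    scale: pricing scale, assumed to be >= 1
    Returns (winner, payment), or (None, 0.0) if no bid clears the threshold.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    if scale < 1:
        raise ValueError("scale is assumed to be >= 1")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_bid), (_, second_bid) = ranked[0], ranked[1]
    # The threshold depends only on the second-highest bid and the scale,
    # never on the winner's own bid.
    threshold = scale * second_bid
    if top_bid >= threshold:
        return top_name, threshold
    return None, 0.0
```

With `scale = 1` this reduces to an ordinary second-price auction. Larger scales raise both the bar for winning and the payment, which is the knob a learned policy could tune.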

Figure 2: The workflow of the joint caching and inference framework for provisioning LLM agents with cached models.

This review was generated by AI and reviewed by human editors.