QUICK REVIEW

[Paper Review] Sparks of GPTs in Edge Intelligence for Metaverse: Caching and Inference for Mobile AIGC Services

Minrui Xu, Dusit Niyato | arXiv (Cornell University) | 2023-04-18

Caching and Content Delivery | Citations: 11

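One-line summary

The paper proposes a framework for caching and running inference on pretrained foundation models at mobile edge servers, introduces the Age of Context (AoC) metric, and shows that the Least Context algorithm lowers system cost and increases the share of requests executed at the edge.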

ABSTRACT

Aiming at achieving artificial general intelligence (AGI) for Metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various AI services, such as autonomous driving, digital twins, and AI-generated content (AIGC) for extended reality. With the advantages of low latency and privacy-preserving, serving PFMs of mobile AI services in edge intelligence is a viable solution for caching and executing PFMs on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters that are computation and memory-intensive for edge servers during loading and execution. In this article, we investigate edge PFM serving problems for mobile AIGC services of Metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric to evaluate the freshness and relevance between examples in demonstrations and executing tasks, namely the Age of Context (AoC). Finally, we propose a least context algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.

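Research motivation and goals

Motivate serving PFMs such as GPTs at the edge to deliver low-latency, privacy-preserving, AGI-enabled Metaverse services.
Develop a joint model caching and inference framework that optimizes resource allocation across the mobile edge and cloud tiers.
Introduce AoC to measure the freshness and relevance of the examples held in context.
Propose the Least Context (LC) caching algorithm, which manages cached PFMs according to how useful their context is.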
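
The review does not give the AoC formula, so the following is only a minimal sketch of the idea that context examples lose value as they age. It assumes a geometric decay per time slot and a relevance weight in [0, 1]; `ContextState`, `update_context`, `decay`, and `relevance` are hypothetical names chosen for illustration, not the paper's notation.

```python
from dataclasses import dataclass


@dataclass
class ContextState:
    """Hypothetical per-model record of accumulated in-context examples."""
    effective_examples: float = 0.0  # age-discounted number of examples


def update_context(state: ContextState, new_examples: int,
                   relevance: float = 1.0, decay: float = 0.9) -> ContextState:
    """Advance one time slot.

    Assumption (not from the paper): older examples lose value geometrically
    by `decay`, and new examples count in proportion to `relevance`.
    """
    if not 0.0 <= decay <= 1.0 or not 0.0 <= relevance <= 1.0:
        raise ValueError("decay and relevance must lie in [0, 1]")
    aged = state.effective_examples * decay
    return ContextState(aged + relevance * new_examples)


# Example arrivals per slot: 4 examples, then two idle slots, then 2 more.
state = ContextState()
for arrivals in [4, 0, 0, 2]:
    state = update_context(state, arrivals)
print(round(state.effective_examples, 3))
```

Under these assumptions, a model that stops receiving requests sees its effective context shrink over time, which is the kind of signal a context-aware cache policy could rank on.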

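Proposed method

Classify PFMs into language (LFMs), visual (VFMs), and multimodal (MFMs) foundation models, and summarize their fine-tuning and inference approaches in edge settings.
Define a joint model caching and inference framework that supports edge-cloud collaboration and dynamic cache management.
Introduce AoC as a metric that captures the freshness and relevance of context examples during in-context learning.
Propose the LC algorithm, which evicts the cached model with the fewest context examples when GPU memory is needed.
Provide a comparative evaluation against Random, Cloud, FIFO, and LFU baselines under a multi-metric cost model.
Present an illustrative use case whose performance is analyzed in a results table (Table II).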
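
The eviction rule described above (remove the cached model with the fewest context examples when GPU memory is needed) can be sketched as follows. This is an illustrative reading of the one-sentence description, not the authors' implementation; `CachedModel`, `least_context_evict`, the model names, and the memory figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedModel:
    name: str
    gpu_memory_gb: float      # memory the loaded model occupies
    context_examples: float   # examples currently held in context


def least_context_evict(cache: dict[str, CachedModel], capacity_gb: float,
                        incoming: CachedModel) -> list[str]:
    """Load `incoming`, evicting least-context models until it fits.

    Returns the names of the evicted models, in eviction order.
    """
    if incoming.name in cache:
        return []  # already cached, nothing to do
    if incoming.gpu_memory_gb > capacity_gb:
        raise ValueError("model is larger than total GPU memory")
    evicted = []
    # Recompute usage each round so an empty cache always reads as 0.
    while sum(m.gpu_memory_gb for m in cache.values()) + incoming.gpu_memory_gb > capacity_gb:
        victim = min(cache.values(), key=lambda m: m.context_examples)
        del cache[victim.name]
        evicted.append(victim.name)
    cache[incoming.name] = incoming
    return evicted


# Hypothetical edge server with 16 GB of GPU memory and three cached models.
cache = {
    "lfm-a": CachedModel("lfm-a", 6.0, 12),
    "vfm-b": CachedModel("vfm-b", 4.0, 3),
    "mfm-c": CachedModel("mfm-c", 5.0, 7),
}
evicted = least_context_evict(cache, 16.0, CachedModel("lfm-d", 8.0, 0))
print(evicted, sorted(cache))
```

In this example the two models with the least context are dropped to make room, while the model with the most accumulated examples stays cached. Contrast this with FIFO or LFU, which would rank by load time or request count instead.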

Figure 1: Categories of PFMs and their characteristic fine-tuning and inference methods. (1)-(3) The workflows of LFMs, VFMs, and MFMs. (a)-(c) The illustration of parameter-efficient fine-tuning. (d) An example of in-context learning.

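Experimental results

Research questions

RQ1: Can a mobile edge-cloud system cache and execute PFMs optimally to meet latency and accuracy targets?
RQ2: How does the context available for in-context learning, as measured by AoC, affect PFM inference performance in edge environments?
RQ3: Does the LC caching strategy outperform traditional caching policies in system cost and edge execution ratio?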

Key results

Metric	Random	Cloud	FIFO	LFU	LC
System cost	25.67	7.29	27.51	5.93	4.88
Switching cost	18.72	0	23.28	0.37	0.32
Total accuracy cost	0.13	0	0.52	0.36	0.44
Average accuracy cost	0.0151	0	0.0085	0.0083	0.0076
Inference latency	0.12	0	1.30	1.32	1.26
Offloading latency	0.04	0	0.35	0.24	0.31
Cloud cost	6.63	7.29	2.05	3.63	2.52
Edge Execution Ratio	9.8%	0%	70.7%	49.4%	65.0%
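
As a quick sanity check on the table, the snippet below copies its rows and computes two things: how far the listed cost components are from the reported system cost, and LC's relative system-cost reduction against each baseline. Treating the system cost as the plain sum of switching, total accuracy, inference latency, offloading latency, and cloud cost is an assumption read off the numbers, not a formula stated in this review.

```python
# Values copied from the results table above.
policies = ["Random", "Cloud", "FIFO", "LFU", "LC"]
system_cost = [25.67, 7.29, 27.51, 5.93, 4.88]
components = {
    "switching": [18.72, 0, 23.28, 0.37, 0.32],
    "total_accuracy": [0.13, 0, 0.52, 0.36, 0.44],
    "inference_latency": [0.12, 0, 1.30, 1.32, 1.26],
    "offloading_latency": [0.04, 0, 0.35, 0.24, 0.31],
    "cloud": [6.63, 7.29, 2.05, 3.63, 2.52],
}

# Sum the components per policy (column-wise) and compare to the reported total.
component_sum = [sum(col) for col in zip(*components.values())]
gap = {p: round(s - c, 2) for p, s, c in zip(policies, system_cost, component_sum)}

# Relative system-cost reduction of LC against every other policy, in percent.
lc = system_cost[policies.index("LC")]
reduction_pct = {p: round(100 * (s - lc) / s, 1)
                 for p, s in zip(policies, system_cost) if p != "LC"}

print(gap)
print(reduction_pct)
```

Under that assumption the components land within a few hundredths of the reported system cost for every policy, which is consistent with rounding. The reductions show that LC's advantage is small against LFU and large against Random and FIFO, whose totals are dominated by switching cost.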

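LC achieves the lowest system cost (4.88), compared with 5.93 for LFU, 7.29 for Cloud, 25.67 for Random, and 27.51 for FIFO.
LC reaches an edge execution ratio of 65.0%, above LFU (49.4%), Random (9.8%), and Cloud (0%) but below FIFO (70.7%), so most requests are served at the edge.
AoC-based context management makes better use of in-context learning, which contributes to higher accuracy.
Among the policies that cache at the edge, LC has the lowest average accuracy cost (0.0076), and its inference latency (1.26) and offloading latency (0.31) are competitive with FIFO and LFU.
The framework shows that context-aware, dynamic edge-cloud collaboration benefits mobile AIGC services.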

Figure 2: An illustration of the performance of zero-, one-, and few-shot accuracy under different model caching settings [3].

This review was generated by AI and reviewed by a human editor.