QUICK REVIEW

[Paper Review] A Survey of Resource-efficient LLM and Multimodal Foundation Models

Mengwei Xu, Wangsong Yin | arXiv (Cornell University) | 2024-01-16

Topic Modeling | Citations: 32

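One-line summary

A comprehensive survey of algorithm- and system-level approaches that improve the resource efficiency of large language models, vision transformers, diffusion models, and multimodal foundation models across training, inference, and deployment, from cloud to edge.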

ABSTRACT

Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of these large models in a scalable and environmentally sustainable way, there has been a considerable focus on developing resource-efficient strategies. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects. It offers a comprehensive analysis and valuable insights gleaned from existing literature, encompassing a broad array of topics from cutting-edge model architectures and training/serving algorithms to practical system designs and implementations. The goal of this survey is to provide an overarching understanding of how current approaches are tackling the resource challenges posed by large foundation models and to potentially inspire future breakthroughs in this field.

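Motivation and Objectives

Assess the resource challenges posed by large foundation models and the resulting need for efficiency.
Survey algorithmic and system-level approaches that reduce compute, memory, energy, and bandwidth without degrading performance.
Categorize advances in model architectures, training and inference methods, data management, and deployment systems.
Connect insights across language, vision, and multimodal foundation models to guide future research and practical implementation.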

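Approach

Categorize and catalog architectures and representative models across language, vision, and multimodal foundation models.
Analyze cost factors and efficiency challenges, including the implications of attention, feed-forward networks (FFNs), and KV caching.
Summarize resource-efficient architectures (e.g., efficient attention variants, Mixture of Experts, latent-space diffusion) and data/training techniques.
Outline resource-efficient algorithms for pre-training, fine-tuning, and inference (e.g., data reduction, mixed precision, progressive training, pruning, quantization).
Describe resource-efficient system aspects, from distributed training to edge deployment and serving.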
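
To make the attention and KV-caching cost factors listed above concrete, the following is a rough back-of-the-envelope sketch, not a calculation taken from the survey. The function names and the example configuration (32 layers, 32 KV heads, head dimension 128, fp16 values) are hypothetical and chosen only for illustration. It assumes standard multi-head attention with one key and one value tensor cached per layer, and counts only the two attention matrix products, ignoring projections and softmax.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes fp16/bf16 storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def attention_matmul_flops(num_layers, num_heads, head_dim, seq_len):
    """Approximate FLOPs for QK^T and scores @ V over a full sequence.

    Each product costs seq_len^2 * head_dim multiply-adds per head,
    counted here as 2 FLOPs per multiply-add.
    """
    per_head = 2 * (2 * seq_len * seq_len * head_dim)
    return num_layers * num_heads * per_head


# Hypothetical configuration, for illustration only.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                       seq_len=4096)
print(cache / 2**30)  # 2.0 (GiB)

# Doubling the context doubles the cache but quadruples attention FLOPs.
short = attention_matmul_flops(32, 32, 128, 4096)
long = attention_matmul_flops(32, 32, 128, 8192)
print(long / short)  # 4.0
```

The linear growth of the cache and the quadratic growth of the attention products with sequence length are the reason long contexts are a recurring target for efficiency work.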

Figure 1: The electricity consumption comparison between countries and AI. Data source: [77].

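Results

Research Questions

RQ1: What are the dominant resource bottlenecks in current language, vision, and multimodal foundation models?
RQ2: Which architecture- and system-level strategies exist for improving training and deployment efficiency?
RQ3: How do design choices in pre-training, fine-tuning, and inference affect resource usage across modalities?
RQ4: What are practical guidelines for deploying resource-efficient foundation models from cloud to edge?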

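Key Findings

Foundation models achieve high versatility but incur substantial hardware and energy costs during training and serving.
Efficiency approaches target the attention mechanism, data throughput, and model architecture (e.g., sparse/approximate attention, Mixture of Experts, latent-space diffusion).
Resource-aware training and inference techniques (mixed precision, data reduction, progressive training, efficient fine-tuning) can reduce compute and memory without sacrificing performance across the board.
System-level design choices (distributed training, federated learning, cloud versus edge deployment) decisively affect practicality and energy use.
The survey integrates architecture, algorithm, and system design to guide future research toward scalable and sustainable foundation models.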
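
As an illustration of why Mixture of Experts, named among the efficiency approaches above, can save compute, here is a minimal sketch of top-k expert routing for a single input. It is not the routing scheme of any specific model covered by the survey. The names `top_k_route` and `moe_forward` are hypothetical, the gate logits are passed in directly rather than produced by a learned gating layer, and renormalizing the weights over the selected experts is one common choice assumed here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i],
                    reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, gate_logits, experts, k=2):
    """Evaluate only the selected experts and mix their outputs.

    `experts` is a list of callables; the unselected ones are never
    called, which is where the compute saving comes from.
    """
    routes = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in routes)


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
toy_experts = [lambda x, s=i + 1: s * x for i in range(4)]
output = moe_forward(1.0, [0.0, 2.0, 1.0, -1.0], toy_experts, k=2)
```

With four experts and k=2, only half of the expert computations run per input, while the total parameter count stays at four experts. Real systems also need load balancing across experts, which this sketch omits.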

Figure 3: The evolutionary trace of foundation models.

This review was generated by AI and reviewed by a human editor.