QUICK REVIEW

[Paper Review] A Survey of Resource-efficient LLM and Multimodal Foundation Models

Mengwei Xu, Wangsong Yin|arXiv (Cornell University)|Jan 16, 2024

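Topic Modeling | Cited by 32

One-sentence summary

A comprehensive survey of algorithmic and system-level methods for making large language models, vision transformers, diffusion models, and multimodal foundation models more resource-efficient across training, inference, and deployment, from cloud to edge.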

ABSTRACT

Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of these large models in a scalable and environmentally sustainable way, there has been a considerable focus on developing resource-efficient strategies. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects. It offers a comprehensive analysis and valuable insights gleaned from existing literature, encompassing a broad array of topics from cutting-edge model architectures and training/serving algorithms to practical system designs and implementations. The goal of this survey is to provide an overarching understanding of how current approaches are tackling the resource challenges posed by large foundation models and to potentially inspire future breakthroughs in this field.

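Research motivation and objectives

Assess the resource challenges posed by large foundation models and the resulting need for efficiency.
Survey algorithmic and system-level methods that reduce compute, memory, energy, and bandwidth costs without sacrificing performance.
Categorize advances in model architectures, training and inference methods, data management, and deployment systems.
Integrate insights across language, vision, and multimodal foundation models to guide future research and practical applications.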

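Proposed approach

Catalog the architectures and representative models of language, vision, and multimodal foundation models.
Analyze cost factors and efficiency challenges, including the impact of attention, FFN, and the KV cache.
Summarize resource-efficient architectures (e.g., efficient attention variants, Mixture of Experts, latent-space diffusion optimizations) along with data and training techniques.
Outline resource-efficient algorithms for pre-training, fine-tuning, and inference (e.g., data reduction, mixed precision, progressive learning, pruning, quantization).
Describe resource-efficient system aspects, from distributed training to edge deployment and serving.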
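
The cost factors listed above (attention, FFN, and the KV cache) can be estimated with simple arithmetic. The following is a minimal sketch, not a formula taken from the survey: it assumes a standard decoder-only Transformer with full multi-head attention and a two-matrix (non-gated) FFN, and it counts one multiply-add as 2 FLOPs. The function names and the model configuration are hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Memory needed to cache keys and values for every layer and position."""
    # Factor 2: one tensor for keys and one for values in each layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def layer_flops_per_token(d_model, d_ff, context_len):
    """Approximate FLOPs to process one token in one decoder layer."""
    return {
        # Q, K, V and output projections: four d_model x d_model matmuls.
        "attention_projections": 4 * 2 * d_model * d_model,
        # Scores (q . K^T) and the weighted sum over V, each across context_len positions.
        "attention_over_context": 2 * 2 * context_len * d_model,
        # Two FFN matmuls: d_model -> d_ff and d_ff -> d_model.
        "ffn": 2 * 2 * d_model * d_ff,
    }


# Hypothetical configuration, chosen only for illustration.
cache_fp16 = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)
cache_int8 = kv_cache_bytes(32, 32, 128, 4096, 1, bytes_per_elem=1)
flops = layer_flops_per_token(d_model=4096, d_ff=16384, context_len=4096)
print(cache_fp16 / 2**30, "GiB in 16-bit,", cache_int8 / 2**30, "GiB in 8-bit")
print(flops)
```

Under these assumptions, the per-token attention term grows with context length while the projection and FFN terms stay fixed, and the KV cache scales linearly with sequence length, batch size, and bytes per element. This is why lower-precision storage and shorter or compressed contexts both reduce memory.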

Figure 1: The electricity consumption comparison between countries and AI. Data source: [77].

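Experimental results

Research questions

RQ1: What are the main resource bottlenecks in current language, vision, and multimodal foundation models?
RQ2: Which architectural and system-level strategies improve training and deployment efficiency?
RQ3: How do design choices in pre-training, fine-tuning, and inference affect resource usage across modalities?
RQ4: What practical guidelines exist for deploying resource-efficient foundation models from cloud to edge?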

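Key findings

Foundation models excel in versatility but demand substantial hardware and energy during training and serving.
Efficiency methods target attention, data throughput, and model architecture (e.g., sparse/approximate attention, Mixture of Experts, latent-space diffusion).
Resource-oriented training and inference techniques (mixed precision, data reduction, progressive learning, efficient fine-tuning) can reduce compute and memory footprint without a general loss in performance.
System-level design choices (distributed training, federated learning, cloud and edge deployment) critically affect feasibility and energy use.
The survey integrates architecture, algorithm, and system design to steer future work toward scalable and sustainable foundation models.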
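
To make the Mixture of Experts idea mentioned in the findings concrete, here is a minimal sketch of top-k expert routing. It is an illustrative toy and not the method of any specific model covered by the survey: the experts are hypothetical scalar functions, the gate logits are given directly instead of being produced by a learned router, and the selected gate probabilities are renormalized to sum to 1.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the k routed experts; the other experts are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Hypothetical experts and gate logits, for illustration only.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.0, 2.0, -1.0, 2.0]
routes = top_k_route(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(routes, output)
```

With four experts and k=2, only half of the expert computation runs for each input, which is the source of the compute savings: parameter count grows with the number of experts while per-token cost grows only with k.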

Figure 3: The evolutionary trace of foundation models.


This review was generated by AI and reviewed by human editors.