QUICK REVIEW

[Paper Review] Efficient Large Language Models: A Survey

Zhongwei Wan, Xin Wang|arXiv (Cornell University)|Dec 6, 2023

Topic Modeling | Cited by 23

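One-Sentence Summary

A systematic survey of efficient large language models, organized into model-centric, data-centric, and framework-centric methods, with an accompanying GitHub repository that collects the related work.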

ABSTRACT

Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey. We will actively maintain the repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLMs research and inspire them to contribute to this important and exciting field.

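Research Motivation and Goals

Provide a comprehensive taxonomy of efficient LLM research covering the model-centric, data-centric, and framework-centric perspectives.
Summarize the key techniques that improve efficiency in training, inference, and deployment.
Highlight the data and framework factors that affect efficiency and scalability.
Provide an actively maintained reference repository of related papers.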

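Proposed Method

Propose a three-way taxonomy of efficiency topics: model-centric, data-centric, and framework-centric.
Review the techniques within each category (e.g., compression, pre-training, fine-tuning, inference, and architecture; data selection; prompt engineering; specialized frameworks).
Synthesize the findings into a structured overview and provide a GitHub resource for the ongoing collection of papers.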
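
The three-way taxonomy described above can be written down as a small lookup structure. The following is a minimal sketch: the dictionary layout and the helper name `category_of` are illustrative assumptions, and the topic lists contain only the examples named in this review, not the survey's full taxonomy.

```python
from typing import Optional

# Hypothetical encoding of the survey's three categories; only the example
# topics mentioned in this review are listed.
TAXONOMY = {
    "model-centric": [
        "compression", "pre-training", "fine-tuning", "inference", "architecture",
    ],
    "data-centric": ["data selection", "prompt engineering"],
    "framework-centric": ["specialized frameworks"],
}


def category_of(topic: str) -> Optional[str]:
    """Return the top-level category that lists `topic`, or None if absent."""
    for category, topics in TAXONOMY.items():
        if topic in topics:
            return category
    return None


print(category_of("data selection"))
```

A flat dictionary is enough here because the review only names one level of topics under each category; a nested structure would be needed to mirror finer subdivisions.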

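Experimental Results

Research Questions

RQ1: What are the main model-centric approaches to making LLMs more efficient (compression, pre-training, fine-tuning, inference, architecture)?
RQ2: Which data-centric strategies (data selection, prompting) contribute to LLM efficiency?
RQ3: Which framework-level tools and frameworks specifically support efficient LLM development and deployment?
RQ4: What are the trade-offs and practical implications of these efficiency techniques for large-scale models?
RQ5: How can researchers effectively navigate the efficient-LLM literature through a maintained repository?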

Key Findings

Model	Parameter size	Data scale	GPU cost	Training time
GPT-3 (Brown et al., 2020)	175B	300B tokens	-	-
GPT-NeoX-20B (Black et al., 2022)	20B	825GB corpus	96 A100-40G	-
OPT (Zhang et al., 2022a)	175B	180B tokens	992 A100-80G	-
BLOOM (Scao et al., 2022)	176B	366B tokens	384 A100-80G	105 days
GLM (Zeng et al., 2022)	130B	400B tokens	786 A100-40G	60 days
LLaMA (Touvron et al., 2023a)	65B	1.4T tokens	2048 A100-80G	21 days
LLaMA-2 (Touvron et al., 2023b)	70B	2T tokens	A100-80G	71,680 GPU days
Gopher (Rae et al., 2021)	280B	300B tokens	1024 A100	13.4 days
LaMDA (Thoppilan et al., 2022)	137B	768B tokens	1024 TPU-v3	57.7 days
GLaM (Du et al., 2022)	1200B	280B tokens	1024 TPU-v4	574 hours
PanGu-alpha (Zeng et al., 2021)	13B	1.1TB corpus	2048 Ascend 910	-
PanGu-Σ (Ren et al., 2023b)	1085B	329B tokens	512 Ascend 910	100 days
PaLM (Chowdhery et al., 2022)	540B	780B tokens	6144 TPU-v4	-
PaLM-2 (Anil et al., 2023)	-	3.6T tokens	TPUv4	-
WeLM (Su et al., 2022b)	10B	300B tokens	128 A100-40G	24 days
Flan-PaLM (Chung et al., 2022)	540B	-	512 TPU-v4	37 hours
AlexaTM (Soltan et al., 2022)	20B	1.3T tokens	128 A100	120 days
CodeGeeX (Zheng et al., 2023)	13B	850B tokens	1536 Ascend 910	60 days
MPT-7B (Team, 2023)	7B	1T tokens	-	-
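
To compare rows that report both an accelerator count and a training duration, the two can be multiplied into device-days. The sketch below is an illustrative calculation, not one taken from the survey: the helper name `device_days` is hypothetical, and the result is an upper bound that assumes every device ran for the full duration.

```python
# Rows copied from the table above: (model, accelerator count, training days).
ROWS = [
    ("BLOOM", 384, 105),
    ("LLaMA", 2048, 21),
    ("WeLM", 128, 24),
]


def device_days(num_devices: int, days: float) -> float:
    """Upper-bound estimate: assumes all devices are busy for the whole run."""
    return num_devices * days


for model, count, days in ROWS:
    print(f"{model}: {device_days(count, days):,.0f} device-days")
```

Under this assumption BLOOM comes to 40,320 device-days and LLaMA to 43,008, which puts them on the same footing as the LLaMA-2 row, where the table reports 71,680 GPU days directly. Device-days across different accelerator types (A100, TPU, Ascend 910) are not directly comparable.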

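The survey presents a comprehensive taxonomy of efficient LLM research from the model-centric, data-centric, and framework-centric perspectives.
It covers a wide range of techniques: quantization, pruning, low-rank approximation, and knowledge distillation for model compression; data selection and prompt engineering on the data-centric side; and specialized frameworks for efficient training and serving.
The paper emphasizes that efficiency research spans algorithmic, system-level, and data factors, and it provides a GitHub repository to organize and maintain the related literature.
It argues that larger models deliver higher performance but at substantially greater resource cost, which motivates comprehensive efficiency strategies.
The work compiles representative pre-training costs and model characteristics to illustrate the need for efficiency across a range of well-known LLMs.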
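
Of the compression techniques named in the findings above, quantization is the simplest to illustrate. The following is a minimal sketch of generic symmetric round-to-nearest int8 quantization in plain Python. It is not a specific method from the survey; the function names and the sample weights are assumptions chosen for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 0.9]  # illustrative values only
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each weight is stored as one small integer plus a single shared scale, so storage drops from 32 bits to roughly 8 bits per weight, at the cost of a rounding error of at most half a quantization step per weight. Very small weights, such as 0.003 here, round to zero.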


This review was generated by AI and reviewed by human editors.