QUICK REVIEW

[论文解读] Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Baolin Li, Yankai Jiang|arXiv (Cornell University)|Mar 19, 2024

Topic Modeling被引用 5

一句话总结

Sprout 引入生成功指令和线性规划优化器，通过引导生成长度在保持质量的同时减少 LLM 推理的碳排放，在实际使用 Llama2 13B 的测试中实现超过 40% 的排放减少。

ABSTRACT

The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from their cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the innovative concept of "generation directives" to guide the autoregressive generation process, thereby enhancing carbon efficiency. Our proposed method meticulously balances the need for ecological sustainability with the demand for high-quality generation outcomes. Employing a directive optimizer for the strategic assignment of generation directives to user prompts and an original offline quality evaluator, Sprout demonstrates a significant reduction in carbon emissions by over 40% in real-world evaluations using the Llama2 LLM and global electricity grid data. This research marks a critical step toward aligning AI technology with sustainable practices, highlighting the potential for mitigating environmental impacts in the rapidly expanding domain of generative artificial intelligence.

研究动机与目标

激发并量化 GenAI 推理的环境影响，识别在超越模型规模缩减的情况下减少碳排放的机会。
引入生成指令作为一种新颖的控制机制，以影响标记生成与排放。
开发一个系统范围的优化器，在不同电网碳强度下平衡碳排放节省与保持的生成质量。
在 Llama2 13B 上原型化 Sprout，并结合真实电网数据，展示碳减排效果并维持输出质量。

提出的方法

定义在自回归 LLM 推理过程中约束或引导标记生成的生成指令等级。
通过在不同提示中选择指令等级概率来最小化每次推理的期望碳排放，形成一个线性规划优化问题。
结合使用自动评估的 LLM 的离线质量评估器，生成质量偏好向量以约束指令使用。
实现一个机会性的离线质量评估计划，在低碳强度时段触发评估，以最小化额外排放。
将 Sprout 与现有推理服务器集成，使用系统提示作为指令并通过 CarbonTracker 进行日志记录，以计算优化器所需的 e 和 p 向量。

Figure 1: The auto-regressive generation process of generative language model inference.

实验结果

研究问题

RQ1在独立于模型规模的前提下，生成标记数量如何影响 LLM 推理的碳足迹？
RQ2生成指令是否能够引导标记生成以降低排放，同时在各任务中不显著降低生成质量？
RQ3在高吞吐量场景中，系统范围的概率性指令策略是否能够在保持实用性的同时接近逐提示的最优指令？
RQ4适应电网碳强度的碳感知优化在碳排放约束下维持质量的有效性如何？

主要发现

生成指令能够在保持高质量输出的同时减少标记生成长度，从而实现碳排放节省。
在碳效率和正确性方面，使用带有简明指令（L1）的 13B 模型可以优于较小的模型（7B 带基线）。
在使用 Llama2 13B 与全球电力网数据的现实世界评估中，Sprout 将推理排放降低了超过 40%。
优化问题是线性的，允许使用 HiGHS 对偶单纯形求解器来计算系统范围的指令概率。
质量反馈通过自动评估的 LLM 离线获得，使基于约束的优化在在线推理中不产生延迟影响。

Figure 2: Two factors that impact a request’s carbon footprint during LLM inference: (a) the number of model parameters and (b) the number of generated tokens.

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。