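[Paper Summary] Carbon Emissions and Large Neural Network Training
This paper estimates the energy use and carbon footprint of several large models and outlines strategies for reducing emissions in ML training and evaluation.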
The computation demand for machine learning (ML) has grown rapidly in recent years, and this growth comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
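The energy and carbon calculation described above boils down to multiplying a few measured quantities. Below is a minimal Python sketch, assuming the common formulation energy = training hours × number of processors × average power per processor × PUE (datacenter Power Usage Effectiveness), and CO2e = energy × grid carbon intensity. The `TrainingRun` fields and the example numbers are hypothetical illustrations, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Inputs for a back-of-envelope training estimate (field names are hypothetical)."""
    hours: float             # wall-clock training time in hours
    num_processors: int      # accelerators running in parallel
    avg_power_watts: float   # average measured power per processor, in watts
    pue: float               # datacenter Power Usage Effectiveness (>= 1.0)
    kg_co2e_per_kwh: float   # carbon intensity of the electricity supply


def energy_mwh(run: TrainingRun) -> float:
    """Facility energy in MWh: processor energy scaled by datacenter overhead (PUE)."""
    processor_kwh = run.hours * run.num_processors * run.avg_power_watts / 1000.0
    return processor_kwh * run.pue / 1000.0


def co2e_tonnes(run: TrainingRun) -> float:
    """Emissions in metric tons: facility energy times grid carbon intensity."""
    kwh = energy_mwh(run) * 1000.0
    return kwh * run.kg_co2e_per_kwh / 1000.0


if __name__ == "__main__":
    # Hypothetical run: 64 accelerators at 250 W for 100 hours, PUE 1.1, 0.4 kg CO2e/kWh.
    run = TrainingRun(hours=100, num_processors=64, avg_power_watts=250,
                      pue=1.1, kg_co2e_per_kwh=0.4)
    print(f"{energy_mwh(run):.2f} MWh, {co2e_tonnes(run):.3f} t CO2e")
```

Because every term is a multiplier, an error or a favorable choice in any one input (PUE, power draw, or grid intensity) scales the final CO2e directly, which is one reason retroactive estimates can be far off.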
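Motivation and Goals
- Quantify the energy consumption and carbon footprint of recent large neural networks (e.g., T5, Meena, GShard, Switch Transformer, GPT-3).
- Refine earlier energy-use estimates for the neural architecture search that found the Evolved Transformer.
- Highlight opportunities to reduce CO2e through model architecture, datacenter choice, and training practices.
- Advocate making energy and CO2e explicit metrics in ML research and benchmarking.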
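Proposed Approach
- Review and compile energy-use and CO2e estimates for several recent large models (T5, Meena, GShard, Switch Transformer, GPT-3).
- Update earlier energy estimates for the neural architecture search behind the Evolved Transformer.
- Analyze the factors that drive energy consumption, including model sparsity, geographic location, datacenter infrastructure, and accelerators.
- Propose practical strategies for reducing emissions and improving energy efficiency in ML training and inference.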
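One of these strategies is choosing where and when to train. The sketch below illustrates carbon-aware scheduling under simple assumptions: an hourly carbon-intensity forecast (kg CO2e/kWh) is available per region, and the job draws constant power and cannot be split or paused. The region names, forecast values, and the `best_slot` helper are hypothetical; the paper does not describe a scheduler at this level of detail.

```python
from typing import Dict, List, Tuple


def best_slot(
    forecast: Dict[str, List[float]], job_hours: int, kwh_per_hour: float
) -> Tuple[str, int, float]:
    """Pick the (region, start hour) whose contiguous window has the lowest total CO2e.

    forecast maps a region name to hourly grid carbon intensity (kg CO2e/kWh).
    Returns (region, start_hour, kg_co2e) for the best window found.
    """
    best = None
    for region, hourly in forecast.items():
        # Slide a window of job_hours over the forecast for this region.
        for start in range(len(hourly) - job_hours + 1):
            kg = kwh_per_hour * sum(hourly[start:start + job_hours])
            if best is None or kg < best[2]:
                best = (region, start, kg)
    if best is None:
        raise ValueError("no region has a forecast window long enough for the job")
    return best


if __name__ == "__main__":
    # Hypothetical forecasts: region-a has a low-carbon dip in hours 2-3.
    forecast = {
        "region-a": [0.40, 0.35, 0.10, 0.08, 0.30],
        "region-b": [0.70, 0.65, 0.60, 0.62, 0.68],
    }
    print(best_slot(forecast, job_hours=2, kwh_per_hour=500.0))
```

The exhaustive search is fine for a handful of regions and a short horizon; the point is only that, for a fixed amount of energy, emissions depend entirely on which grid and which hours supply it.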
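Experimental Results
Research Questions
- RQ1: What are the estimated energy use and CO2e of recent large neural networks?
- RQ2: How do architecture choices, datacenter characteristics, and geographic location affect the carbon footprint?
- RQ3: Which strategies can substantially reduce energy consumption and CO2e in large-scale ML training?
- RQ4: Should energy use and CO2e be included in standard ML evaluation and benchmarking?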
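Key Findings
- Large, sparsely activated DNNs can use less than one tenth of the energy of large dense DNNs without sacrificing accuracy, despite having as many or even more parameters.
- Geographic location can change CO2e by ~5X–10X because the fraction of carbon-free energy varies, even within the same country and the same organization.
- Optimizing where and when large models are trained can substantially reduce emissions.
- Datacenter infrastructure matters: Cloud datacenters can be ~1.4–2X more energy efficient than typical datacenters, and the ML accelerators inside them can be ~2–5X more effective than off-the-shelf systems.
- The choice of DNN, datacenter, and processor can reduce the carbon footprint by up to ~100–1000X.
- The authors argue that ML papers should report energy use and CO2e explicitly, and they are working with MLPerf developers to add energy usage during training and inference to the MLPerf benchmark.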
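As a rough sanity check on the ~100–1000X figure, the sketch below multiplies the per-factor ranges listed in these findings, counting location as part of the datacenter choice. Treating the factors as independent and multiplicative is an assumption of this illustration, not the paper's derivation. The sparse-model factor is pinned at 10X for both ends because the source only says "less than one tenth".

```python
from math import prod
from typing import Dict, Tuple

# (low, high) improvement multipliers taken from the findings above.
# Assumption of this sketch: the factors are independent and simply multiply.
FACTORS: Dict[str, Tuple[float, float]] = {
    "sparse vs dense DNN": (10.0, 10.0),       # "<1/10th the energy", so at least ~10X
    "cloud vs typical datacenter": (1.4, 2.0),
    "ML accelerator vs off-the-shelf": (2.0, 5.0),
    "greener location": (5.0, 10.0),
}


def combined_range(factors: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Multiply all low ends together and all high ends together."""
    lows = [low for low, _ in factors.values()]
    highs = [high for _, high in factors.values()]
    return prod(lows), prod(highs)


if __name__ == "__main__":
    low, high = combined_range(FACTORS)
    print(f"combined reduction: ~{low:.0f}X to ~{high:.0f}X")
```

Under these assumptions the product spans roughly 140X to 1000X, the same order of magnitude as the reported range.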
This summary was generated by AI and reviewed by human editors.