QUICK REVIEW

[Paper Review] Compute Trends Across Three Eras of Machine Learning

Jaime Sevilla, Lennart Heim | arXiv (Cornell University) | Feb 11, 2022

Machine Learning and Data Classification | References: 112 | Cited by: 40

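One-Sentence Summary

This paper analyzes how training compute has evolved across three eras (Pre Deep Learning, Deep Learning, and Large-Scale), finding a distinct doubling time for each era and showing that large-scale models emerged late and follow a separate trend of their own.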

ABSTRACT

Compute, data, and algorithmic advances are the three fundamental factors that guide the progress of modern Machine Learning (ML). In this paper we study trends in the most readily quantified factor - compute. We show that before 2010 training compute grew in line with Moore's law, doubling roughly every 20 months. Since the advent of Deep Learning in the early 2010s, the scaling of training compute has accelerated, doubling approximately every 6 months. In late 2015, a new trend emerged as firms developed large-scale ML models with 10 to 100-fold larger requirements in training compute. Based on these observations we split the history of compute in ML into three eras: the Pre Deep Learning Era, the Deep Learning Era and the Large-Scale Era. Overall, our work highlights the fast-growing compute requirements for training advanced ML systems.

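Research Motivation and Goals

Curate a dataset of milestone ML systems annotated with their training compute.
Identify and characterize the distinct eras of compute growth in ML.
Estimate compute doubling times and discuss their implications for hardware and research incentives.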

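Proposed Method

Collect a dataset of 123 milestone ML models annotated with training compute.
Fit log-linear models of training compute (FLOPs) against time to estimate doubling times.
Split the history into the Pre Deep Learning, Deep Learning, and Large-Scale eras, and compare slopes and goodness of fit.
Cross-check the results in the appendices against alternative interpretations and differences across domains.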
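
The log-linear fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fit_doubling_time` is hypothetical, and the data points are made up so that compute doubles exactly every 6 months. It regresses log10(FLOPs) on the publication year, reads the slope in orders of magnitude (OOMs) per year, and converts it to a doubling time.

```python
import numpy as np

def fit_doubling_time(years, flops):
    """Least-squares fit of log10(FLOPs) against time.

    Returns (slope in OOMs/year, doubling time in months, R^2).
    """
    years = np.asarray(years, dtype=float)
    log_flops = np.log10(np.asarray(flops, dtype=float))

    # Center the years so the fit is numerically well conditioned;
    # centering shifts the intercept but leaves the slope unchanged.
    x = years - years.mean()
    slope, intercept = np.polyfit(x, log_flops, 1)

    # Coefficient of determination of the fit in log space.
    predicted = slope * x + intercept
    ss_res = np.sum((log_flops - predicted) ** 2)
    ss_tot = np.sum((log_flops - log_flops.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    # One doubling is log10(2) OOMs, so it takes log10(2)/slope years.
    doubling_months = 12.0 * np.log10(2.0) / slope
    return slope, doubling_months, r_squared

# Synthetic data (not from the paper): doubling every 0.5 years.
years = np.array([2012.0, 2013.0, 2014.0, 2015.0, 2016.0])
flops = 1e17 * 2.0 ** ((years - 2012.0) / 0.5)

slope, doubling_months, r_squared = fit_doubling_time(years, flops)
print(f"slope = {slope:.3f} OOMs/year, doubling time = {doubling_months:.2f} months")
```

Running the same function separately on the models of each era would give one slope and one R² per era, which is the comparison the method calls for.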

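Experimental Results

Research Questions

RQ1: What is the growth rate (doubling time) of training compute before and after the advent of Deep Learning?
RQ2: Did a distinct Large-Scale Era emerge around 2015-2016, and how does it differ from compute growth in regular-scale models?
RQ3: How well do the identified trends fit the milestone ML model data, and what are the uncertainties in the estimates?

Key Findings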

Period	Compute range (FLOPs, start to end)	Era	Slope (OOMs/year)	Doubling time	R²
1952 to 2010	3e+04 to 2e+14	Pre Deep Learning	0.2	21.3 months	0.77
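
The slope and doubling-time columns are two views of the same quantity: a slope of s OOMs/year means compute doubles every log10(2)/s years. The sketch below (helper names are hypothetical) applies that conversion to the row above. Note that the printed slope of 0.2 would by itself imply about 18 months, so the 0.2 is presumably a rounded value; that reading is an inference from the arithmetic, not something stated in the table.

```python
import math

def doubling_time_months(slope_ooms_per_year):
    """Months needed for compute to double at a given slope."""
    return 12.0 * math.log10(2.0) / slope_ooms_per_year

def slope_from_doubling_time(months):
    """Slope in OOMs/year implied by a doubling time in months."""
    return 12.0 * math.log10(2.0) / months

# Doubling time implied by the slope exactly as printed in the table.
rounded = doubling_time_months(0.2)

# Slope implied by the reported 21.3-month doubling time.
implied = slope_from_doubling_time(21.3)

print(f"slope 0.2 OOMs/year -> {rounded:.1f} months")
print(f"21.3 months -> {implied:.3f} OOMs/year")
```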

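In the Pre Deep Learning Era (1952–2010), compute roughly tracks Moore's law, doubling about every 21 months.
In the Deep Learning Era (2010–2022), compute growth accelerates to a doubling about every 5–6 months.
The Large-Scale Era emerges around 2015–2016, with models that break above the prior trend and double about every 10 months (late 2015 to 2022).
Overall, the three eras capture the discontinuities in compute trends and highlight the growing compute demands of advanced ML systems.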
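
To make the three doubling times easier to compare, they can be restated as yearly growth multipliers using 2^(12/T) for a doubling time of T months. The sketch below is illustrative only: the function name is hypothetical, and it assumes round values of 21, 6, and 10 months, taking the upper end of the 5–6 month range for the Deep Learning Era.

```python
def annual_growth_factor(doubling_months):
    """Factor by which compute grows in one year for a given doubling time."""
    return 2.0 ** (12.0 / doubling_months)

# Approximate doubling times in months, taken from the findings above.
eras = {
    "Pre Deep Learning": 21.0,
    "Deep Learning": 6.0,   # assumption: upper end of the 5-6 month range
    "Large-Scale": 10.0,
}

factors = {name: annual_growth_factor(months) for name, months in eras.items()}

for name, factor in factors.items():
    print(f"{name}: about {factor:.2f}x per year")
```

Under these assumptions the multipliers come out to roughly 1.5x, 4x, and 2.3x per year respectively.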

This review was generated by AI and checked by a human editor.