QUICK REVIEW

[Paper Review] The Quantization Model of Neural Scaling

Eric J. Michaud, Ziming Liu | arXiv (Cornell University) | Mar 23, 2023

Neural Networks and Applications | Cited by 8

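One-Sentence Summary

This paper proposes the Quantization Model of neural scaling, in which model knowledge is learned in discrete quanta whose use frequencies follow a Zipf distribution, yielding power-law loss scaling; it validates the idea on toy data, analyzes LLM scaling by decomposing behavior into quanta, and outlines a method for automatically discovering these quanta from gradients.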

ABSTRACT

We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.

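Motivation and Objectives

Motivate and formalize the Quantization Hypothesis of neural scaling.
Derive its theoretical implications, showing how learning discrete quanta leads to power-law loss scaling.
Use toy datasets to demonstrate that scaling can arise from a structured distribution over subtasks.
Decompose scaling in large language models, quantifying the quanta and their usage patterns.
Propose QDG (Quanta Discovery from Gradients) to automatically identify quanta in language models from gradients.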

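Proposed Method

Define quanta as discrete knowledge/skill modules learned by the model.
Assume that quanta use frequencies follow a Zipf distribution, and derive how the loss L_n depends on the number n of quanta learned.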
Show that L_n approaches L_∞ = a as a power law in n: L_n ≈ a + (b − a) n^(−α), up to a constant prefactor on the second term.
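Construct a multitask sparse parity toy dataset with a Zipfian distribution over subtasks to induce scaling.
Analyze the scaling of Pythia models by measuring per-token loss and clustering quanta based on gradients.
Propose Quanta Discovery from Gradients (QDG), which applies spectral clustering to normalized gradients to find coherent skill clusters.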
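
The link between Zipfian use frequencies and power-law loss in the steps above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes use frequencies p_k ∝ k^(−(α+1)) truncated at a finite number of quanta, a per-sample loss of a on quanta already learned and b on quanta not yet learned, and quanta learned in order of decreasing frequency. The function names and the values of alpha, a, and b are illustrative assumptions.

```python
import numpy as np

def zipf_frequencies(num_quanta, alpha):
    """Normalized use frequencies p_k proportional to k^-(alpha + 1)."""
    k = np.arange(1, num_quanta + 1, dtype=np.float64)
    weights = k ** -(alpha + 1.0)
    return weights / weights.sum()

def expected_loss(n_learned, p, a, b):
    """Expected loss when the n_learned most frequent quanta are learned.

    Samples that use a learned quantum cost `a`; all others cost `b`.
    """
    learned_mass = p[:n_learned].sum()
    return a * learned_mass + b * (1.0 - learned_mass)

def fitted_exponent(ns, losses, loss_floor):
    """Negated slope of log(L_n - L_inf) against log(n)."""
    slope, _ = np.polyfit(np.log(ns), np.log(losses - loss_floor), 1)
    return -slope

# Illustrative values (assumptions, not taken from the paper).
alpha, a, b = 0.5, 1.0, 3.0
p = zipf_frequencies(1_000_000, alpha)
ns = np.array([10, 30, 100, 300, 1000])
losses = np.array([expected_loss(n, p, a, b) for n in ns])

# With all quanta learned the loss is exactly `a`, so `a` is the floor.
alpha_hat = fitted_exponent(ns, losses, a)
print(f"assumed alpha = {alpha}, fitted exponent = {alpha_hat:.3f}")
```

The fitted exponent should land close to the assumed alpha. Small deviations come from truncating the Zipf distribution at a finite number of quanta and from replacing the discrete sum with its continuous approximation.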

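Experimental Results

Research Questions

RQ1: Do neural networks learn a discrete set of quanta that govern performance?
RQ2: Do quanta use frequencies follow a power law that produces the observed scaling exponents?
RQ3: Can the scaling exponents for parameters and data be related through the Quantization Model?
RQ4: Can gradient information be used to automatically discover and validate quanta in language models?
RQ5: In large language models, how does scaling decompose across subtasks or tokens?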

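Key Findings

Loss falls as a power law as more quanta are learned: L_n − L_∞ ∝ n^(−α).
The multitask sparse parity toy experiments show scaling in parameters, data, and training steps, consistent with the Quantization Model.
For Pythia language models, mean cross-entropy loss scales with model size with exponent α_N ≈ 0.083 (excluding the largest models from the fit).
At a fixed scale, the per-token loss distribution is concentrated near zero, but tokens with near-zero loss contribute little to the mean loss.
Tokens usually show polygenic (multi-quantum) improvement with scale rather than monogenic (single-quantum) scaling, although some tokens show sharp, monogenic-like transitions.
Gradient-based clustering finds coherent clusters of quanta that correspond to interpretable model skills, such as continuing an incrementing numerical sequence.
Rank-frequency analysis of the discovered quanta shows a power-law trend with a slope of about −1.24, broadly consistent with the theoretical prediction.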
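
A rank-frequency slope such as the one reported above is typically estimated with a straight-line fit in log-log space. The sketch below is an illustration under stated assumptions, not the authors' analysis code. The cluster sizes are synthetic and noise-free, the function names are hypothetical, and the conversion from slope to scaling exponent assumes use frequencies p_k ∝ k^(−(α+1)), so that a slope s corresponds to α = −s − 1.

```python
import numpy as np

def rank_frequency_slope(cluster_sizes):
    """Least-squares slope of log(size) against log(rank)."""
    sizes = np.sort(np.asarray(cluster_sizes, dtype=np.float64))[::-1]
    sizes = sizes[sizes > 0]  # log is undefined for empty clusters
    ranks = np.arange(1, sizes.size + 1, dtype=np.float64)
    slope, _ = np.polyfit(np.log(ranks), np.log(sizes), 1)
    return slope

def implied_scaling_exponent(slope):
    """Assumes p_k proportional to k^-(alpha + 1), i.e. slope = -(alpha + 1)."""
    return -slope - 1.0

# Synthetic, noise-free cluster sizes built with an exponent of -1.24.
true_ranks = np.arange(1, 201, dtype=np.float64)
synthetic_sizes = 5000.0 * true_ranks ** -1.24

slope = rank_frequency_slope(synthetic_sizes)
alpha_implied = implied_scaling_exponent(slope)
print(f"slope = {slope:.3f}, implied alpha = {alpha_implied:.3f}")
```

Under the same assumption, a scaling exponent of 0.083 would correspond to a slope of about −1.083, which can be compared with the measured value of about −1.24. Real cluster sizes are noisy, and the fitted slope depends on which ranks are included in the fit.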

This review was generated by AI and reviewed by human editors.