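[Paper Walkthrough] Are Emergent Abilities in Large Language Models just In-Context Learning?
The paper evaluates emergent abilities across 18 models (60M to 175B parameters) and 22 tasks while controlling for in-context learning and instruction tuning. It finds that the apparent emergent abilities can largely be explained by in-context learning rather than by genuinely emergent reasoning.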
Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as "emergent abilities," have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.
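Research motivation and goals
- Determine which abilities remain genuinely emergent once in-context learning and instruction tuning are removed.
- Separate emergent abilities from the effects of prompting techniques.
- Assess whether instruction tuning triggers in-context learning or reveals genuine reasoning ability.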
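Proposed method
- Evaluate non-instruction-tuned models in a zero-shot setting to remove the effect of in-context learning.
- Compare model families (GPT, T5, Falcon, LLaMA) across scales, with and without instruction tuning.
- Use a curated set of tasks from prior work, covering both tasks previously labeled emergent and tasks that were not, with careful control of bias.
- Manually categorize tasks as memorizable, formal linguistic, or functional to help interpret the results.
- Analyze whether the additional abilities of instruction-tuned models can be explained by instruction tuning making use of in-context learning, which is the simpler explanation under Occam's razor.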
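The difference between the zero-shot and few-shot settings in the method above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the `Example` class, the `build_prompt` and `evaluation_conditions` functions, and the "Q:/A:" prompt layout are all hypothetical names and formats chosen for this sketch.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    """A solved demonstration (hypothetical structure for this sketch)."""
    question: str
    answer: str


def build_prompt(query: str, demos: Optional[Sequence[Example]] = None) -> str:
    """Build a prompt for one test question.

    With no demonstrations the prompt is zero-shot, so the model gets no
    in-context examples to learn from. With demonstrations it is few-shot.
    The "Q:/A:" layout is an assumption, not the paper's template.
    """
    parts = [f"Q: {d.question}\nA: {d.answer}" for d in (demos or [])]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


def evaluation_conditions() -> List[Tuple[str, str]]:
    """Enumerate the 2x2 grid of model variant and prompt setting."""
    variants = ("base", "instruction-tuned")
    settings = ("zero-shot", "few-shot")
    return list(product(variants, settings))


if __name__ == "__main__":
    demos = [Example("What is 2 + 3?", "5")]
    print(build_prompt("What is 4 + 1?"))          # zero-shot prompt
    print(build_prompt("What is 4 + 1?", demos))   # few-shot prompt
    print(evaluation_conditions())
```

In this grid, the pair ("base", "zero-shot") corresponds to the control condition described above: no instruction tuning and no in-context examples.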

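Experimental results
Research questions
- RQ1: Without in-context learning and instruction tuning, which abilities are genuinely emergent?
- RQ2: Does instruction tuning induce, or rely on, in-context learning to produce the observed abilities?
- RQ3: Are the observed abilities attributable to formal linguistic skills, memorization, or functional reasoning?
- RQ4: Can the simpler explanation, in-context learning, account for the gains seen in instruction-tuned models?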
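Key findings
- Emergent abilities are largely attributable to in-context learning rather than to intrinsic emergence.
- There is no evidence that reasoning abilities emerge in the absence of prompting techniques.
- Instruction tuning improves task performance mainly by making effective use of in-context learning, not through genuinely emergent reasoning.
- In these analyses, formal linguistic abilities and memorization remain distinct from functional reasoning ability.
- The study makes its code and results openly available to support replication.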

This walkthrough was generated by AI and reviewed by a human editor.