QUICK REVIEW

[Paper Review] Larger language models do in-context learning differently

Jerry Wei, Jason Wei|arXiv (Cornell University)|Mar 7, 2023

Topic Modeling|Cited by 99

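One-sentence summary

The paper shows that in-context learning in smaller models relies on semantic priors, whereas larger models increasingly learn the input-label mappings shown in context, including with semantically unrelated labels and flipped exemplars; instruction tuning further shapes both abilities.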

ABSTRACT

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.

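Motivation and objectives

Investigate how semantic priors from pretraining affect in-context learning (ICL) across model scales.
Test whether larger models can override semantic priors using in-context input-label mappings.
Study ICL with labels that are semantically unrelated to the inputs (SUL-ICL) to test whether models learn input-label mappings.
Evaluate how instruction tuning affects ICL, semantic priors, and the learning of input-label mappings.
Evaluate emergent abilities on high-dimensional tasks, such as linear classification under SUL-ICL.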

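Proposed method

Compare regular ICL, flipped-label ICL, and semantically-unrelated label ICL (SUL-ICL) across multiple model families and scales.
Use in-context exemplars (k=16 per class by default) and evaluate on a diverse set of NLP tasks with held-out labels.
Systematically flip the in-context labels to test whether large models can override semantic priors.
Replace natural-language targets with semantically unrelated tokens (Foo/Bar) to force the model to learn input-label mappings.
Evaluate the effect of instruction tuning (Flan-PaLM) relative to the pretrained models on ICL, priors, and mappings.
Include high-dimensional linear classification tasks to probe non-linguistic ICL ability.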
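
The three prompt settings compared above differ only in how exemplar labels are written. The following is a minimal sketch of that idea, not the authors' code: the Input/Output template, the two example sentences, and the helper names `remap` and `build_prompt` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input text, natural-language label)

# Hypothetical label maps for a binary sentiment task (illustrative only).
FLIPPED: Dict[str, str] = {"positive": "negative", "negative": "positive"}
UNRELATED: Dict[str, str] = {"positive": "foo", "negative": "bar"}


def remap(examples: List[Example], label_map: Dict[str, str]) -> List[Example]:
    """Return exemplars with each label replaced according to label_map."""
    return [(text, label_map[label]) for text, label in examples]


def build_prompt(exemplars: List[Example], query: str) -> str:
    """Format exemplars and a query with an assumed Input/Output template."""
    blocks = [f"Input: {text}\nOutput: {label}" for text, label in exemplars]
    blocks.append(f"Input: {query}\nOutput:")  # the model completes this line
    return "\n\n".join(blocks)


# Made-up exemplars; a real run would sample k per class from a dataset.
exemplars = [
    ("A delightful, moving film.", "positive"),
    ("Tedious and badly acted.", "negative"),
]
query = "I would happily watch it again."

regular_prompt = build_prompt(exemplars, query)                    # regular ICL
flipped_prompt = build_prompt(remap(exemplars, FLIPPED), query)    # flipped-label ICL
sul_prompt = build_prompt(remap(exemplars, UNRELATED), query)      # SUL-ICL

print(sul_prompt)
```

In the flipped setting, a model that follows the exemplars should answer with the opposite of the true label, so accuracy against the original labels falling below chance indicates that the semantic prior was overridden.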

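Experimental results

Research questions

RQ1: When the in-context labels are flipped, can language models override semantic priors with the in-context input-label mappings, and how does this depend on model scale?
RQ2: Do larger language models gain the ability to learn input-label mappings in context even when the labels have no semantic relation to the task (SUL-ICL)?
RQ3: How does instruction tuning shift the balance in ICL between relying on semantic priors and learning input-label mappings?
RQ4: Under SUL-ICL, does the ability to perform high-dimensional linear classification emerge with model scale?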

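Key findings

When shown flipped in-context labels, large models can override semantic priors, whereas small models largely cannot.
Under SUL-ICL, performance improves with scale, indicating that the ability to learn input-label mappings without semantic priors emerges in larger models.
Instruction-tuned models learn input-label mappings better, but instruction tuning also strengthens semantic priors and reduces the ability to override them with flipped labels.
In the SUL-ICL setting, adding more exemplars improves performance more for large models, suggesting that large models make better use of in-context mappings.
For some tasks (such as RTE and ETHOS), the ability to do SUL-ICL appears only at the largest model scales.
Under SUL-ICL, large models can even perform linear classification in high-dimensional settings.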
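
To make the linear classification finding concrete, the sketch below shows one way such a task could be generated and rendered as a SUL-ICL prompt. It is an illustration under stated assumptions, not the paper's exact protocol: the coordinate range, the centred random hyperplane, the prompt template, and the function names `make_linear_task` and `format_prompt` are all hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[List[int], str]  # (integer feature vector, Foo/Bar label)


def make_linear_task(dim: int, n: int, seed: int = 0) -> List[Point]:
    """Label random integer vectors by their side of a random hyperplane."""
    rng = random.Random(seed)
    lo, hi = 1, 1000                 # assumed coordinate range
    center = (lo + hi) / 2           # hyperplane passes through the range centre
    weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    points: List[Point] = []
    for _ in range(n):
        x = [rng.randint(lo, hi) for _ in range(dim)]
        score = sum(w * (xi - center) for w, xi in zip(weights, x))
        points.append((x, "Foo" if score > 0 else "Bar"))
    return points


def format_prompt(exemplars: List[Point], query: List[int]) -> str:
    """Render exemplars and a query with an assumed Input/Output template."""
    blocks = [f"Input: {', '.join(map(str, x))}\nOutput: {y}" for x, y in exemplars]
    blocks.append(f"Input: {', '.join(map(str, query))}\nOutput:")
    return "\n\n".join(blocks)


# 32 in-context exemplars plus one held-out query in 16 dimensions.
data = make_linear_task(dim=16, n=33)
exemplars, (query, gold) = data[:-1], data[-1]
prompt = format_prompt(exemplars, query)
print(prompt[:200])
```

Because the labels Foo and Bar carry no meaning and the inputs are plain numbers, a model can only beat chance here by inferring the decision boundary from the exemplars.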

This review was generated by AI and checked by a human editor.