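[Paper Explained] The Learnability of In-Context Learning
This paper proposes a PAC-based framework for in-context learning with a frozen pretrained model and, under mild assumptions, gives finite-sample learnability guarantees when the pretraining distribution is a mixture of tasks.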
In-context learning is a surprising and important phenomenon that emerged when modern language models were scaled to billions of learned parameters. Without modifying a large language model's weights, it can be tuned to perform various downstream natural language tasks simply by including concatenated training examples of these tasks in its input. Though disruptive for many practical applications of large language models, this emergent learning paradigm is not well understood from a theoretical perspective. In this paper, we propose a first-of-its-kind PAC based framework for in-context learnability, and use it to provide the first finite sample complexity results for the in-context learning setup. Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution, and then a second in-context learning phase, which keeps this function constant and concatenates training examples of the downstream task in its input. We use our framework in order to prove that, under mild assumptions, when the pretraining distribution is a mixture of latent tasks (a model often considered for natural language pretraining), these tasks can be efficiently learned via in-context learning, even though the model's weights are unchanged and the input significantly diverges from the pretraining distribution. Our theoretical analysis reveals that in this setting, in-context learning is more about identifying the task than about learning it, a result which is in line with a series of recent empirical findings. We hope that the in-context learnability framework presented in this paper will facilitate future progress towards a deeper understanding of this important new learning paradigm.
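Motivation and Goals
- Define a PAC learning framework for in-context learning with a frozen model.
- Provide finite-sample complexity results for in-context learning in a multi-task pretraining setting.
- Show that in-context learning identifies latent tasks rather than learning them from the prompt.
- Connect the theoretical results to the empirical observation that in-context learning hinges on task identification.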
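Proposed Method
- Model the pretraining distribution as a mixture of latent tasks, with a latent variable indicating the task.
- Define in-context learning as predicting a label from a prompt made of concatenated input-label pairs.
- State assumptions on the pretraining mixture: approximate independence, a lower bound on token probabilities, and a positive prior on every component.
- Prove that the prompt likelihood ratio of the correct task against any other task concentrates as the number of in-context examples k grows, and favors the correct component.
- Derive finite-sample bounds showing that in-context learning is efficiently learnable under conditions on the margin and on the KL divergence between components.
- Analyze two cases (large margin and small margin) to bound the prediction error.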
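The likelihood-ratio step in the list above can be written out with Bayes' rule. This is a sketch under a simplifying assumption: the k in-context examples are taken to be exactly independent given the task, whereas the paper assumes only approximate independence. The symbols $\phi$ (a latent task), $\phi^\ast$ (the true task), and $p_k = ((x_1, y_1), \dots, (x_k, y_k))$ (the prompt) are notation chosen here for illustration and may differ from the paper's.

```latex
\begin{align*}
P(\phi \mid p_k)
  &= \frac{P(\phi)\prod_{i=1}^{k} P_\phi(x_i, y_i)}
          {\sum_{\phi'} P(\phi')\prod_{i=1}^{k} P_{\phi'}(x_i, y_i)}, \\
\log\frac{P(\phi^\ast \mid p_k)}{P(\phi \mid p_k)}
  &= \log\frac{P(\phi^\ast)}{P(\phi)}
   + \sum_{i=1}^{k} \log\frac{P_{\phi^\ast}(x_i, y_i)}{P_\phi(x_i, y_i)}, \\
\mathbb{E}_{(x_i, y_i)\sim P_{\phi^\ast}}
  \left[\log\frac{P_{\phi^\ast}(x_i, y_i)}{P_\phi(x_i, y_i)}\right]
  &= \mathrm{KL}\left(P_{\phi^\ast}\,\|\,P_\phi\right).
\end{align*}
```

Under these assumptions the posterior log-odds of the true task grow on average by $\mathrm{KL}(P_{\phi^\ast}\,\|\,P_\phi)$ per example. This suggests why a positive prior on every component and a nonzero KL divergence between components both matter: the first keeps the initial log-odds finite, and the second makes the sum drift toward the correct task.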
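Results
Research Questions
- RQ1: When prompted with concatenated examples drawn from a mixture of latent tasks, can a frozen pretrained model achieve low in-context loss on the downstream task?
- RQ2: Under what conditions, and with what sample complexity, can in-context learning identify the latent task and reach Bayes-optimal prediction without updating the weights?
- RQ3: How does the mixture structure of the pretraining distribution affect the effectiveness and learnability of in-context learning?
- RQ4: What role do the margin and the KL divergence between mixture components play in guaranteeing finite-sample learnability?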
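Key Findings
- Under mild assumptions, in-context learning admits finite-sample learnability guarantees with polynomial sample complexity.
- The in-context prompt tends to reweight the prior over mixture components, which helps identify the latent task.
- In the large-margin case, given enough in-context examples, the ground-truth in-context predictor matches the Bayes-optimal predictor.
- Even when the margin is small, the loss is bounded above in terms of the Bayes error, so predictions remain robust.
- The two-part analysis shows that both imperfect pretraining and task-identification errors can be controlled, which yields overall learnability.
- The framework's guarantees are not limited to the infinite-data regime, and they agree with the empirical observation that task identification, rather than task learning, is at the core of in-context learning.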
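The reweighting and large-margin findings above can be illustrated with a small toy model. This is a minimal sketch, not the paper's construction: it assumes two hypothetical latent tasks over a binary input, exactly independent examples, and an exact Bayesian mixture predictor standing in for a perfectly pretrained frozen model. The names `TASKS`, `PRIOR`, `task_posterior`, and `predict` are invented for this example.

```python
import math
import random

# Hypothetical toy mixture: TASKS[name][x] = P(y = 1 | x) under that latent task.
TASKS = {
    "copy": {0: 0.1, 1: 0.9},  # label usually equals the input
    "flip": {0: 0.9, 1: 0.1},  # label usually negates the input
}
# Assumed prior over components; every component has positive mass.
PRIOR = {"copy": 0.2, "flip": 0.8}


def label_prob(task, x, y):
    """P(y | x) under the given task."""
    p1 = TASKS[task][x]
    return p1 if y == 1 else 1.0 - p1


def sample_prompt(task, k, rng):
    """Draw k independent (x, y) examples from one task, with x uniform on {0, 1}."""
    examples = []
    for _ in range(k):
        x = rng.randint(0, 1)
        y = 1 if rng.random() < TASKS[task][x] else 0
        examples.append((x, y))
    return examples


def task_posterior(examples):
    """Posterior over latent tasks given the prompt, computed in log space.

    The marginal over x is shared by both tasks, so it cancels out.
    """
    log_scores = {
        t: math.log(PRIOR[t]) + sum(math.log(label_prob(t, x, y)) for x, y in examples)
        for t in TASKS
    }
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def predict(examples, x_query):
    """Mixture prediction of P(y = 1 | prompt, x_query); no parameters are updated."""
    post = task_posterior(examples)
    return sum(post[t] * TASKS[t][x_query] for t in TASKS)


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (0, 2, 8, 32):
        prompt = sample_prompt("copy", k, rng)
        print(k, round(task_posterior(prompt)["copy"], 4), round(predict(prompt, 1), 4))
```

With an empty prompt the predictor simply averages the two tasks under the prior. As examples from the "copy" task accumulate, the posterior mass shifts to that component and the mixture prediction approaches the task's own conditional, which is the Bayes-optimal prediction in this toy setting. Nothing in `TASKS` or `PRIOR` changes along the way, which mirrors the point that the prompt identifies a task rather than teaching a new one.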
This explainer was generated by AI and reviewed by a human editor.