QUICK REVIEW

[Paper Review] Context Collapse: In-Context Learning and Model Collapse

Josef Ott|arXiv (Cornell University)|Jan 1, 2026

Domain Adaptation and Few-Shot Learning | Cited by 0

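One-Sentence Summary

A master's thesis that examines in-context learning and model collapse in large language models: it shows a phase transition and a skew-symmetric solution component in linear transformers, proves almost sure convergence results for collapse under replacing and cumulative data regimes, and introduces the notion of context collapse during long generations.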

ABSTRACT

This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings - linear regression and Gaussian fitting - under both replacing and cumulative data regimes. We strengthen existing results by proving almost sure convergence, showing that collapse occurs unless the data grows sufficiently fast or is retained over time. Finally, we introduce the notion of context collapse: a degradation of context during long generations, especially in chain-of-thought reasoning. This concept links the dynamics of ICL with long-term stability challenges in generative models.

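Research Motivation and Objectives

Study in-context learning (ICL) in a linear transformer with tied weights trained on linear regression tasks.
Analyse how minimising the in-context loss induces a phase transition in the learned parameters.
Study model collapse under different data regimes using martingale and random walk theory.
Introduce the notion of context collapse, a degradation of context during long generations.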

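Proposed Method

Reduce the forward pass of the linear transformer under weight tying to preconditioned gradient descent.
Analyse the optimal preconditioner and show that it includes a skew-symmetric component that rotates the gradient direction.
Apply martingale and random walk theory to simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes.
Prove almost sure convergence results for the collapse phenomenon.
Characterise the link between ICL dynamics and long-term stability challenges in generative models.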
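
The preconditioned gradient descent step described above can be sketched as follows. This is a minimal illustration, not the thesis's construction: the data, the symmetric part `S`, the skew-symmetric part `A`, and the constants `eta` and `a` are all hypothetical values chosen for the example. It only demonstrates the general fact that a skew-symmetric matrix sends the gradient to a vector orthogonal to it, so adding it to the preconditioner rotates the update direction.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def gradient(w, xs, ys):
    """Gradient of L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj / n
    return g

def preconditioned_step(w, xs, ys, P):
    """One update w <- w - P @ grad L(w)."""
    step = matvec(P, gradient(w, xs, ys))
    return [wi - si for wi, si in zip(w, step)]

def angle_between(u, v):
    cos = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))

# Hypothetical in-context examples generated by y = 2*x1 - x2.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]

# Assumed preconditioner P = S + A (values are illustrative only).
eta, a = 0.5, 0.2
S = [[eta, 0.0], [0.0, eta]]   # symmetric part: plain step size
A = [[0.0, -a], [a, 0.0]]      # skew-symmetric part: A^T = -A
P = [[S[i][j] + A[i][j] for j in range(2)] for i in range(2)]

w0 = [0.0, 0.0]
g0 = gradient(w0, xs, ys)
w1 = preconditioned_step(w0, xs, ys, P)

# Angle between the raw gradient and the preconditioned direction.
rotation = angle_between(g0, matvec(P, g0))
```

With `S` a multiple of the identity, the symmetric part alone would keep the update parallel to the gradient; the skew-symmetric part contributes a component orthogonal to it, and in this two-dimensional example the resulting rotation angle is `atan(a / eta)`.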

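Results

Research Questions

RQ1: How does context length affect the parameters learned by a linear transformer performing ICL on linear regression tasks?
RQ2: Does minimising the in-context loss lead to a phase transition and to a skew-symmetric component in the solution?
RQ3: In simplified settings such as linear regression and Gaussian fitting, how does model collapse arise under different data regimes?
RQ4: Under what conditions does collapse not occur, and how do data growth and data retention affect it?
RQ5: How do ICL dynamics relate to stability problems during long generations (context collapse)?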

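Key Findings

The learned parameters undergo a phase transition once the context length exceeds a critical threshold.
In the linear transformer analysed, the optimal preconditioner includes a skew-symmetric component that rotates the gradient direction.
Under both the replacing and the cumulative data regimes, collapse is characterised with martingale and random walk theory, yielding almost sure convergence results.
In the settings studied, collapse occurs unless the data grows sufficiently fast or is retained over time.
A new notion of context collapse is introduced, linking ICL dynamics with long-term stability challenges in generative models.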
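
The contrast between the replacing and cumulative data regimes can be illustrated with a small Gaussian-fitting simulation. This is a sketch under stated assumptions, not the thesis's experiment: the function names, the sample size `n`, the number of generations, the seed, and the use of the maximum-likelihood variance estimate are all choices made for this example. Each generation fits a Gaussian to the current data pool and then draws `n` synthetic samples from the fit; the replacing regime discards the old pool, while the cumulative regime keeps it.

```python
import random

def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of a list of floats."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def simulate(regime, generations=300, n=10, seed=0):
    """Return the fitted variance after repeated self-training.

    regime: "replacing" trains each generation only on fresh synthetic
    samples; "cumulative" trains on all data gathered so far.
    """
    rng = random.Random(seed)
    # Generation 0: real data drawn from a standard normal.
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(generations):
        mean, var = fit_gaussian(pool)
        synthetic = [rng.gauss(mean, var ** 0.5) for _ in range(n)]
        if regime == "replacing":
            pool = synthetic
        elif regime == "cumulative":
            pool = pool + synthetic
        else:
            raise ValueError(f"unknown regime: {regime}")
    return fit_gaussian(pool)[1]

var_replacing = simulate("replacing")
var_cumulative = simulate("cumulative")
```

In this toy setting the fitted variance in the replacing regime shrinks towards zero over the generations (and may reach exactly zero in floating point), whereas retaining all data keeps it at a non-degenerate value. A single simulated run only illustrates the behaviour; it says nothing about the almost sure convergence that the thesis proves.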

This review was generated by AI and checked by a human editor.