QUICK REVIEW

[Paper Review] Understanding the Curse of Unrolling

Sheheryar Mehmood, Florian Knöll|arXiv (Cornell University)|Feb 23, 2026
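Machine Learning and Data Classification | Cited by: 0
One-Sentence Summary

The paper gives a non-asymptotic analysis of the derivative iterates in algorithm unrolling, identifies the factors that drive the curse of unrolling, and shows that truncation or warm-starting mitigates it, with supporting experimental evidence.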

ABSTRACT

Algorithm unrolling is ubiquitous in machine learning, particularly in hyperparameter optimization and meta-learning, where Jacobians of solution mappings are computed by differentiating through iterative algorithms. Although unrolling is known to yield asymptotically correct Jacobians under suitable conditions, recent work has shown that the derivative iterates may initially diverge from the true Jacobian, a phenomenon known as the curse of unrolling. In this work, we provide a non-asymptotic analysis that explains the origin of this behavior and identifies the algorithmic factors that govern it. We show that truncating early iterations of the derivative computation mitigates the curse while simultaneously reducing memory requirements. Finally, we demonstrate that warm-starting in bilevel optimization naturally induces an implicit form of truncation, providing a practical remedy. Our theoretical findings are supported by numerical experiments on representative examples.

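Motivation and Objectives

  • Explain, non-asymptotically, where the transient behavior of the derivative iterates in unrolling comes from.
  • Identify the algorithmic factors that govern the derivative error in unrolled differentiation.
  • Show how truncating early iterations reduces both the curse and memory usage.
  • Show that warm-starting in bilevel optimization induces an implicit truncation.
  • Provide numerical experiments that validate the theory and the practical remedies.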

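Proposed Method

  • Model the inner problem as a fixed-point iteration with a mapping A, and study its derivative via implicit differentiation.
  • Derive non-asymptotic bounds on the derivative sequence $D\bm{x}^{(k)}(\bm{u})$ and its error, using forward-mode and reverse-mode automatic differentiation.
  • Introduce and analyze a truncation (late-start) scheme that begins the derivative computation at a later iteration.
  • Prove that the truncated derivative sequence converges, and bound the curse term.
  • Discuss how to split a fixed computational budget between solving the inner problem and computing derivatives.
  • Interpret warm-starting as an implicit truncation mechanism and connect it to existing practice.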
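
The late-start idea in the list above can be tried on the ridge-regression objective that appears in the paper's figures. The following is a minimal NumPy sketch, not the authors' implementation: the helper `unrolled_gd`, its `start` parameter, the 2x2 toy data, the poor initial guess `x_far`, and the use of the solution at a nearby hyperparameter (0.02) as a stand-in for a warm start are all assumptions made for illustration. The derivative is propagated in forward mode with the step size treated as a constant.

```python
import numpy as np

def unrolled_gd(A, b, u, x0, num_iters, start=0):
    """Gradient descent on f(x, u) = ||Ax - b||^2 / 2 + u ||x||^2 / 2 with a
    forward-mode derivative of the iterates with respect to u.

    Derivative propagation begins at iteration `start` (hypothetical knob):
    start=0 is plain unrolling, start>0 is a late-start (truncated) variant.
    Returns per-iteration iterate errors and derivative errors.
    """
    n = A.shape[1]
    H = A.T @ A + u * np.eye(n)                # Hessian of f in x
    step = 1.0 / np.linalg.eigvalsh(H).max()   # fixed step, treated as constant in u
    x_star = np.linalg.solve(H, A.T @ b)       # exact minimizer
    dx_star = -np.linalg.solve(H, x_star)      # exact derivative (implicit differentiation)

    x = np.asarray(x0, dtype=float).copy()
    dx = np.zeros(n)                           # derivative iterate starts at zero
    iter_err, deriv_err = [], []
    for k in range(num_iters + 1):
        iter_err.append(np.linalg.norm(x - x_star))
        deriv_err.append(np.linalg.norm(dx - dx_star))
        if k >= start:
            # Differentiate x - step * (H x - A^T b) with respect to u, using dH/du = I.
            dx = dx - step * (H @ dx + x)
        x = x - step * (H @ x - A.T @ b)
    return np.array(iter_err), np.array(deriv_err)

# Toy data (assumed for illustration only).
A = np.diag([1.0, 0.3])
b = np.ones(2)
u = 0.01
x_far = np.array([0.0, -20.0])                 # deliberately poor initial guess

full_iter, full_deriv = unrolled_gd(A, b, u, x_far, 300)
_, trunc_deriv = unrolled_gd(A, b, u, x_far, 300, start=30)

# Warm start modeled as the solution at a nearby hyperparameter value.
x_warm = np.linalg.solve(A.T @ A + 0.02 * np.eye(2), A.T @ b)
_, warm_deriv = unrolled_gd(A, b, u, x_warm, 300)

for name, err in [("full", full_deriv), ("late start", trunc_deriv), ("warm start", warm_deriv)]:
    print(f"{name:>10}: peak / initial derivative error = {err.max() / err[0]:.2f}")
```

In this toy setup the iterate error never increases, while the derivative error of plain unrolling first rises well above its initial value. The late-start and warm-start runs keep the derivative error at or below its starting level. In reverse mode, a late start would also avoid storing the skipped iterates, which is where the memory saving mentioned above comes from.
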
Figure 1: Iterate $\bm{x}^{(k)}(\bm{u})$ vs. derivative $D\bm{x}^{(k)}(\bm{u})$ error plot for gradient descent applied to $f(\bm{x},u)\coloneqq\|A\bm{x}-\bm{b}\|^{2}/2+u\|\bm{x}\|^{2}/2$. Unlike $\bm{x}^{(k)}(\bm{u})$, $D\bm{x}^{(k)}(\bm{u})$ initially drifts away from its limit before eventually

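Experimental Results

Research Questions

  • RQ1: In unrolled differentiation, why do the derivative iterates initially drift away from the true Jacobian?
  • RQ2: How do truncation and warm-starting affect the non-asymptotic behavior of the derivative iterates?
  • RQ3: Can the trade-off between computation, memory, and accuracy in unrolled differentiation be quantified?
  • RQ4: Under what conditions does truncated or warm-started differentiation recover or approximate the true Jacobian?
  • RQ5: How do forward-mode and reverse-mode AD propagate derivative information through the unrolled iterations?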

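Key Findings

  • The error of the derivative iterates can grow during the early iterations before the iterates converge to the true Jacobian.
  • A bound exhibits a term that first increases and then decreases, capturing the curse of unrolling; this term is governed by the contraction rate and the Lipschitz constants.
  • Truncating early iterations from the derivative computation mitigates the curse and lowers memory usage.
  • In bilevel optimization, warm-starting acts as an implicit truncation because the derivative path starts close to the fixed point, which makes it a practical remedy.
  • Bounds for explicit truncation quantify how delaying the derivative computation weakens the curse, and they give convergence guarantees for the truncated scheme.
  • Experiments on representative problems support the theoretical findings.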
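
The rise-then-fall term mentioned in the findings above can be seen in a short, standard argument. The sketch below is a worked derivation under assumed conditions and is not claimed to coincide with the paper's bounds (8) and (19): the symbol $\mathcal{A}$ for the fixed-point map, the contraction rate $\rho$, and the Lipschitz constants $L_x$ and $L_u$ are hypothetical names, and $e^{(k)}$, $\dot{e}^{(k)}$, $\bar{e}^{(k)}$ are taken to denote the iterate error, the derivative error, and the truncated derivative error. Arguments $(\bm{u})$ are dropped on the iterates for brevity.

```latex
% Assumptions (hypothetical constants): x^{(k+1)} = \mathcal{A}(x^{(k)}, u),
% \|D_x\mathcal{A}\| \le \rho < 1, and D_x\mathcal{A}, D_u\mathcal{A} are
% Lipschitz in x with constants L_x and L_u.
\begin{align*}
e^{(k)} &\coloneqq \|\bm{x}^{(k)} - \bm{x}^\ast\| \le \rho^{k}\, e^{(0)}, \\
D\bm{x}^{(k+1)} - D\bm{x}^\ast
  &= D_x\mathcal{A}(\bm{x}^{(k)},\bm{u})\,\bigl(D\bm{x}^{(k)} - D\bm{x}^\ast\bigr) \\
  &\quad + \bigl(D_x\mathcal{A}(\bm{x}^{(k)},\bm{u}) - D_x\mathcal{A}(\bm{x}^\ast,\bm{u})\bigr)\, D\bm{x}^\ast
         + D_u\mathcal{A}(\bm{x}^{(k)},\bm{u}) - D_u\mathcal{A}(\bm{x}^\ast,\bm{u}), \\
\dot{e}^{(k+1)} &\le \rho\,\dot{e}^{(k)} + C\, e^{(k)},
  \qquad C \coloneqq L_x \|D\bm{x}^\ast\| + L_u, \\
\dot{e}^{(k)} &\le \rho^{k}\,\dot{e}^{(0)} + C\, k\, \rho^{k-1}\, e^{(0)}.
\end{align*}
% Late start at iteration m, with the derivative iterate reset to zero there:
\begin{align*}
\bar{e}^{(m+j)} \le \rho^{j}\,\|D\bm{x}^\ast\| + C\, j\, \rho^{j-1}\, \rho^{m}\, e^{(0)}.
\end{align*}
```

The factor $k\rho^{k-1}$ grows as long as $k < \rho/(1-\rho)$ and decays afterwards, so a slower contraction (larger $\rho$) or larger Lipschitz constants make the transient longer and higher. In the late-start bound the same term is damped by $\rho^{m}$, and a warm start has a comparable effect by making $e^{(0)}$ small.
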
Figure 2: Error evolution of $e^{(k)}(\bm{u})$, $\dot{e}^{(k)}(\bm{u})$, and $\bar{e}^{(k)}(\bm{u})$ generated by gradient descent applied to $f(\bm{x},u)\coloneqq\|A\bm{x}-\bm{b}\|^{2}/2+u\|\bm{x}\|^{2}/2$. The dashed lines denote the bounds given in (8) and (19). The vertical lines denote

This review was generated by AI and reviewed by human editors.