QUICK REVIEW

[Paper Review] Step-Size Stability in Stochastic Optimization: A Theoretical Perspective

Fabian Schaipp, Robert M. Gower|arXiv (Cornell University)|Feb 10, 2026

Stochastic Gradient Optimization Techniques | Cited by 0

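One-Sentence Summary

The paper proposes a theoretical framework for measuring how much stochastic optimization methods degrade at large step sizes. It shows that adaptive methods such as SPS and NGN are more stable than SGD, and experiments show that the theory qualitatively matches observed behavior even for nonconvex problems.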

ABSTRACT

We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides direct theoretical evidence that adaptive step-size methods, such as SPS or NGN, are more robust than SGD. This allows us to quantify the advantage of these adaptive methods beyond empirical evaluation. Finally, we show through experiments that our theoretical bound qualitatively mirrors the actual performance as a function of the step size, even for nonconvex problems.

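Motivation and Objectives

Introduce a stability index delta_t that quantifies, across stochastic methods, how the step size amplifies suboptimality.
Derive delta_t for SGD, SPS, NGN, and SPP in a model-based convex setting.
Prove that the stability indices of SPS, NGN, and SPP are never worse than that of SGD and typically scale more favorably as the step size grows.
Give non-asymptotic bounds that connect the step size, the stability index, and the suboptimality of the average and last iterates.
Demonstrate experimentally that the theoretical stability bound qualitatively mirrors actual performance on both convex and nonconvex tasks.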

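Proposed Method

A model-based stochastic proximal point framework with update x_{t+1} = argmin_y f_{x_t}(y,s_t) + (1/(2 alpha_t))||y - x_t||^2, where f_{x_t}(·,s_t) is a model of the sampled loss f(·,s_t) built at x_t.
Define the stability index delta_t = f(x_t,s_t) - f_{x_t}(x_{t+1},s_t) - (1/(2 alpha_t))||x_{t+1} - x_t||^2.
Analyze four methods (SGD, SPS, NGN, SPP) by computing delta_t for each, and relate delta_t to the convergence bounds.
Under convexity assumptions (A1) and (A2), derive non-asymptotic bounds for the average and last iterates (Theorems 3 and 4).
Specialize the model to linear (SGD), truncated (SPS), square-root (NGN), and exact (SPP) forms to obtain explicit expressions for delta_t, where g_t is the stochastic gradient at x_t and tau_t is the effective SPS step size (e.g., delta_t^SGD = (alpha_t/2)||g_t||^2; delta_t^SPS = tau_t[1 - tau_t/(2 alpha_t)]||g_t||^2).
Extend the analysis of NGN and SPP to discuss the stability observed experimentally on nonconvex problems.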
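The stability index can be checked numerically. The sketch below is our own illustration, not the authors' code: it evaluates delta_t directly from its definition for the linear (SGD), truncated (SPS) and square-root (NGN) models, using each model's closed-form proximal step, so the results can be compared with the closed forms for delta_t^SGD and delta_t^SPS quoted above. Assumptions, all ours: the SPS model truncates at 0 (i.e. inf_y f(y,s_t) = 0), giving tau_t = min{alpha_t, f(x_t,s_t)/||g_t||^2}; the NGN model is taken to be (sqrt f(x_t,s_t) + <g_t/(2 sqrt f(x_t,s_t)), y - x_t>)^2 with f(x_t,s_t) > 0, whose proximal step is gamma_t = alpha_t/(1 + alpha_t||g_t||^2/(2 f(x_t,s_t))). Under that assumption the NGN index works out to (gamma_t/2)||g_t||^2, which stays below f(x_t,s_t) for every alpha_t; this closed form is our derivation, not a formula quoted from the paper. All function names are hypothetical.

```python
import math


def norm_sq(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def stability_index(loss, model_next, step, alpha):
    """delta_t = f(x_t,s_t) - f_{x_t}(x_{t+1},s_t) - ||x_{t+1} - x_t||^2 / (2 alpha_t)."""
    return loss - model_next - norm_sq(step) / (2.0 * alpha)


def delta_sgd(loss, g, alpha):
    # Linear model f(x_t) + <g, y - x_t>; its proximal step is x_t - alpha * g.
    step = [-alpha * gi for gi in g]
    model_next = loss + dot(g, step)
    return stability_index(loss, model_next, step, alpha)


def delta_sps(loss, g, alpha):
    # Truncated model max{f(x_t) + <g, y - x_t>, 0} (assumes inf f(., s_t) = 0).
    # Its proximal step is x_t - tau * g with tau = min(alpha, f(x_t) / ||g||^2).
    tau = min(alpha, loss / norm_sq(g))
    step = [-tau * gi for gi in g]
    model_next = max(loss + dot(g, step), 0.0)
    return stability_index(loss, model_next, step, alpha)


def delta_ngn(loss, g, alpha):
    # Assumed square-root model (sqrt f(x_t) + <g / (2 sqrt f(x_t)), y - x_t>)^2, f(x_t) > 0.
    # Its proximal step is x_t - gamma * g with gamma = alpha / (1 + alpha ||g||^2 / (2 f(x_t))).
    r = math.sqrt(loss)
    gamma = alpha / (1.0 + alpha * norm_sq(g) / (2.0 * loss))
    step = [-gamma * gi for gi in g]
    model_next = (r + dot(g, step) / (2.0 * r)) ** 2
    return stability_index(loss, model_next, step, alpha)


if __name__ == "__main__":
    loss, g, alpha = 1.0, [3.0, 4.0], 0.1
    gsq = norm_sq(g)
    tau = min(alpha, loss / gsq)
    gamma = alpha / (1.0 + alpha * gsq / (2.0 * loss))
    # Each line prints the definition-based value next to the closed form.
    print(delta_sgd(loss, g, alpha), alpha / 2 * gsq)
    print(delta_sps(loss, g, alpha), tau * (1 - tau / (2 * alpha)) * gsq)
    print(delta_ngn(loss, g, alpha), gamma / 2 * gsq)
```

Computing delta_t from the definition, rather than hard-coding the closed forms, makes it easy to swap in another model: only the proximal step and the model value at x_{t+1} change.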


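Experimental Results

Research Questions

RQ1: How does the suboptimality of stochastic optimization methods degrade as the step size increases?
RQ2: For SGD, SPS, NGN, and SPP, how does the stability index delta_t scale with alpha_t?
RQ3: Are adaptive step-size methods such as SPS and NGN more stable than SGD in convex and nonconvex settings?
RQ4: How well does the theoretical stability bound agree with empirical performance on tasks such as regression and deep learning?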

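Key Findings

A single stability index delta_t determines, for each method, how the step size amplifies suboptimality.
Unlike SGD, whose stability index grows linearly in alpha_t, the stability indices of SPS, NGN, and SPP do not grow linearly.
NGN and SPS are provably at least as stable as SGD for every alpha, and NGN's stability index grows sublinearly as alpha_t increases.
For SPP, delta_t is bounded above by min{(alpha_t/2)||g_t||^2, f(x_t,s_t) - inf_y f(y,s_t)}, so it is never worse than SGD.
The bounds for the average and last iterates qualitatively match the observed performance in nonconvex experiments (e.g., ResNet on CIFAR-10) and on convex tasks (including linear regression and classification).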
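The scaling claims above can be illustrated on a toy problem of our own choosing (not one of the paper's experiments): the one-dimensional loss f(x) = |x| at x_t = 1, so ||g_t||^2 = 1 and f(x_t) - inf f = 1. The sketch uses the closed forms delta^SGD = (alpha/2)||g||^2 and delta^SPS = tau[1 - tau/(2 alpha)]||g||^2 (SPS truncating at inf f = 0), the NGN value f q/(1 + q) with q = alpha||g||^2/(2f), which we derived ourselves assuming the square-root model (sqrt f(x_t) + <g/(2 sqrt f(x_t)), y - x_t>)^2, and computes the SPP index from its definition using the exact proximal step of |x| (soft-thresholding). The function name is hypothetical. Running it should show the SGD index growing linearly (alpha/2) while the other three stay below f(x_t) - inf f = 1, with SPP also below alpha/2.

```python
import math


def deltas_abs_loss(x, alpha):
    """Stability indices for the toy loss f(x) = |x| (inf f = 0), evaluated at x != 0."""
    f = abs(x)
    g = math.copysign(1.0, x)        # gradient of |x| at x != 0, so ||g||^2 = 1
    gsq = g * g
    sgd = alpha / 2 * gsq            # delta^SGD = (alpha/2)||g||^2
    tau = min(alpha, f / gsq)        # SPS step, assuming inf f = 0
    sps = tau * (1 - tau / (2 * alpha)) * gsq
    q = alpha * gsq / (2 * f)        # NGN under the assumed square-root model
    ngn = f * q / (1 + q)
    # Exact proximal step of |.| (soft-thresholding), then delta_t from its definition.
    x_next = math.copysign(max(f - alpha, 0.0), x)
    spp = f - abs(x_next) - (x_next - x) ** 2 / (2 * alpha)
    return {"SGD": sgd, "SPS": sps, "NGN": ngn, "SPP": spp}


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0, 100.0):
        d = deltas_abs_loss(1.0, alpha)
        print(f"alpha={alpha:>6}: " + ", ".join(f"{k}={v:.3f}" for k, v in d.items()))
```

A nonsmooth loss was chosen because its proximal step has a simple closed form, which keeps the SPP computation exact rather than relying on an inner solver.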


This summary was generated by AI and reviewed by human editors.