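[Paper Explained] Linear Convergence of Variance-Reduced Stochastic Gradient without Strong Convexity
This paper establishes linear convergence for variance-reduced stochastic gradient methods, specifically VRPSG and Prox-SVRG, on non-strongly convex problems, which are very common in machine learning. Its key technical contribution is a novel Semi-Strongly Convex (SSC) inequality, which yields linear convergence without assuming strong convexity and holds in both constrained and regularized settings.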
Stochastic gradient algorithms estimate the gradient based on only one or a few samples and enjoy low computational cost per iteration. They have been widely used in large-scale optimization problems. However, stochastic gradient algorithms are usually slow to converge and achieve sub-linear convergence rates, due to the inherent variance in the gradient computation. To accelerate the convergence, some variance-reduced stochastic gradient algorithms, e.g., proximal stochastic variance-reduced gradient (Prox-SVRG) algorithm, have recently been proposed to solve strongly convex problems. Under the strongly convex condition, these variance-reduced stochastic gradient algorithms achieve a linear convergence rate. However, many machine learning problems are convex but not strongly convex. In this paper, we introduce Prox-SVRG and its projected variant called Variance-Reduced Projected Stochastic Gradient (VRPSG) to solve a class of non-strongly convex optimization problems widely used in machine learning. As the main technical contribution of this paper, we show that both VRPSG and Prox-SVRG achieve a linear convergence rate without strong convexity. A key ingredient in our proof is a Semi-Strongly Convex (SSC) inequality which is the first to be rigorously proved for a class of non-strongly convex problems in both constrained and regularized settings. Moreover, the SSC inequality is independent of algorithms and may be applied to analyze other stochastic gradient algorithms besides VRPSG and Prox-SVRG, which may be of independent interest. To the best of our knowledge, this is the first work that establishes the linear convergence rate for the variance-reduced stochastic gradient algorithms on solving both constrained and regularized problems without strong convexity.
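Research Motivation and Goals
- Close the gap in convergence guarantees for variance-reduced stochastic gradient methods on non-strongly convex problems.
- Establish linear convergence for VRPSG and Prox-SVRG in both constrained and regularized optimization settings.
- Propose and rigorously prove a new Semi-Strongly Convex (SSC) inequality that holds for non-strongly convex problems.
- Show that the SSC inequality is algorithm-independent and may apply to other stochastic gradient methods.
- Provide a theoretical basis for linear convergence on practical machine learning problems, such as least squares and logistic regression, which are typically not strongly convex.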
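Proposed Method
- Propose the Variance-Reduced Projected Stochastic Gradient (VRPSG) and Prox-SVRG algorithms for non-strongly convex problems.
- Introduce a new Semi-Strongly Convex (SSC) inequality that bounds the distance to the optimal solution set from above by the objective gap, without assuming strong convexity.
- Use the SSC inequality to derive a recursive error bound, which establishes linear convergence under weaker conditions.
- Apply the SSC inequality to constrained problems (via projection) and to regularized problems (via proximal steps).
- Derive the convergence rate by analyzing the expected decrease of the objective gap in each iteration, as a function of the step size and the inner-loop length.
- Use non-uniform sampling (with probabilities proportional to the Lipschitz constants) to improve convergence efficiency, and validate its effectiveness experimentally.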
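The steps listed above can be illustrated with a minimal Prox-SVRG sketch in Python with NumPy. It is not the authors' implementation. The lasso objective, the function names (`soft_threshold`, `objective`, `prox_svrg_lasso`), the use of the averaged inner iterates as the next snapshot, and the sampling correction factor $1/(n p_i)$ are all assumptions chosen for illustration. Lasso with fewer samples than features is convex but not strongly convex, which is the regime the paper targets.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def objective(A, b, lam, w):
    """P(w) = (1/2n) * ||A w - b||^2 + lam * ||w||_1."""
    n = A.shape[0]
    return 0.5 * np.sum((A @ w - b) ** 2) / n + lam * np.sum(np.abs(w))

def prox_svrg_lasso(A, b, lam, eta, m, epochs, seed=0):
    """Illustrative Prox-SVRG loop for lasso (assumed setup, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Lipschitz constant of grad f_i for f_i(w) = 0.5 * (a_i^T w - b_i)^2 is ||a_i||^2.
    L = np.sum(A * A, axis=1)
    p = L / L.sum()  # sampling probabilities proportional to L_i
    w_tilde = np.zeros(d)
    history = []
    for _ in range(epochs):
        # Full gradient of the smooth part at the snapshot point.
        full_grad = A.T @ (A @ w_tilde - b) / n
        w = w_tilde.copy()
        w_sum = np.zeros(d)
        for _ in range(m):
            i = rng.choice(n, p=p)
            a_i = A[i]
            # grad f_i(w) - grad f_i(w_tilde) = a_i * a_i^T (w - w_tilde);
            # dividing by n * p_i keeps the estimator unbiased under non-uniform sampling.
            v = a_i * (a_i @ (w - w_tilde)) / (n * p[i]) + full_grad
            # Proximal step handles the l1 regularizer.
            w = soft_threshold(w - eta * v, eta * lam)
            w_sum += w
        # Assumed snapshot rule: average of the inner iterates.
        w_tilde = w_sum / m
        history.append(objective(A, b, lam, w_tilde))
    return w_tilde, history
```

A possible call, taking the mean of the per-sample constants as a stand-in for $L_P$ (an assumption), would be `prox_svrg_lasso(A, b, lam=0.05, eta=0.2 / np.mean(np.sum(A * A, axis=1)), m=A.shape[0], epochs=15)`. Replacing `soft_threshold` with a Euclidean projection onto a constraint set would give a projected variant in the spirit of VRPSG.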
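Experimental Results
Research Questions
- RQ1: Can variance-reduced stochastic gradient methods achieve linear convergence without assuming strong convexity?
- RQ2: What new structural condition makes linear convergence possible in the non-strongly convex setting?
- RQ3: Does the Semi-Strongly Convex (SSC) inequality hold, and can it be rigorously proved, for both constrained and regularized optimization problems?
- RQ4: In practice, how does the performance of VRPSG depend on the sampling strategy, the inner-loop length, and the step size?
- RQ5: Can the SSC inequality be used to analyze stochastic gradient algorithms other than VRPSG and Prox-SVRG?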
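Key Findings
- VRPSG and Prox-SVRG achieve a linear convergence rate on non-strongly convex problems, a result previously established only under strong convexity.
- The SSC inequality is rigorously proved for a class of non-strongly convex problems in both constrained and regularized settings.
- The SSC inequality bounds the distance to the optimal solution set from above by the objective gap, which is what supports the linear convergence analysis.
- Experiments show that non-uniform sampling (with probabilities proportional to the Lipschitz constants) converges markedly faster than uniform sampling.
- VRPSG is robust to the choice of step size: although the theoretical bound requires $\eta < 0.25/L_P$, both $\eta = 1/L_P$ and $\eta = 5/L_P$ converge quickly.
- Setting the inner-loop length $m$ to $0.5n$ or $n$ gives the most stable performance, suggesting that intermediate values are the best choice.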
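As a rough illustration of the SSC bound described in the findings, an inequality of this kind can be written as follows. The notation is assumed here rather than taken from the paper: $P$ is the objective, $P^\ast$ its optimal value, $\mathcal{W}^\ast$ the optimal solution set, $\Pi_{\mathcal{W}^\ast}$ the Euclidean projection onto that set, and $\beta > 0$ a problem-dependent constant.

$$
P(w) - P^\ast \;\ge\; \frac{\beta}{2}\,\bigl\| w - \Pi_{\mathcal{W}^\ast}(w) \bigr\|^2 .
$$

When $P$ is $\mu$-strongly convex, the optimal set is a single point $w^\ast$ and the familiar bound $P(w) - P^\ast \ge \frac{\mu}{2}\|w - w^\ast\|^2$ has the same shape. The SSC form replaces the unique minimizer with the nearest optimal point. Linear convergence then means that the expected objective gap at the $k$-th snapshot $\tilde{w}^k$ shrinks geometrically, for some rate $0 < \rho < 1$:

$$
\mathbb{E}\bigl[ P(\tilde{w}^k) - P^\ast \bigr] \;\le\; \rho^{k}\,\bigl( P(\tilde{w}^0) - P^\ast \bigr).
$$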
This explainer was generated by AI and reviewed by human editors.