QUICK REVIEW

[Paper Review] SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

Zhe Wang, Kaiyi Ji | arXiv (Cornell University) | Oct 25, 2018

Stochastic Gradient Optimization Techniques | References: 38 | Cited by: 30

One-Sentence Summary

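This paper proposes SpiderBoost, a stochastic variance-reduced algorithm that uses a much larger constant stepsize while retaining a near-optimal oracle complexity for nonconvex optimization. It also introduces Prox-SpiderBoost-M, a momentum variant that attains a near-optimal $\mathcal{O}(n + \sqrt{n}\epsilon^{-2})$ oracle complexity on composite nonconvex problems and clearly improves on the practical performance of SPIDER and other earlier methods.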

ABSTRACT

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down the convergence in practice, and cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows to use a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantee. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-2},\epsilon^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.

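Motivation and Objectives

Remove SPIDER's accuracy-dependent stepsize, which slows convergence in practice.
Develop a proximal extension of SPIDER that handles nonsmooth regularizers with provable convergence guarantees.
Accelerate SPIDER-type methods with a novel momentum scheme while preserving the near-optimal oracle complexity.
Close the gap between theoretical complexity and practical performance in variance-reduced stochastic optimization.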

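Proposed Method

SpiderBoost relies on a new convergence analysis that bounds the accumulated iterate increments over an entire inner loop, which permits a stepsize of $\eta = \mathcal{O}(1/L)$ instead of SPIDER's $\mathcal{O}(\epsilon/L)$.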
The algorithm uses the same recursive gradient estimator as SPIDER, but a tighter analysis relaxes the constraint on the stepsize.
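Prox-SpiderBoost adds a proximal mapping, extending SpiderBoost to composite nonconvex problems with nonsmooth regularizers.
A novel momentum scheme, Prox-SpiderBoost-M, is designed to accelerate convergence while exploiting the martingale structure of the gradient estimator.
The method uses recursive update rules in which the adaptive weights $\alpha_k$, $\beta_k$, and $\lambda_k$ control the momentum and the variance.
The theoretical analysis bounds the expected norm of the gradient estimator through a telescoping sum along the optimization path and a variance decomposition.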
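
The SpiderBoost and Prox-SpiderBoost updates described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stepsize $\eta = 1/(2L)$, the choice of epoch length and mini-batch size equal to $\sqrt{n}$, the $\ell_1$ regularizer, the least-squares demo problem, and all function names (`soft_threshold`, `prox_spiderboost`, `grad_batch`, `objective`) are assumptions made for this example.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal map of tau * ||x||_1 (an illustrative choice of regularizer)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def prox_spiderboost(grad_batch, n, x0, L, epochs, reg=0.0, seed=0):
    """Sketch of (Prox-)SpiderBoost for (1/n) * sum_i f_i(x) + reg * ||x||_1.

    grad_batch(x, idx) must return the average gradient of f_i over idx.
    With reg=0 the proximal map is the identity (plain SpiderBoost).
    """
    rng = np.random.default_rng(seed)
    q = batch = max(1, int(np.sqrt(n)))  # assumed: epoch length = batch size = sqrt(n)
    eta = 1.0 / (2.0 * L)                # assumed constant stepsize, independent of epsilon
    x = np.asarray(x0, dtype=float).copy()
    full = np.arange(n)
    for k in range(epochs * q):
        if k % q == 0:
            v = grad_batch(x, full)      # periodic full-gradient refresh
        else:
            idx = rng.integers(0, n, size=batch)
            # recursive estimator: same mini-batch at the current and previous iterate
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
        x_prev = x
        x = soft_threshold(x - eta * v, eta * reg)  # proximal gradient step
    return x


# Demo on a synthetic least-squares problem (convex, used only as a smoke test).
rng = np.random.default_rng(1)
n, d, reg = 200, 5, 0.01
A = rng.standard_normal((n, d))
x_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
b = A @ x_true


def grad_batch(x, idx):
    """Average gradient of 0.5 * (a_i @ x - b_i)**2 over the indices idx."""
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)


def objective(x):
    """Composite objective: smooth least-squares loss plus l1 penalty."""
    return 0.5 * np.mean((A @ x - b) ** 2) + reg * np.sum(np.abs(x))


L = np.max(np.sum(A ** 2, axis=1))  # smoothness constant valid for every f_i
x_hat = prox_spiderboost(grad_batch, n, np.zeros(d), L, epochs=30, reg=reg)
print(objective(np.zeros(d)), objective(x_hat))
```

The point of the sketch is that `eta` does not depend on a target accuracy $\epsilon$, which is the difference from SPIDER's $\mathcal{O}(\epsilon/L)$ stepsize that the review highlights. The momentum variant Prox-SpiderBoost-M is not shown here.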

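Experimental Results

Research Questions

RQ1: Can a variance-reduced algorithm use a larger constant stepsize while keeping a near-optimal oracle complexity?
RQ2: Can SPIDER be generalized to composite optimization with nonsmooth regularizers while keeping its convergence guarantees?
RQ3: Can momentum be integrated effectively into SPIDER-type algorithms to improve practical performance without sacrificing theoretical optimality?
RQ4: Under the given assumptions, what is the optimal oracle complexity for composite nonconvex optimization?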

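Key Findings

Prox-SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-2}, \epsilon^{-3}\})$, improving on the previous state of the art by a factor of $\mathcal{O}(\min\{n^{1/6}, \epsilon^{-1/3}\})$.
The constant stepsize $\eta = \mathcal{O}(1/L)$ in SpiderBoost yields faster convergence in practice than SPIDER's $\mathcal{O}(\epsilon/L)$ stepsize.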
Prox-SpiderBoost-M achieves a near-optimal $\mathcal{O}(n + \sqrt{n}\epsilon^{-2})$ oracle complexity, matching the known lower bound for nonconvex optimization.
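The theoretical analysis shows that $\mathbb{E}\|G_{\lambda_\zeta}(z_\zeta, \nabla f(z_\zeta))\| \leq \mathcal{O}(\sqrt{L(\Psi(x_0) - \Psi^*)/K})$, so $K = \mathcal{O}(L(\Psi(x_0) - \Psi^*)/\epsilon^2)$ iterations suffice to reach accuracy $\epsilon$.
The method attains a proximal oracle complexity of $\mathcal{O}(\epsilon^{-2})$, which is optimal for this problem class.
Experiments show a marked performance gain over SPIDER and other baselines in the early phase of training.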
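
The iteration count and the gradient-oracle complexity stated above can be connected by a short calculation. The following is a sketch that ignores the constants hidden in $\mathcal{O}(\cdot)$ and assumes an epoch length $q$ and mini-batch size $|S|$ both equal to $\sqrt{n}$, with one full gradient per epoch and two mini-batch gradients per inner iteration. These parameter choices are assumptions of this example rather than values given in the review.

```latex
\begin{align*}
\sqrt{\frac{L(\Psi(x_0)-\Psi^*)}{K}} \le \epsilon
  &\iff K \ge \frac{L(\Psi(x_0)-\Psi^*)}{\epsilon^{2}}, \\
\text{gradient calls}
  &\approx \underbrace{\left\lceil \tfrac{K}{q} \right\rceil n}_{\text{full gradients}}
   + \underbrace{2K|S|}_{\text{mini-batches}}
   \le n + 3\sqrt{n}\,K
   = \mathcal{O}\!\left(n + \sqrt{n}\,\epsilon^{-2}\right)
   \quad (q = |S| = \sqrt{n}).
\end{align*}
```

With a constant number of proximal evaluations per iteration, the same $K$ gives the $\mathcal{O}(\epsilon^{-2})$ proximal oracle complexity.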

This review was generated by AI and checked by a human editor.