QUICK REVIEW

[Paper Review] A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Zhize Li, Jian Li|arXiv (Cornell University)|Feb 13, 2018

Sparse and Compressive Sensing Techniques | References: 25 | Cited by: 26

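One-Sentence Summary

This paper proposes ProxSVRG+, a proximal stochastic gradient method for nonsmooth nonconvex finite-sum problems that combines variance reduction with an efficient proximal update. With constant or moderate minibatch sizes it attains better convergence rates than ProxGD and ProxSVRG, and under the Polyak-Łojasiewicz condition it achieves global linear convergence without restarts.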

ABSTRACT

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., 2016b]. Also, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG [Reddi et al., 2016b]. Moreover, for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restart, unlike ProxSVRG. Thus, it can automatically switch to the faster linear convergence in some regions as long as the objective function satisfies the PL condition locally in these regions. ProxSVRG+ also improves on ProxGD and ProxSVRG/SAGA, and generalizes the results of SCSG in this case. Finally, we conduct several experiments and the experimental results are consistent with the theoretical results.
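
For reference, the problem class described in the abstract can be written as below. The symbols $\Phi$, $f_i$, $h$, $n$, $d$, and $\eta$ are notation assumed here for illustration; the abstract itself does not fix them.

```latex
% Objective: average of n differentiable (possibly nonconvex) components f_i
% plus a convex, possibly non-differentiable term h.
\[
\min_{x \in \mathbb{R}^d} \; \Phi(x) := f(x) + h(x),
\qquad
f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)
\]

% Proximal operator of h with step size eta > 0.
\[
\operatorname{prox}_{\eta h}(x) := \arg\min_{y \in \mathbb{R}^d}
\Big( h(y) + \frac{1}{2\eta} \lVert y - x \rVert^2 \Big)
\]
```

Under this notation, a stochastic gradient oracle call evaluates one $\nabla f_i$ at a given point, and a proximal oracle call evaluates $\operatorname{prox}_{\eta h}$ once. These are the two costs the abstract counts.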

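Motivation and Objectives

Address the lack of efficient stochastic methods for nonsmooth nonconvex finite-sum problems with constant or moderate minibatch sizes.
Overcome the limitation of ProxSVRG and ProxSAGA, which need large minibatches to outperform deterministic ProxGD.
Develop a method that makes fewer proximal oracle calls while maintaining or improving the convergence rate.
Address the open problem posed by Reddi et al. (2016b): outperforming ProxGD with constant minibatch sizes.
Establish global linear convergence under the Polyak-Łojasiewicz condition without restarts.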

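Proposed Method

Propose ProxSVRG+, a variance-reduced proximal stochastic gradient method built on the SVRG framework.
Introduce a new analysis technique that simplifies the convergence proof relative to ProxSVRG and yields tighter bounds.
Use the step size $\eta = \frac{1}{6L}$, where $L$ is the smoothness constant of the differentiable component, to balance descent against variance reduction.
Combine a full gradient with stochastic gradient estimates to reduce the variance of the update direction.
Apply Young's inequality and a norm decomposition to derive a recursive bound on the expected objective gap.
Exploit the Polyak-Łojasiewicz (PL) condition to obtain global linear convergence without restarts.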
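
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration of an SVRG-style proximal update, not the authors' reference implementation. The sigmoid loss used for the nonconvex components, the L1 penalty used for the convex term, the sampling without replacement, and all function and parameter names (`prox_svrg_plus`, `B`, `b`, `m`, `lam`) are assumptions made for the example.

```python
import math
import random


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (assumed choice of the convex term h)."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in x]


def sigmoid_loss(z):
    """Numerically stable 1 / (1 + exp(z)); a smooth nonconvex loss in x."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def component_grad(x, a, y):
    """Gradient of f_i(x) = sigmoid_loss(y * <a, x>)."""
    p = sigmoid_loss(y * sum(aj * xj for aj, xj in zip(a, x)))
    scale = -p * (1.0 - p) * y  # derivative of 1/(1+e^z) is -p(1-p)
    return [scale * aj for aj in a]


def mean_grad(x, A, Y, idx):
    """Average of the component gradients over the index set idx."""
    g = [0.0] * len(x)
    for i in idx:
        gi = component_grad(x, A[i], Y[i])
        for j in range(len(x)):
            g[j] += gi[j] / len(idx)
    return g


def objective(x, A, Y, lam):
    """Phi(x) = (1/n) sum_i f_i(x) + lam * ||x||_1."""
    fit = sum(sigmoid_loss(y * sum(aj * xj for aj, xj in zip(a, x)))
              for a, y in zip(A, Y)) / len(A)
    return fit + lam * sum(abs(v) for v in x)


def prox_svrg_plus(A, Y, x0, lam, eta, epochs, m, B, b, seed=0):
    """SVRG-style proximal loop: snapshot gradient on a batch of size B,
    then m inner steps with minibatches of size b (requires B, b <= len(A))."""
    rng = random.Random(seed)
    n = len(A)
    x_snap = list(x0)
    for _ in range(epochs):
        # Snapshot gradient; B = n gives the full gradient.
        g_snap = mean_grad(x_snap, A, Y, rng.sample(range(n), B))
        x = list(x_snap)
        for _ in range(m):
            mb = rng.sample(range(n), b)
            g_new = mean_grad(x, A, Y, mb)
            g_old = mean_grad(x_snap, A, Y, mb)
            # Variance-reduced direction: minibatch difference plus snapshot gradient.
            v = [gn - go + gs for gn, go, gs in zip(g_new, g_old, g_snap)]
            # One proximal oracle call per inner step.
            x = soft_threshold([xj - eta * vj for xj, vj in zip(x, v)], eta * lam)
        x_snap = x  # the last inner iterate becomes the next snapshot
    return x_snap
```

With `eta = 1 / (6 * L)` for some upper bound `L` on the smoothness constant, each inner iteration in this sketch costs one proximal call and `2 * b` component-gradient evaluations, plus `B` evaluations per epoch for the snapshot. Setting `B = b = len(A)` reduces the loop to deterministic proximal gradient descent, which is a convenient sanity check.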

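Experimental Results

Research Questions

RQ1: In nonsmooth nonconvex optimization, can a stochastic proximal method with constant or moderate minibatch sizes converge faster than deterministic ProxGD?
RQ2: Can ProxSVRG+ achieve global linear convergence under the PL condition without restarts?
RQ3: Compared with ProxSVRG, can the number of proximal oracle calls be reduced substantially while maintaining or improving the convergence rate?
RQ4: How does ProxSVRG+ compare with SCSG in the smooth nonconvex case, and can SCSG's results be generalized to the nonsmooth case?
RQ5: In the nonsmooth nonconvex setting, is there an optimal trade-off between stochastic gradient oracle calls and proximal oracle calls?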

Key Findings

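In terms of stochastic first-order oracle (SFO) calls, ProxSVRG+ attains a complexity of $\widetilde{O}(\frac{1}{\epsilon^{5/3}} \wedge \frac{n^{2/3}}{\epsilon})$ with a suitably chosen minibatch size, matching the best known SCSG result and improving on earlier results for the nonsmooth setting.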
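The method makes fewer proximal oracle calls than ProxSVRG, which makes it more efficient in practice.
For functions satisfying the Polyak-Łojasiewicz condition, ProxSVRG+ achieves global linear convergence without restarts, unlike ProxSVRG.
ProxSVRG+ outperforms ProxGD over a wide range of minibatch sizes, partially resolving the open problem posed by Reddi et al. (2016b).
The algorithm generalizes the best known SCSG results from the smooth nonconvex case to the nonsmooth nonconvex case, broadening their applicability.
Experiments agree with the theory, showing consistent gains over ProxGD and ProxSVRG.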
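
The findings above refer to an accuracy $\epsilon$ and to the PL condition without defining either. One common way to formalize both in the proximal setting, assumed here rather than quoted from this summary, uses the gradient mapping. Below, $\Phi = f + h$ denotes the objective with smooth part $f$ and convex nonsmooth part $h$, $\eta$ is the step size, and $\Phi^*$ is the optimal value; all of these symbols are illustrative notation.

```latex
% Gradient mapping: the proximal analogue of the gradient.
\[
\mathcal{G}_\eta(x) := \frac{1}{\eta}
\Big( x - \operatorname{prox}_{\eta h}\big( x - \eta \nabla f(x) \big) \Big)
\]

% An epsilon-accurate point is one with small expected squared gradient mapping.
\[
\mathbb{E}\, \lVert \mathcal{G}_\eta(\hat{x}) \rVert^2 \le \epsilon
\]

% PL condition in the proximal setting, for some mu > 0.
\[
\lVert \mathcal{G}_\eta(x) \rVert^2 \ge 2\mu \big( \Phi(x) - \Phi^* \big)
\quad \text{for all } x
\]
```

When $h \equiv 0$, the gradient mapping reduces to $\nabla f(x)$ and the last line becomes the usual PL inequality for smooth functions.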

This review was generated by AI and reviewed by human editors.