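[Paper Review] ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization
ProxSARAH introduces a proximal variance-reduction framework with an averaging step, built on the SARAH estimator. It attains the best-known complexity for stochastic composite nonconvex problems in both the finite-sum and expectation settings, with either constant or adaptive step-sizes.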
We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al., 2017) and consist of two steps, a proximal gradient step and an averaging step, which make them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption on the nonconvex objective term, plus an additional bounded variance assumption when applied to expectation problems. They work with both constant and adaptive step-sizes, while allowing single samples and mini-batches. In all these cases, we prove that our algorithms can achieve the best-known complexity bounds. One key component of our methods is a set of new constant and adaptive step-sizes that help achieve the desired complexity bounds while improving practical performance. In the single-sample case, our constant step-size is much larger than those of existing methods, including proximal SVRG schemes. We also specialize the algorithm to the non-composite case, where it covers existing state-of-the-art methods in terms of complexity bounds. Our update also allows one to trade off step-sizes against mini-batch sizes to improve performance. We test the proposed algorithms on two composite nonconvex problems and on neural networks using several well-known datasets.
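The two steps described above can be sketched as the following recursion. This is a reconstruction under assumed notation (x_t for the iterate, v_t for the SARAH estimate, a mini-batch B_t of size hat_b, proximal step-size eta_t, averaging weight gamma_t, convex regularizer psi). The SARAH recursion is written in its standard form, and the paper's exact indexing and initialization may differ.

```latex
\begin{aligned}
v_t &= v_{t-1} + \frac{1}{\hat{b}} \sum_{i \in \mathcal{B}_t} \big( \nabla f_i(x_t) - \nabla f_i(x_{t-1}) \big), \qquad v_0 = \nabla f(x_0), \\
\hat{x}_{t+1} &= \operatorname{prox}_{\eta_t \psi}\big( x_t - \eta_t v_t \big), \\
x_{t+1} &= (1 - \gamma_t)\, x_t + \gamma_t\, \hat{x}_{t+1}.
\end{aligned}
```

Subtracting x_t from the last line gives x_{t+1} - x_t = gamma_t (hat_x_{t+1} - x_t), so the move along the proximal-gradient direction is scaled by gamma_t * eta_t. With gamma_t = 1 the scheme reduces to a plain proximal SARAH step. In the expectation setting, v_0 would be a large mini-batch estimate rather than the full gradient.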
Motivation and Goals
- Formulate and solve stochastic composite nonconvex optimization problems covering both the finite-sum and expectation settings.
- Develop a proximal variance-reduction framework leveraging the SARAH estimator to improve convergence guarantees.
- Design constant and adaptive step-size rules within an averaging proximal-gradient scheme to achieve strong theoretical rates and practical performance.
- Extend the framework to both composite and non-composite cases and analyze the trade-offs between step-sizes and mini-batch sizes.
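For concreteness, the problem class can be written as below. The symbols F, f, f_i, psi, xi, L and G_eta are assumed notation for this sketch, since the summary does not spell them out. The stationarity measure is the proximal gradient mapping, as is common in this literature.

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^p} \; F(x) := f(x) + \psi(x), \qquad
f(x) = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} f_i(x) \ \text{(finite-sum)} \ \text{ or } \ f(x) = \mathbb{E}_{\xi}[f(x;\xi)] \ \text{(expectation)}, \\
&\mathbb{E}_{\xi}\big[\|\nabla f(x;\xi) - \nabla f(y;\xi)\|^2\big] \le L^2 \|x - y\|^2 \quad \text{(average smoothness)}, \\
&G_\eta(x) := \tfrac{1}{\eta}\big(x - \operatorname{prox}_{\eta\psi}(x - \eta \nabla f(x))\big), \qquad \mathbb{E}\big[\|G_\eta(\tilde{x})\|^2\big] \le \varepsilon^2 \quad (\varepsilon\text{-stationarity}).
\end{aligned}
```

Here f may be nonconvex, while psi is assumed proper, closed and convex with an inexpensive proximal operator (for example an l1 penalty or the indicator of a simple convex set). In the finite-sum case, the expectation over xi is an average over the n components.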
Proposed Method
- Use SARAH-based gradient estimates within a double-loop (outer/inner) scheme.
- Incorporate a proximal-gradient step followed by an averaging step in each inner iteration; stationarity is measured by the gradient mapping G_eta.
- Introduce two step-sizes, the averaging step-size gamma and the proximal-gradient step-size eta, whose product hat_eta = gamma * eta acts as the effective step-size.
- Allow single-sample and mini-batch variants and support both finite-sum and expectation problems.
- Prove complexity bounds matching best-known rates: O(n + n^{1/2} epsilon^{-2}) for finite-sum and O(sigma^2 epsilon^{-2} + sigma epsilon^{-3}) for expectation.
- Show adaptability via adaptive step-size rules and discuss trade-offs between epoch length m and batch size b_hat.
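A minimal, runnable sketch of the double loop described above, in plain Python. It is not the authors' code. The toy problem (least squares with an l1 regularizer, which happens to be convex), the helper names (prox_sarah, soft_threshold, grad_i, full_grad, grad_mapping_norm) and the step-size values are all hypothetical choices for illustration. The sketch does not implement the paper's constant or adaptive step-size rules.

```python
import math
import random


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: shrink each coordinate toward zero by tau."""
    return [math.copysign(max(abs(zj) - tau, 0.0), zj) for zj in z]


def grad_i(x, a, y):
    """Gradient of the hypothetical toy component f_i(x) = 0.5 * (a.x - y)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - y
    return [r * aj for aj in a]


def full_grad(x, data):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    g = [0.0] * len(x)
    for a, y in data:
        for j, gj in enumerate(grad_i(x, a, y)):
            g[j] += gj / len(data)
    return g


def grad_mapping_norm(x, data, eta, lam):
    """||G_eta(x)|| with G_eta(x) = (x - prox_{eta*psi}(x - eta*grad f(x))) / eta."""
    g = full_grad(x, data)
    p = soft_threshold([xj - eta * gj for xj, gj in zip(x, g)], eta * lam)
    return math.sqrt(sum((xj - pj) ** 2 for xj, pj in zip(x, p))) / eta


def prox_sarah(data, x0, lam, eta, gamma, m, b, outer_iters, seed=0):
    """Hypothetical sketch: full-gradient snapshot, then m inner steps of
    SARAH estimate -> proximal-gradient step -> averaging step."""
    rng = random.Random(seed)
    n, x, x_prev = len(data), list(x0), None
    for _ in range(outer_iters):
        v = full_grad(x, data)  # finite-sum snapshot (expectation case: a large batch)
        for t in range(m):
            if t > 0:
                # SARAH recursion: v += mean over batch of grad f_i(x_t) - grad f_i(x_{t-1})
                for i in rng.sample(range(n), b):
                    a, y = data[i]
                    g_new, g_old = grad_i(x, a, y), grad_i(x_prev, a, y)
                    v = [vj + (gn - go) / b for vj, gn, go in zip(v, g_new, g_old)]
            # Proximal-gradient step with step-size eta.
            x_hat = soft_threshold([xj - eta * vj for xj, vj in zip(x, v)], eta * lam)
            # Averaging step with weight gamma (effective step gamma * eta).
            x_prev, x = x, [(1 - gamma) * xj + gamma * hj for xj, hj in zip(x, x_hat)]
    # Last iterate for simplicity; guarantees of this kind are usually
    # stated for a randomly chosen iterate.
    return x


# Hypothetical toy data: y_i = a_i . x_true with a sparse x_true.
data_rng = random.Random(1)
x_true = [1.0, -2.0, 0.0]
data = []
for _ in range(30):
    a = [data_rng.uniform(-1.0, 1.0) for _ in range(3)]
    data.append((a, sum(aj * tj for aj, tj in zip(a, x_true))))

x0 = [0.0, 0.0, 0.0]
x_est = prox_sarah(data, x0, lam=0.05, eta=0.5, gamma=0.5, m=5, b=5, outer_iters=30)
print(grad_mapping_norm(x0, data, 0.5, 0.05), grad_mapping_norm(x_est, data, 0.5, 0.05))
```

Setting b equal to n makes every SARAH update exact, so the inner loop reduces to a proximal-gradient step followed by averaging. Smaller b trades estimator accuracy for cheaper inner iterations.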
Experimental Results
Research Questions
- RQ1: Can the SARAH-based proximal framework achieve optimal or near-optimal convergence rates for composite nonconvex objectives in both finite-sum and expectation settings?
- RQ2: How do constant versus adaptive step-sizes, and single-sample versus mini-batch regimes, affect theoretical guarantees and practical performance?
- RQ3: What is the impact of the averaging step on convergence and complexity in proximal nonconvex optimization?
- RQ4: How can the epoch length and batch sizes be chosen to balance computational cost and convergence rate while preserving guarantees?
Key Findings
- In the finite-sum setting, ProxSARAH achieves a complexity of O(n + n^{1/2} epsilon^{-2}) to obtain an epsilon-stationary point in expectation, which matches the known lower bound up to constant factors when n <= O(epsilon^{-4}).
- In the expectation setting, ProxSARAH requires O(sigma^{2} epsilon^{-2} + sigma epsilon^{-3}) first-order oracle calls under a bounded-variance assumption, achieving the best-known rate among comparable methods.
- The framework uses two step-sizes and an averaging step, allowing larger constant proximal-step sizes compared to proximal SVRG-type methods and flexible trade-offs with mini-batch sizes.
- Adaptive step-size variants are provided, often outperforming constant-step-size schemes in practice, and they extend to non-composite problems as well as composite ones.
- The method covers both composite and non-composite cases and extends to single-sample and mini-batch regimes, while maintaining the same proximal operator usage as in ProxSVRG/ProxSVRG+ and achieving competitive complexity bounds.
- Compared to proximal SVRG, SPIDER, and SpiderBoost, ProxSARAH achieves similar or better complexity with larger effective step-sizes in the composite setting and supports a wider range of mini-batch configurations.
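The trade-off between epoch length m and mini-batch size hat_b can be made concrete by counting component-gradient (oracle) evaluations. The sketch below is a back-of-the-envelope calculation, not taken from the paper. It assumes one full-gradient snapshot per outer iteration, two gradient evaluations per sampled index in each SARAH update, and the illustrative choice m = hat_b = sqrt(n). Under those assumptions a budget of T inner steps costs about 3 sqrt(n) T calls, mirroring the n^{1/2} epsilon^{-2} term when T is of order epsilon^{-2}. The function names are hypothetical.

```python
import math


def ifo_calls(n, m, b_hat, outer_iters):
    """Hypothetical oracle count for a double-loop SARAH-type method.

    Each outer iteration: one full gradient (n calls), then m - 1 SARAH
    updates that evaluate grad f_i at two points for b_hat sampled indices.
    """
    return outer_iters * (n + 2 * (m - 1) * b_hat)


def calls_for_inner_budget(n, total_inner_steps, m, b_hat):
    """Oracle calls needed for `total_inner_steps` inner iterations,
    rounded up to whole epochs of length m."""
    outer_iters = math.ceil(total_inner_steps / m)
    return ifo_calls(n, m, b_hat, outer_iters)


n = 10_000
T = 1_000_000          # assumed inner-step budget, standing in for O(epsilon^-2)
root = math.isqrt(n)   # illustrative choice m = b_hat = sqrt(n)
print(calls_for_inner_budget(n, T, root, root))  # 298000000 (about 3 * sqrt(n) * T)
print(calls_for_inner_budget(n, T, 1, 1))        # 10000000000 (n * T: full gradient every step)
```

With m = 1 the method never uses the SARAH recursion and recomputes the full gradient at every step, which is why its cost grows as n * T rather than sqrt(n) * T.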
This review was generated by AI and checked by human editors.