[Paper Review] Stochastic Variance Reduction for Nonconvex Optimization
This paper analyzes nonconvex finite-sum optimization with SVRG, proving nonasymptotic convergence to stationary points faster than SGD and gradient descent, and showing linear convergence for gradient-dominated subclasses.
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to mini-batching in parallel settings.
Motivation & Objective
- Motivate and analyze stochastic variance reduced gradient (SVRG) methods for nonconvex finite-sum problems.
- Establish nonasymptotic convergence rates of SVRG to stationary points that improve over SGD and gradient descent.
- Identify conditions under which SVRG achieves linear convergence for a subclass of nonconvex problems (gradient-dominated).
- Investigate mini-batch SVRG and prove linear speedups in parallel settings.
- Provide comparisons and insights across SGD, gradient descent, and SVRG in both nonconvex and convex scenarios.
Proposed method
- Study optimization of f(x) = (1/n) sum_{i=1}^n f_i(x) with Lipschitz-smooth components under the Incremental First-order Oracle (IFO) model.
- Analyze nonconvex SVRG (Algorithm 2) operating in epochs with a full gradient computed at a reference point and inner stochastic updates.
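- Derive convergence guarantees of the form E[||∇f(x_a)||^2] ≤ (f(x^0) - f(x^*))/(T γ_n) under suitable parameter choices, where x_a is the output iterate drawn uniformly at random from the inner iterates, T is the total number of inner iterations, and γ_n is a positive quantity determined by the algorithm parameters.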
- Introduce parameter choices for step sizes and epoch lengths to obtain explicit IFO complexities.
- Extend analysis to mini-batch SVRG (Algorithm 4) showing variance reduction and parallelism benefits.
- Present a variant (MSVRG) whose step size balances SGD-like and GD-like behavior to improve IFO complexity.
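The epoch structure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's Algorithm 2 or Algorithm 4 verbatim: the callback `grad_i`, the toy components built from `CENTERS`, the helper `full_grad_norm`, and the step size and epoch length are all assumptions chosen for the example. Mini-batch indices are assumed to be sampled with replacement, and the returned `x_a` is drawn uniformly from the inner iterates.

```python
import random


def svrg(grad_i, n, x0, step, epoch_len, num_epochs, batch_size=1, seed=0):
    """SVRG sketch for f(x) = (1/n) * sum_i f_i(x).

    grad_i(i, x) is a hypothetical callback returning the gradient of f_i at x
    as a list of floats. Returns (x_a, x_ref): an iterate drawn uniformly at
    random from all inner iterates, and the final reference point.
    """
    rng = random.Random(seed)
    dim = len(x0)
    x_ref = list(x0)
    iterates = []
    for _ in range(num_epochs):
        # Full gradient at the reference point (costs n IFO calls).
        full = [0.0] * dim
        for i in range(n):
            g = grad_i(i, x_ref)
            full = [f + gj / n for f, gj in zip(full, g)]
        x = list(x_ref)
        for _ in range(epoch_len):
            iterates.append(list(x))
            # Average gradient differences over a mini-batch sampled with replacement.
            diff = [0.0] * dim
            for _ in range(batch_size):
                i = rng.randrange(n)
                g_x, g_ref = grad_i(i, x), grad_i(i, x_ref)
                diff = [d + (a - b) / batch_size for d, a, b in zip(diff, g_x, g_ref)]
            # Variance-reduced direction: an unbiased estimate of the gradient at x.
            v = [d + f for d, f in zip(diff, full)]
            x = [xj - step * vj for xj, vj in zip(x, v)]
        x_ref = x  # the last inner iterate becomes the next reference point
    return rng.choice(iterates), x_ref


# Toy nonconvex components (an assumption for illustration, not from the paper):
# f_i(x) = u^2 / (1 + u^2) with u = x - a_i, whose gradient is 2u / (1 + u^2)^2.
CENTERS = [0.1 * i for i in range(10)]


def toy_grad_i(i, x):
    u = x[0] - CENTERS[i]
    return [2.0 * u / (1.0 + u * u) ** 2]


def full_grad_norm(x):
    g = sum(toy_grad_i(i, x)[0] for i in range(len(CENTERS))) / len(CENTERS)
    return abs(g)


x_a, x_last = svrg(toy_grad_i, n=10, x0=[1.5], step=0.05, epoch_len=50, num_epochs=60)
print(full_grad_norm([1.5]), full_grad_norm(x_last))
```

Each epoch spends n IFO calls on the full gradient and two calls per sampled index in the inner loop, which is the accounting behind the IFO complexities quoted in this review. Setting `batch_size` above 1 gives a rough picture of the mini-batch variant.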
Experimental results
Research questions
- RQ1: Can SVRG achieve faster nonconvex convergence rates than SGD and gradient descent for finite-sum problems?
- RQ2: Under what parameter regimes does SVRG attain provable linear convergence for gradient-dominated nonconvex functions?
- RQ3: How does mini-batching affect SVRG's convergence, and can it provide linear speedups in parallel settings?
- RQ4: How does SVRG compare to SGD and gradient descent in IFO complexity across nonconvex and convex regimes?
Key findings
- SVRG converges to stationary points faster than SGD and gradient descent on nonconvex finite-sum problems; relative to gradient descent, the IFO complexity improves by a factor of up to n^{1/3}.
- For nonconvex SVRG with step size proportional to 1/(L n^{α}), the IFO complexity to reach ε-accuracy scales as O(n + n^{1-α/2}/ε) for α < 2/3 and O(n + n^{α}/ε) for α ≥ 2/3; the best dependence on n, O(n + n^{2/3}/ε), is attained at α = 2/3.
- For τ-gradient dominated nonconvex functions, SVRG achieves linear convergence to the global optimum, with IFO complexity O((n + τ n^{2/3}) log(1/ε)).
- Mini-batch SVRG yields a linear speedup in parallel settings for batch sizes b < n^{2/3} without increasing the total number of IFO calls, which stays at O(n + n^{2/3}/ε).
- The MSVRG variant combines favorable step-size choices to attain IFO complexity no worse than both SGD and gradient descent, under the additional assumption of σ-bounded gradients.
- The paper also provides a convex-case analysis showing SVRG achieves O(1/ε) rate in IFO complexity and can reach improved rates with tailored parameter choices.
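To get a feel for the nonconvex comparison in these findings, the bounds can be evaluated with all constants dropped. The sketch below assumes the rates O(1/ε²) for SGD and O(n/ε) for gradient descent alongside the O(n + n^{2/3}/ε) rate for SVRG quoted above; the function name `ifo_bounds` and the sample values of n and ε are hypothetical, and the numbers are orders of magnitude only, not predictions of real runtimes.

```python
def ifo_bounds(n, eps):
    """Order-of-magnitude IFO counts with all constants dropped (illustrative only)."""
    return {
        "sgd": 1.0 / eps ** 2,               # no dependence on n
        "gd": n / eps,                       # n IFO calls per iteration
        "svrg": n + n ** (2.0 / 3.0) / eps,  # nonconvex SVRG at alpha = 2/3
    }


bounds = ifo_bounds(n=10 ** 6, eps=1e-5)
for name, calls in sorted(bounds.items(), key=lambda kv: kv[1]):
    print(f"{name}: {calls:.3g}")
print("GD / SVRG ratio:", bounds["gd"] / bounds["svrg"])
```

For these sample values the ratio between the gradient descent and SVRG bounds comes out close to n^{1/3} = 100. For larger ε the SGD bound can be the smaller one, which is the regime MSVRG is designed to cover.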
This review was created by AI and reviewed by human editors.