QUICK REVIEW

[Paper Review] A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

Kaiwen Zhou, Fanhua Shang|arXiv (Cornell University)|Jun 28, 2018

Stochastic Gradient Optimization Techniques | 12 references | 44 citations

TL;DR

MiG is a simple stochastic variance reduced gradient method that matches the best-known convergence rates and admits efficient sparse and asynchronous variants. It achieves an oracle complexity of O((n+√(κn)) log(1/ε)) for strongly convex problems and an O(1/T^2) rate for non-strongly convex problems.

ABSTRACT

Recent years have witnessed exciting progress in the study of stochastic variance reduced gradient methods (e.g., SVRG, SAGA), their accelerated variants (e.g., Katyusha) and their extensions in many different settings (e.g., online, sparse, asynchronous, distributed). Among them, accelerated methods enjoy improved convergence rates but have complex coupling structures, which makes them hard to be extended to more settings (e.g., sparse and asynchronous) due to the existence of perturbation. In this paper, we introduce a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems. Moreover, we also present its efficient sparse and asynchronous variants, and theoretically analyze its convergence rates in these settings. Finally, extensive experiments for various machine learning problems such as logistic regression are given to illustrate the practical improvement in both serial and asynchronous settings.

Motivation & Objective

Motivate acceleration in stochastic variance reduced gradient methods for finite-sum convex optimization.
Design a simple algorithm (MiG) that tracks only one variable vector in the inner loop.
Achieve best-known oracle complexities for strongly convex problems and optimal rates for non-strongly convex problems.
Extend MiG to sparse and asynchronous settings with practical performance benefits.
Provide empirical evidence showing efficiency in serial and asynchronous scenarios.

Proposed method

Introduce MiG with a single inner-loop variable to reduce overhead and enable easy extension to sparse/async settings.
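Use the variance-reduced gradient estimator $\tilde{\nabla} = \nabla f_{i_j}(y_{j-1}) - \nabla f_{i_j}(\tilde{x}_{s-1}) + \mu_s$, where $\mu_s = \nabla f(\tilde{x}_{s-1})$ is the full gradient at the snapshot.
Compute $y$ as a θ-weighted combination of the current iterate and the snapshot: $y_{j-1} = \theta x^{s}_{j-1} + (1-\theta)\tilde{x}_{s-1}$.
Update $x^{s}_{j}$ via the proximal step $x^{s}_{j} = \arg\min_x \{ \frac{1}{2\eta}\|x - x^{s}_{j-1}\|^2 + \langle \tilde{\nabla}, x \rangle + g(x) \}$.
Form the new snapshot $\tilde{x}_s$ by combining a weighted average of the inner iterates (weight θ) with the previous snapshot $\tilde{x}_{s-1}$ (weight 1−θ).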
Provide sparse and asynchronous variants using diagonal reweighting D to keep unbiased gradient estimates and maintain a one-vector update structure.
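
The update steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes a ridge regression problem with f_i(x) = 0.5 (a_i·x − b_i)^2 and g(x) = (λ/2)||x||^2, so the proximal step has a closed form. The function name `mig_ridge`, the step size 1/(3L), the inner-loop length m = 2n, θ = 0.5, and the uniform averaging of inner iterates are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def mig_ridge(A, b, lam, theta=0.5, epochs=60, seed=0):
    """MiG-style loop for ridge regression (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.max(np.sum(A * A, axis=1))  # smoothness constant of the f_i
    eta = 1.0 / (3.0 * L)              # assumed conservative step size
    m = 2 * n                          # assumed inner-loop length
    x = np.zeros(d)                    # the single inner-loop vector
    x_tilde = np.zeros(d)              # snapshot
    for _ in range(epochs):
        mu = A.T @ (A @ x_tilde - b) / n  # full gradient at the snapshot
        x_sum = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)
            a_i = A[i]
            # y is a theta-weighted combination of x and the snapshot
            y = theta * x + (1.0 - theta) * x_tilde
            # variance-reduced gradient estimator evaluated at y
            grad = a_i * (a_i @ y - b[i]) - a_i * (a_i @ x_tilde - b[i]) + mu
            # closed-form proximal step for g(x) = (lam / 2) * ||x||^2
            x = (x - eta * grad) / (1.0 + eta * lam)
            x_sum += x
        # new snapshot: theta * (uniform average of inner iterates)
        # + (1 - theta) * old snapshot; uniform weights are an assumption
        x_tilde = theta * (x_sum / m) + (1.0 - theta) * x_tilde
    return x_tilde

# Usage on synthetic data, compared against the closed-form ridge solution.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 10))
b = A @ data_rng.standard_normal(10) + 0.1 * data_rng.standard_normal(200)
lam = 0.1
x_mig = mig_ridge(A, b, lam)
x_star = np.linalg.solve(A.T @ A / 200 + lam * np.eye(10), A.T @ b / 200)
print("distance to closed-form solution:", np.linalg.norm(x_mig - x_star))
```

Note that only `x` changes inside the inner loop; `y` is recomputed from `x` and the fixed snapshot, which is the one-vector structure the review credits for the easy sparse and asynchronous extensions.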

Experimental results

Research questions

RQ1: Can acceleration in stochastic variance reduced methods be achieved while keeping only one coupled vector to update?
RQ2: What are the oracle complexities MiG can achieve for strongly convex and non-strongly convex problems compared to existing methods?
RQ3: How can MiG be extended to sparse and asynchronous settings without losing convergence guarantees?
RQ4: How does MiG perform empirically against state-of-the-art methods in dense, sparse, and asynchronous regimes?

Key findings

MiG attains the best-known oracle complexity for strongly convex problems: O((n+√(κn)) log(1/ε)).
For non-strongly convex problems, the corresponding variant (MiG^NSC) achieves the optimal O(1/T^2) rate.
MiG remains single-vector in the inner loop, enabling efficient sparse and asynchronous variants with practical performance gains.
In experiments, MiG matches or outperforms Katyusha and SVRG in dense settings and outperforms KroMagnon and ASAGA in sparse/async settings on relevant datasets.
MiG does not require a gradient table, simplifying implementation and extending easily to distributed or asynchronous contexts.
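
To make the strongly convex rate concrete, the short sketch below compares the leading term n + √(κn) with the n + κ term of non-accelerated variance reduced methods such as SVRG and SAGA, whose standard complexity is O((n+κ) log(1/ε)). Constants and the shared log(1/ε) factor are ignored, and the values of n and κ are hypothetical, chosen only for illustration.

```python
import math

def accelerated_term(n, kappa):
    """Leading term of the accelerated complexity: n + sqrt(kappa * n)."""
    return n + math.sqrt(kappa * n)

def unaccelerated_term(n, kappa):
    """Leading term of the non-accelerated complexity: n + kappa."""
    return n + kappa

# Hypothetical problem sizes: one well-conditioned, one ill-conditioned.
for n, kappa in [(10**5, 10**3), (10**5, 10**7)]:
    acc = accelerated_term(n, kappa)
    plain = unaccelerated_term(n, kappa)
    print(f"n={n}, kappa={kappa}: accelerated={acc:.3g}, "
          f"non-accelerated={plain:.3g}, ratio={plain / acc:.2f}")
```

The comparison suggests that acceleration pays off mainly in the ill-conditioned regime where κ is much larger than n. When κ is below n, both terms are dominated by n and the two complexities are essentially the same.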

This review was created by AI and reviewed by human editors.