QUICK REVIEW

[Paper Review] An Asynchronous Parallel Stochastic Coordinate Descent Algorithm

Ji Liu, Steve Wright | arXiv (Cornell University) | Nov 8, 2013

Stochastic Gradient Optimization Techniques | 35 references | 147 citations

TL;DR

This paper proposes an asynchronous parallel stochastic coordinate descent (AsySCD) algorithm for convex optimization that achieves linear convergence under an essential strong convexity condition and sublinear $1/K$ convergence for general convex functions. The method enables near-linear speedup on multicore systems when the number of processors is bounded by $O(n^{1/2})$ in unconstrained and $O(n^{1/4})$ in separable-constrained settings, leveraging asynchronous updates with bounded delay.

ABSTRACT

We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate ($1/K$) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is $O(n^{1/2})$ in unconstrained optimization and $O(n^{1/4})$ in the separable-constrained case, where $n$ is the number of variables. We describe results from implementation on 40-core processors.

Motivation & Objective

To design a scalable, asynchronous parallel optimization algorithm for large-scale convex problems arising in machine learning and data analysis.
To establish convergence guarantees—linear under essential strong convexity, sublinear for general convex functions—under asynchronous updates with bounded delay.
To derive theoretical conditions for achieving near-linear speedup in terms of the problem dimension $n$ and the delay parameter $\tau$.
To validate the algorithm's performance empirically on 40-core systems, demonstrating practical scalability and efficiency.

Proposed method

The algorithm performs stochastic coordinate descent by selecting a random coordinate $i$ and updating $x_i$ using a constant stepsize multiple of the $i$-th partial gradient $\nabla_i f(x)$.
Updates are performed asynchronously across multiple cores without synchronization, with a bounded delay $\tau$ on the age of gradient information used.
For separable constraints, updates are projected back into the feasible set $\Omega_i$ to maintain feasibility.
The convergence analysis relies on an essential strong convexity condition (3), which is weaker than standard strong convexity and allows for non-singleton solution sets.
Key theoretical bounds involve the restricted Lipschitz constant $L_{\text{res}}$, coordinate Lipschitz constants $L_i$, and the maximum Lipschitz constant $L_{\max}$.
A Lyapunov function is constructed to analyze convergence, combining the distance to the optimal set and the objective gap, leading to a contraction inequality that establishes the convergence rate.
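
The update described above can be illustrated with a small serial simulation. This is a minimal sketch, not the authors' implementation: it assumes a convex quadratic objective $f(x) = \frac{1}{2}x^T Q x - c^T x$ with box constraints, a stepsize of $\gamma / L_{\max}$, and it emulates asynchrony by evaluating the partial gradient at an iterate that is up to $\tau$ updates old. The function name `asyscd_simulated` and all of its parameters are hypothetical. A real multicore run may also read a mix of old and new components, which this simplified model does not capture.

```python
import numpy as np

def asyscd_simulated(Q, c, lo, hi, gamma=0.5, tau=4, iters=20000, seed=0):
    """Serial simulation of asynchronous SCD on f(x) = 0.5 x^T Q x - c^T x.

    Assumptions (not from the paper's code): box constraints lo <= x <= hi,
    stepsize gamma / L_max, and staleness emulated by reading an iterate
    that is between 0 and `tau` updates old.
    """
    rng = np.random.default_rng(seed)
    n = len(c)
    L_max = float(np.max(np.diag(Q)))      # largest coordinate Lipschitz constant
    x = np.clip(np.zeros(n), lo, hi)       # feasible starting point
    history = [x.copy()]                   # recent iterates, oldest first
    for _ in range(iters):
        i = rng.integers(n)                # uniformly random coordinate
        delay = rng.integers(0, min(tau, len(history) - 1) + 1)
        x_stale = history[-1 - delay]      # iterate the "worker" read earlier
        g_i = Q[i] @ x_stale - c[i]        # i-th partial gradient at the stale point
        # Step along coordinate i, then project onto the interval [lo_i, hi_i].
        x[i] = np.clip(x[i] - (gamma / L_max) * g_i, lo[i], hi[i])
        history.append(x.copy())
        if len(history) > tau + 1:         # keep only what the delay bound allows
            history.pop(0)
    return x
```

Setting `tau=0` recovers ordinary serial stochastic coordinate descent, which makes it easy to compare the effect of staleness on the same problem.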

Experimental results

Research questions

RQ1: Can an asynchronous stochastic coordinate descent method achieve linear convergence under a weaker convexity condition than standard strong convexity?
RQ2: What is the maximum number of processors that can be used before speedup diminishes, and how does this depend on problem dimension $n$?
RQ3: How does bounded delay $\tau$ affect convergence rate and parallel efficiency in asynchronous coordinate descent?
RQ4: Can the algorithm achieve near-linear speedup in practice on modern multicore architectures?
RQ5: What is the role of coordinate-wise Lipschitz constants and Hessian structure in enabling high parallelism?

Key findings

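The algorithm achieves a linear convergence rate of $O((1 - \frac{l}{n(l + \gamma^{-1}L_{\max})})^K)$ under the essential strong convexity condition, where $l$ is the essential strong convexity modulus, $\gamma$ is the stepsize parameter, $n$ is the number of variables, and $K$ is the iteration count.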
For general convex functions, the convergence rate is sublinear at $O(1/K)$, matching known bounds for serial stochastic methods.
Near-linear speedup is achievable when the number of processors is $O(n^{1/2})$ in the unconstrained case and $O(n^{1/4})$ in the separable-constrained case.
Empirical results on 40-core systems confirm the theoretical speedup trends and demonstrate robust performance under high asynchrony.
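Parallelism is most favorable when the Hessian is nearly diagonal, that is, when coupling between coordinates is weak: the ratio $L_{\text{res}}/L_{\max}$ is then close to 1, which permits a larger delay $\tau$ and hence more processors.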
The stepsize $\gamma = 1/2$ satisfies the theoretical conditions for convergence, and the analysis shows boundedness of the Lyapunov function under this choice.
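
To make the rate expression above concrete, the following sketch evaluates it for a small quadratic. It rests on several assumptions: the objective is a strongly convex quadratic with Hessian $Q$, so that $L_{\max} = \max_i Q_{ii}$ and $L_{\text{res}} = \max_i \|Q_{\cdot i}\|_2$, and the smallest eigenvalue of $Q$ stands in for $l$ (essential strong convexity is a weaker condition, so this is only one special case). The helper names and the example matrix are hypothetical.

```python
import math
import numpy as np

def lipschitz_constants(Q):
    """L_max and L_res for a quadratic with Hessian Q (assumed formulas)."""
    L_max = float(np.max(np.diag(Q)))                 # largest diagonal entry
    L_res = float(np.max(np.linalg.norm(Q, axis=0)))  # largest column 2-norm
    return L_max, L_res

def linear_rate_factor(l, L_max, n, gamma=0.5):
    """Per-iteration contraction factor 1 - l / (n * (l + L_max / gamma))."""
    return 1.0 - l / (n * (l + L_max / gamma))

def iterations_to_reduce(factor, reduction):
    """Smallest K with factor**K <= reduction."""
    return math.ceil(math.log(reduction) / math.log(factor))

# Illustrative, nearly diagonal Hessian (made up for this example).
Q = np.diag([1.0, 2.0, 4.0]) + 0.1
L_max, L_res = lipschitz_constants(Q)
l = float(np.linalg.eigvalsh(Q)[0])                   # smallest eigenvalue
rho = linear_rate_factor(l, L_max, n=Q.shape[0], gamma=0.5)
K = iterations_to_reduce(rho, 1e-6)
print(f"L_res / L_max = {L_res / L_max:.4f}, factor = {rho:.4f}, K = {K}")
```

Because the factor behaves like $1 - O(1/n)$, the iteration count grows roughly linearly with $n$; each iteration touches a single coordinate, so this is consistent with the per-epoch view of coordinate descent.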

This review was created by AI and reviewed by human editors.