[Paper Review] Multi-consensus Decentralized Accelerated Gradient Descent
The paper introduces two decentralized accelerated proximal gradient methods, ProxMudag and Mudag, achieving (near) optimal computation and communication complexities that depend on the global condition number, while allowing non-convex local functions.
This paper considers the decentralized convex optimization problem, which has a wide range of applications in large-scale machine learning, sensor networks, and control theory. We propose novel algorithms that achieve optimal computation complexity and near optimal communication complexity. Our theoretical results give affirmative answers to the open problem on whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower bound depending on the global condition number instead of the local one. Furthermore, the linear convergence of our algorithms only depends on the strong convexity of the global objective and it does not require the local functions to be convex. The design of our methods relies on a novel integration of well-known techniques including Nesterov's acceleration, multi-consensus and gradient-tracking. Empirical studies show that our methods outperform existing approaches on machine learning applications.
Motivation & Objective
- Address the decentralized convex optimization problem where the global objective combines local functions across agents.
- Develop algorithms that achieve (near) optimal communication complexity depending on the global condition number rather than the local one.
- Relax convexity requirements on local functions while preserving linear convergence under global strong convexity.
- Incorporate multi-consensus, gradient-tracking, and Nesterov acceleration to approximate centralized accelerated gradient methods.
- Provide empirical evidence that the proposed methods outperform existing decentralized methods on machine learning tasks.
Proposed method
- Propose ProxMudag for composite objectives with a non-differentiable regularizer r(x); combines proximal updates, multi-consensus, and gradient tracking with acceleration.
- Propose Mudag for smooth objectives (r(x)=0); uses a two-step multi-consensus and gradient-tracking to emulate centralized accelerated gradient descent.
- Use FastMix as an efficient averaging operator to implement multi-consensus and ensure consensus among agents.
- Analyze convergence via a Lyapunov function and show averaged sequences follow proximal-accelerated gradient dynamics.
- Derive computation complexity O(√κ_g log(1/ε)) and near-optimal communication complexity O(√(κ_g/(1−λ2(W))) log(Mκ_g/L) log(1/ε)).
- Provide iteration and communication bounds that depend on the global condition number κ_g = L/μ rather than the local condition number κ_ℓ.
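The combination of accelerated mixing, gradient tracking, and Nesterov momentum described above can be sketched in a few lines of NumPy. This is a simplified illustration in the spirit of Mudag, not the authors' exact algorithm: the function names `fast_mix` and `mudag_style`, the ring network, the quadratic test problem, the step size 1/L, the momentum weight, and the choice K = 20 are all assumptions made for the example. The mixing recursion follows the commonly stated accelerated-gossip form, and the exact placement of the mixing step in the tracking update may differ in the paper.

```python
import numpy as np

def fast_mix(X, W, K):
    """K rounds of accelerated gossip on the rows of X (one row per agent)."""
    lam2 = np.linalg.eigvalsh(W)[-2]        # second-largest eigenvalue of W
    root = np.sqrt(1.0 - lam2 ** 2)
    eta_w = (1.0 - root) / (1.0 + root)     # momentum weight of the mixing recursion
    prev, cur = X, X
    for _ in range(K):
        prev, cur = cur, (1.0 + eta_w) * (W @ cur) - eta_w * prev
    return cur

def mudag_style(grad, W, dim, L, mu, K, iters):
    """Accelerated gradient steps with multi-consensus and gradient tracking (sketch)."""
    m = W.shape[0]
    alpha = np.sqrt(mu / L)
    beta = (1.0 - alpha) / (1.0 + alpha)    # Nesterov momentum for a strongly convex f
    x = np.zeros((m, dim))
    y = x.copy()
    g = grad(y)                             # local gradients at the local iterates
    s = g.copy()                            # tracker: its row-average equals the average gradient
    for _ in range(iters):
        x_new = fast_mix(y - s / L, W, K)   # gradient step, then multi-consensus
        y = x_new + beta * (x_new - x)      # momentum extrapolation
        g_new = grad(y)
        # Assumed form of the tracking update; the paper may place the mixing differently.
        s = fast_mix(s + g_new - g, W, K)
        x, g = x_new, g_new
    return x

# Hypothetical test problem: 8 agents on a ring, local quadratics
# f_i(x) = 0.5 x^T A_i x - b_i^T x, where odd-numbered agents have indefinite A_i.
m, dim = 8, 2
shift = np.roll(np.eye(m), 1, axis=0)
W = 0.5 * np.eye(m) + 0.25 * (shift + shift.T)   # symmetric, doubly stochastic, PSD
A_bar = np.diag([1.0, 4.0])                      # average Hessian: mu = 1, L = 4
signs = np.array([1.0 if i % 2 == 0 else -1.0 for i in range(m)])
A = np.stack([A_bar + 2.0 * sgn * np.eye(dim) for sgn in signs])
b = np.stack([np.array([i, -i], dtype=float) for i in range(m)])

def grad(Y):
    # Row i holds the gradient of f_i evaluated at agent i's iterate.
    return np.einsum("ijk,ik->ij", A, Y) - b

x_star = np.linalg.solve(A_bar, b.mean(axis=0))  # minimizer of the average objective
X = mudag_style(grad, W, dim, L=4.0, mu=1.0, K=20, iters=100)
max_err = np.abs(X - x_star).max()
```

In this toy setup each agent only ever evaluates its own gradient, yet the row-average of `s` stays equal to the average gradient, so the averaged iterates behave like centralized accelerated gradient descent up to a consensus error that the K mixing rounds keep small.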
Experimental results
Research questions
- RQ1: Can decentralized optimization achieve near-optimal communication complexity that depends on the global condition number κ_g instead of the local κ_ℓ?
- RQ2: Is it possible to attain linear convergence for decentralized composite objectives without requiring each local function f_i to be convex?
- RQ3: Do accelerated proximal gradient strategies with multi-consensus and gradient-tracking suffice to approximate centralized accelerated methods in a decentralized setting?
- RQ4: What are the computation and communication complexity trade-offs for the two proposed algorithms, ProxMudag and Mudag, under smoothness and strong convexity assumptions?
Key findings
- Mudag achieves computation complexity O(√κ_g log(1/ε)) and near-optimal communication complexity O(√(κ_g/(1−λ2(W))) log(Mκ_g/L) log(1/ε)).
- ProxMudag achieves optimal computation and near-optimal communication complexity for convex but non-differentiable r(x).
- The algorithms depend on the global condition number κ_g, not the local one, addressing an open question in decentralized optimization.
- The methods allow f_i to be non-convex while requiring global f to be μ-strongly convex and L-smooth, broadening applicability.
- FastMix and gradient-tracking are integrated so that the averaged iterates and gradient estimates approximate those of centralized accelerated gradient descent.
- Empirical results show the proposed methods outperform existing decentralized methods on machine learning tasks.
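A small worked example may help illustrate how a local function can be non-convex while the global objective stays strongly convex. The two-agent functions below are hypothetical and not taken from the paper. The example assumes the global objective is the average of the local functions, and it takes M to denote the smoothness constant of the local functions, which the summary above does not define.

```latex
\begin{align*}
f_1(x) &= \tfrac{3}{2}x^2, \qquad f_2(x) = -\tfrac{1}{2}x^2
  \quad (f_2'' = -1 < 0, \text{ so } f_2 \text{ is non-convex}), \\
f(x) &= \tfrac{1}{2}\bigl(f_1(x) + f_2(x)\bigr) = \tfrac{1}{2}x^2
  \quad (f'' = 1, \text{ so } \mu = L = 1), \\
\kappa_g &= \frac{L}{\mu} = 1, \qquad M = \max_i \lvert f_i'' \rvert = 3 .
\end{align*}
```

Here the global problem is perfectly conditioned even though one agent holds a concave function, which is the kind of setting where a bound in terms of κ_g is more informative than one that requires every f_i to be convex.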
This review was created by AI and reviewed by human editors.