QUICK REVIEW

[Paper Review] Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

Martijn de Vos, Sadegh Farhadkhani|arXiv (Cornell University)|Oct 3, 2023

Advanced MIMO Systems Optimization|10 citations

TL;DR

Epidemic Learning (EL) is a decentralized learning algorithm where each node randomly communicates with s peers per round, leading to faster convergence and better accuracy under non-IID data than static topologies. Empirically, EL achieves up to 1.7x fewer communication rounds and up to 2.2% higher accuracy on CIFAR-10 in a 96-node network.

ABSTRACT

We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $1.7\times$ quicker than baseline DL algorithms and attains $2.2\%$ higher accuracy for the same communication volume.

Motivation & Objective

Speed up convergence in decentralized learning when data is non-IID across nodes.
Propose a randomized, epidemic-style communication protocol where each node contacts s random peers each round.
Analyze convergence theoretically under smooth non-convex losses and heterogeneous data.
Provide empirical evidence comparing EL against static and randomized baselines on standard datasets.

Proposed method

Define EL with two variants: EL-Oracle, where a coordinated step produces an s-regular topology each round, and EL-Local, where each node independently samples s peers with no coordination.
In each round, nodes perform a local SGD step, then send their updated models to s randomly chosen peers.
Nodes update their model by averaging the local update with the received peer updates.
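Prove a convergence rate for EL under standard assumptions: the leading O(1/sqrt(nT)) term gives asymptotic linear speedup in the number of nodes n, and the topology-dependent O(1/sqrt[3](sT^2)) term improves with s, which shortens the transient phase.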
Show that EL-Oracle yields doubly stochastic mixing, which permits a tighter analysis than general D-PSGD analyses.
Empirically evaluate EL on CIFAR-10 with 96 nodes and non-IID Dirichlet data split, comparing to static topologies and EquiTopo.
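
The round structure described above (local SGD step, send to s random peers, average with what was received) can be sketched in a few lines. This is a toy illustration of the EL-Local variant only, not the authors' implementation: the function name `el_local_round`, the list-of-floats model representation, the caller-supplied gradients `grads`, and the learning rate `lr` are all assumptions made for the example.

```python
import random


def el_local_round(models, grads, s, lr, rng):
    """One EL-Local-style round on toy models (lists of floats).

    Hypothetical helper for illustration: gradients are passed in by the
    caller instead of being computed from local data.
    """
    n = len(models)

    # 1. Local SGD step on every node: x_i <- x_i - lr * g_i
    updated = [
        [w - lr * g for w, g in zip(models[i], grads[i])]
        for i in range(n)
    ]

    # 2. Each node independently samples s distinct peers (never itself)
    #    and sends them its updated model.
    inbox = [[] for _ in range(n)]
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        for j in rng.sample(candidates, s):
            inbox[j].append(updated[i])

    # 3. Each node averages its own update with the models it received.
    #    A node that received nothing simply keeps its own update.
    new_models = []
    for i in range(n):
        group = [updated[i]] + inbox[i]
        new_models.append([sum(col) / len(group) for col in zip(*group)])
    return new_models


if __name__ == "__main__":
    rng = random.Random(0)
    models = [[float(i)] for i in range(6)]   # six nodes, 1-D models
    grads = [[0.0] for _ in range(6)]         # zero gradients: pure mixing
    for _ in range(5):
        models = el_local_round(models, grads, s=2, lr=0.1, rng=rng)
    print(models)
```

Because senders choose their targets independently, the number of models a node receives varies from round to round. In the coordinated EL-Oracle variant, by contrast, every node sends and receives exactly s models.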

Experimental results

Research questions

RQ1: Does randomized, changing communication topology improve convergence speed over static and other dynamic topologies in decentralized learning?
RQ2: How do EL-Oracle and EL-Local compare in convergence guarantees and practical performance under non-IID data distributions?
RQ3: What is the impact of the sample size s on convergence, communication cost, and final accuracy?
RQ4: To what extent does EL achieve linear speedup and reduce transient iterations compared to prior bounds?

Key findings

EL-Oracle converges at a rate of O(1/sqrt(nT) + 1/sqrt[3](sT^2) + 1/T); EL-Local has the same leading term O(1/sqrt(nT)), plus problem-dependent additional terms.
Transient iterations for EL are O(n^3/s^2), improving over the prior O(n^3) bound by a factor of s^2.
When s grows (e.g., s in O(log n)), EL achieves significantly fewer transient rounds and improved convergence compared to static/dynamic baselines.
Empirically, on CIFAR-10 with 96 nodes and non-IID data, EL-Oracle and EL-Local converge faster and reach up to 2.2% higher top-1 accuracy than the best static baseline, with up to 1.7x fewer communication rounds.
EL-Local maintains comparable performance to EL-Oracle without central coordination, demonstrating practical decentralized implementability.
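
The O(n^3/s^2) transient-iteration count follows from the EL-Oracle rate by asking when the linear-speedup term 1/sqrt(nT) becomes the largest of the three. The short derivation below ignores constants. The closing numeric line is only an illustration for the 96-node setting, with s = 7 chosen as an assumed example value close to log2(96); it is not a figure reported in the review.

```latex
\begin{align*}
\frac{1}{\sqrt{nT}} \ge \frac{1}{\sqrt[3]{sT^{2}}}
  &\iff (sT^{2})^{1/3} \ge (nT)^{1/2} \\
  &\iff s^{2}T^{4} \ge n^{3}T^{3} \\
  &\iff T \ge \frac{n^{3}}{s^{2}}, \\[4pt]
\frac{1}{\sqrt{nT}} \ge \frac{1}{T}
  &\iff T \ge n
  \quad \text{(implied by } T \ge n^{3}/s^{2} \text{ whenever } s \le n\text{)}, \\[4pt]
n = 96,\ s = 7:\quad
  n^{3} &= 884736, \qquad
  \frac{n^{3}}{s^{2}} = \frac{884736}{49} \approx 1.8 \times 10^{4}.
\end{align*}
```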

This review was created by AI and reviewed by human editors.