QUICK REVIEW

[Paper Review] Revisiting Distributed Synchronous SGD

Pan, Xinghao, Jianmin Chen | arXiv (Cornell University) | Feb 19, 2017

Distributed and Parallel Computing Systems | 27 references | 609 citations

TL;DR

The paper challenges the view that synchronous SGD is impractical and shows that synchronous optimization with backup workers can avoid asynchronous noise and mitigate stragglers, yielding faster convergence and better test accuracy.

ABSTRACT

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting for straggling workers. We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches. We demonstrate that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating for the worst stragglers. Our approach is empirically validated and shown to converge faster and to better test accuracies.

Motivation & Objective

Reassess the practicality of synchronous SGD in distributed training.
Identify weaknesses of asynchronous and synchronous approaches.
Propose a backup-worker synchronous optimization approach to reduce idle time and straggler impact.
Demonstrate empirical convergence speed and accuracy benefits of the proposed method.

Proposed method

Introduce synchronous optimization with backup workers as an alternative to pure asynchronous and standard synchronous schemes.
Analyze how backup workers reduce idle time and mitigate stragglers without introducing excessive noise.
Provide empirical validation showing faster convergence and improved test accuracies.
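
The backup-worker idea described above can be illustrated with a small sketch. It assumes the parameter update averages the first N gradients to arrive out of N + b workers and discards the late ones; the function name, its arguments, and the toy numbers are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence, Tuple


def sync_step_with_backups(
    gradients: Sequence[Sequence[float]],
    arrival_times: Sequence[float],
    n_required: int,
) -> Tuple[List[float], float]:
    """Average the first `n_required` gradients to arrive and drop the rest.

    Illustrative assumption: `gradients[i]` is the gradient from worker i and
    `arrival_times[i]` is when it reached the parameter server.
    Returns the averaged gradient and the time at which the step could proceed.
    """
    if len(gradients) != len(arrival_times):
        raise ValueError("need one arrival time per gradient")
    if not 0 < n_required <= len(gradients):
        raise ValueError("n_required must be between 1 and the worker count")

    # Rank workers by arrival time and keep only the fastest n_required.
    order = sorted(range(len(gradients)), key=lambda i: arrival_times[i])
    accepted = order[:n_required]

    # Average the accepted gradients coordinate by coordinate.
    dim = len(gradients[0])
    averaged = [
        sum(gradients[i][d] for i in accepted) / n_required for d in range(dim)
    ]

    # The step waits only for the slowest accepted worker, not the stragglers.
    step_time = arrival_times[accepted[-1]]
    return averaged, step_time


# Three required workers plus one backup; worker 3 is a straggler.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
times = [0.9, 1.1, 1.0, 7.5]
avg_grad, waited = sync_step_with_backups(grads, times, n_required=3)
```

In this toy run the straggler's gradient never enters the average, and the step proceeds once the third-fastest worker reports. All accepted gradients are computed against the same parameters, which is how the scheme stays synchronous.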

Experimental results

Research questions

RQ1: Can synchronous SGD be made practical in distributed settings by using backup workers?
RQ2: How does backup-worker synchronous optimization compare to asynchronous SGD in terms of convergence and test accuracy?
RQ3: What are the trade-offs between idle time, stragglers, and optimization noise in these schemes?
RQ4: Does the proposed method converge faster across representative deep learning training scenarios?
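
The idle-time side of RQ3 can be explored with a toy simulation. The sketch below compares the wait per step for plain synchronous training (wait for all N workers) against the backup-worker variant (wait for the fastest N of N + b). The latency model, the straggler probability, and all names are made-up assumptions for illustration, not measurements from the paper.

```python
import random
from statistics import mean
from typing import Tuple


def sample_latency(rng: random.Random) -> float:
    """Hypothetical per-worker latency: about 1 s, with a rare long delay."""
    latency = 1.0 + rng.expovariate(5.0)
    if rng.random() < 0.05:  # assumed 5% chance that a worker straggles
        latency += rng.uniform(2.0, 10.0)
    return latency


def simulate_step_times(
    n_required: int, n_backup: int, steps: int, seed: int = 0
) -> Tuple[float, float]:
    """Return mean step time for (plain sync, sync with backup workers)."""
    rng = random.Random(seed)
    plain, backed = [], []
    for _ in range(steps):
        latencies = [sample_latency(rng) for _ in range(n_required + n_backup)]
        # Plain synchronous: N workers, the step waits for the slowest one.
        plain.append(max(latencies[:n_required]))
        # Backup workers: N + b workers, the step waits for the N-th fastest.
        backed.append(sorted(latencies)[n_required - 1])
    return mean(plain), mean(backed)


plain_mean, backed_mean = simulate_step_times(n_required=50, n_backup=5, steps=200)
print(f"plain sync: {plain_mean:.2f}s per step, with backups: {backed_mean:.2f}s per step")
```

Under this model the backup variant never waits longer than plain synchronous training on the same draw, but the b slowest workers' computation is thrown away on every step. That wasted work is the cost to weigh against the reduced idle time.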

Key findings

Synchronous optimization with backup workers can avoid asynchronous noise.
The backup-worker approach mitigates the impact of stragglers.
The method converges faster in practice.
The method yields better test accuracies in empirical validation.

This review was created by AI and reviewed by human editors.