[Paper Review] Faster On-Device Training Using New Federated Momentum Algorithm
The paper proves convergence of FedAvg for non-convex problems and introduces FedMom, an accelerated federated momentum method with convergence guarantees, showing faster convergence in simulations.
Mobile crowdsensing has gained significant attention in recent years and has become a critical paradigm for emerging Internet of Things applications. The sensing devices continuously generate a significant quantity of data, which provide tremendous opportunities to develop innovative intelligent applications. To utilize these data to train machine learning models while not compromising user privacy, federated learning has become a promising solution. However, there is little understanding of whether federated learning algorithms are guaranteed to converge. We reconsider model averaging in federated learning and formulate it as a gradient-based method with biased gradients. This novel perspective assists analysis of its convergence rate and provides a new direction for more acceleration. We prove for the first time that the federated averaging algorithm is guaranteed to converge for non-convex problems, without imposing additional assumptions. We further propose a novel accelerated federated learning algorithm and provide a convergence guarantee. Simulated federated learning experiments are conducted to train deep neural networks on benchmark datasets, and experimental results show that our proposed method converges faster than previous approaches.
Motivation & Objective
- Motivate federated learning for on-device training with privacy-preserving distributed data.
- Provide a convergence analysis for FedAvg on non-convex problems without restrictive data-distribution assumptions.
- Propose and analyze an accelerated federated optimization method (FedMom) using momentum on the server.
- Demonstrate faster convergence of the proposed method through simulated experiments with neural networks.
Proposed method
- Reformulate FedAvg’s model averaging as a gradient-based update with biased gradients.
- Prove convergence of FedAvg for non-convex problems under standard assumptions (bounded variance and Lipschitz gradient).
- Derive a convergence-guaranteed accelerated federated momentum algorithm (FedMom) using Nesterov-style momentum on the server.
- Define and analyze the FedMom update: v_{t+1} = w_t - eta * sum_{k in S_t} (n_k/n) (w_t - w_{t+1}^k), followed by w_{t+1} = v_{t+1} + beta (v_{t+1} - v_t).
- Provide theoretical bounds on the gradient norm and specify conditions on learning rates and momentum for convergence.
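The server-side update above can be sketched in a few lines of plain Python. This is a minimal illustration of the two update equations as written in this review, not the authors' implementation. The function name `fedmom_server_step`, the dictionary-based client bookkeeping, and the choice to take n as the total sample count over the selected clients S_t are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def fedmom_server_step(
    w_t: Vector,
    v_t: Vector,
    client_models: Dict[str, Vector],
    client_sizes: Dict[str, int],
    eta: float,
    beta: float,
) -> Tuple[Vector, Vector]:
    """One server round of the FedMom-style update; returns (w_next, v_next).

    client_models maps each selected client k in S_t to its locally
    trained model w_{t+1}^k; client_sizes maps k to its sample count n_k.
    """
    # Assumption: n is the total number of samples on the selected clients,
    # so the weights n_k / n sum to 1.
    n = sum(client_sizes[k] for k in client_models)

    # Biased "gradient": weighted average of the model deltas (w_t - w_{t+1}^k).
    g = [0.0] * len(w_t)
    for k, w_k in client_models.items():
        weight = client_sizes[k] / n
        for i in range(len(w_t)):
            g[i] += weight * (w_t[i] - w_k[i])

    # v_{t+1} = w_t - eta * g
    v_next = [w - eta * gi for w, gi in zip(w_t, g)]
    # w_{t+1} = v_{t+1} + beta * (v_{t+1} - v_t)
    w_next = [vn + beta * (vn - vo) for vn, vo in zip(v_next, v_t)]
    return w_next, v_next
```

Under the stated assumption that the weights sum to 1, setting eta = 1 and beta = 0 makes w_{t+1} the sample-weighted average of the client models, which is the plain model-averaging step of FedAvg. That is the sense in which averaging can be read as a gradient-style update, with the momentum term layered on top.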
Experimental results
Research questions
- RQ1: Does FedAvg converge for non-convex objective functions without restrictive data-distribution assumptions?
- RQ2: Can the Federated Momentum (FedMom) method accelerate convergence in federated optimization while preserving convergence guarantees for non-convex problems?
- RQ3: What are the necessary conditions on stepsizes, local update count, and momentum to ensure convergence to critical points in federated settings?
- RQ4: How does bias in the federated gradient impact convergence, and can acceleration mitigate it?
Key findings
- FedAvg is guaranteed to converge to critical points for non-convex problems under bounded variance and Lipschitz gradient assumptions.
- A new accelerated federated learning algorithm (FedMom) with momentum on the server side is proposed and shown to converge to critical points for non-convex problems.
- Theoretical bounds show convergence rates to critical points under specified stepsize and momentum parameters, with guidance on parameter choices.
- Simulated federated training of deep neural networks demonstrates that the proposed method converges faster than previous approaches under the same settings.
This review was created by AI and reviewed by human editors.