QUICK REVIEW

[Paper Review] Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Boyue Li, Shicong Cen | arXiv (Cornell University) | Sep 12, 2019

Stochastic Gradient Optimization Techniques | Cited by 51

One-Sentence Summary

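This paper develops decentralized optimization algorithms (Network-DANE, Network-SVRG, and Network-SARAH) that combine gradient tracking with variance reduction to converge efficiently, in both communication and computation, over networked systems. It proves linear convergence for strongly convex losses (Network-DANE, Network-SVRG) and for quadratic losses (Network-SARAH), and demonstrates the practical gains experimentally.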

ABSTRACT

There is growing interest in large-scale machine learning and optimization over decentralized networks, e.g. in the context of multi-agent learning and federated learning. Due to the imminent need to alleviate the communication burden, the investigation of communication-efficient distributed optimization algorithms - particularly for empirical risk minimization - has flourished in recent years. A large fraction of these algorithms have been developed for the master/slave setting, relying on a central parameter server that can communicate with all agents. This paper focuses on distributed optimization over networks, or decentralized optimization, where each agent is only allowed to aggregate information from its neighbors. By properly adjusting the global gradient estimate via local averaging in conjunction with proper correction, we develop a communication-efficient approximate Newton-type method Network-DANE, which generalizes DANE to the decentralized scenarios. Our key ideas can be applied in a systematic manner to obtain decentralized versions of other master/slave distributed algorithms. A notable development is Network-SVRG/SARAH, which employs variance reduction to further accelerate local computation. We establish linear convergence of Network-DANE and Network-SVRG for strongly convex losses, and Network-SARAH for quadratic losses, which shed light on the impacts of data homogeneity, network connectivity, and local averaging upon the rate of convergence. We further extend Network-DANE to composite optimization by allowing a nonsmooth penalty term. Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency. Our work suggests that performing a certain amount of local communications and computations per iteration can substantially improve the overall efficiency.

Research Motivation and Objectives

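Enable efficient empirical risk minimization over networks that have no central server.
Develop decentralized variants of DANE, together with variance-reduced methods, suited to the networked setting.
Provide convergence guarantees that quantify how data homogeneity and network connectivity affect the convergence rate.
Extend Network-DANE to composite (nonsmooth) optimization and validate performance empirically.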

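Proposed Method

Introduce Network-DANE by adapting DANE to the decentralized setting with gradient tracking.
Track the global gradient at each agent via dynamic average consensus, without a central coordinator.
Perform K rounds of local averaging per iteration to improve network mixing and accelerate convergence.
Replace the global gradient in each local subproblem with a surrogate gradient tracked through graph consensus.
Develop Network-SVRG and Network-SARAH, bringing variance reduction to the networked setting.
Extend Network-DANE to proximal (nonsmooth) composite optimization and analyze its convergence.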
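The steps above can be illustrated with a minimal NumPy sketch of a Network-DANE-style iteration. It assumes quadratic local losses f_j(x) = x^T H_j x / 2 - c_j^T x, so the DANE subproblem has a closed form; a ring graph with weight 1/3 on each agent and its two neighbours; and K averaging rounds applied as W^K. The update order, zero initialization, the values of `mu` and `K`, and the helper names `ring_mixing_matrix` and `network_dane_quadratic` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic W for a ring of n >= 3 agents:
    each agent averages itself and its two neighbours with weight 1/3."""
    W = np.zeros((n, n))
    for j in range(n):
        for k in (j - 1, j, j + 1):
            W[j, k % n] = 1.0 / 3.0
    return W


def network_dane_quadratic(H, c, W, mu=3.0, K=2, iters=600):
    """Sketch of Network-DANE on local quadratics f_j(x) = x^T H_j x / 2 - c_j^T x.

    H: (n, d, d) local Hessians, c: (n, d) local linear terms,
    W: (n, n) mixing matrix, K: averaging rounds per iteration (applied as W^K).
    Returns the (n, d) array of the agents' averaged iterates y_j.
    """
    n, d = c.shape
    WK = np.linalg.matrix_power(W, K)

    def local_grads(Y):
        # Row j holds grad f_j(y_j) = H_j y_j - c_j.
        return np.einsum("jab,jb->ja", H, Y) - c

    Y = np.zeros((n, d))       # averaged iterates y_j
    S = local_grads(Y)         # tracker s_j starts at the local gradient
    for _ in range(iters):
        # DANE subproblem with the global gradient replaced by s_j:
        #   argmin_z f_j(z) - <grad f_j(y_j) - s_j, z> + (mu/2) ||z - y_j||^2,
        # which for quadratics has the closed form z = y_j - (H_j + mu I)^{-1} s_j.
        X = np.stack([Y[j] - np.linalg.solve(H[j] + mu * np.eye(d), S[j])
                      for j in range(n)])
        Y_new = WK @ X         # parameter averaging with neighbours
        # Gradient tracking: mix trackers, then add the change in local gradient,
        # so the agent-average of s_j always equals the average of grad f_j(y_j).
        S = WK @ S + local_grads(Y_new) - local_grads(Y)
        Y = Y_new
    return Y


# Synthetic least-squares data split across agents (illustrative only).
rng = np.random.default_rng(0)
n, d, m = 5, 3, 500                           # agents, dimension, samples per agent
A = rng.normal(size=(n, m, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=(n, m))
H = np.einsum("jma,jmb->jab", A, A) / m       # H_j = A_j^T A_j / m
c = np.einsum("jma,jm->ja", A, b) / m         # c_j = A_j^T b_j / m
x_star = np.linalg.solve(H.sum(axis=0), c.sum(axis=0))  # centralized minimizer

Y = network_dane_quadratic(H, c, ring_mixing_matrix(n))
print("max deviation from x_star:", np.max(np.abs(Y - x_star)))
```

Each agent only touches its own (H_j, c_j) and its neighbours' rows of X and S, yet all agents reach the centralized minimizer. Increasing `K` spends more communication per iteration to tighten consensus, which is the local-averaging trade-off described above.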

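Experimental Results

Research Questions

RQ1: Can gradient tracking combined with local averaging deliver communication-efficient decentralized optimization with convergence guarantees?
RQ2: How do data homogeneity (β) and network connectivity (α) affect the convergence rates of Network-DANE, Network-SVRG, and Network-SARAH?
RQ3: What is the trade-off among local computation, communication rounds, and convergence speed in these decentralized algorithms?
RQ4: Can variance-reduction techniques preserve linear convergence in networked approximate Newton-type methods?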

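Key Findings

Under suitable settings, Network-DANE converges linearly for strongly convex losses, and it converges faster when the data are more homogeneous and the network is better connected.
With extra averaging, Network-SVRG and Network-SARAH converge linearly for strongly convex and quadratic losses, respectively, while reducing local computation.
Under favorable data and topology conditions, gradient tracking lets decentralized optimization match or even surpass central-server baselines in communication efficiency.
Extra local averaging (multiple mixing rounds) speeds up effective network mixing and thereby substantially reduces the total number of communication rounds.
The proximal extension handles nonsmooth composite optimization within the same communication-efficient network framework.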
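The connectivity and extra-averaging findings can be made concrete with a small NumPy sketch. It assumes network connectivity is measured by the mixing rate α = ||W - (1/n)11^T||_2 of a symmetric, doubly stochastic W (a common convention); the ring topology, the 1/3 weights, and the helper names `ring_mixing_matrix` and `mixing_rate` are illustrative choices. Because W^K - (1/n)11^T = (W - (1/n)11^T)^K for doubly stochastic W, K averaging rounds shrink disagreement by α^K per iteration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic W for a ring of n >= 3 agents:
    each agent averages itself and its two neighbours with weight 1/3."""
    W = np.zeros((n, n))
    for j in range(n):
        for k in (j - 1, j, j + 1):
            W[j, k % n] = 1.0 / 3.0
    return W


def mixing_rate(W):
    """alpha = ||W - (1/n) 1 1^T||_2; smaller alpha means faster consensus."""
    n = W.shape[0]
    return np.linalg.norm(W - np.full((n, n), 1.0 / n), ord=2)


rates = {}
for n in (5, 10, 20):
    W = ring_mixing_matrix(n)
    # K averaging rounds per iteration act like a single round with W^K,
    # whose mixing rate is alpha^K (W is symmetric and doubly stochastic).
    rates[n] = {K: mixing_rate(np.linalg.matrix_power(W, K)) for K in (1, 2, 4)}
    print(f"n={n:2d}  " + "  ".join(f"alpha^{K}={a:.3f}" for K, a in rates[n].items()))
```

Larger rings mix more slowly (α closer to 1), and raising K trades extra communication per iteration for a smaller effective rate α^K.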

This review was generated by AI and reviewed by human editors.