QUICK REVIEW

[Paper Review] Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Boyue Li, Shicong Cen|arXiv (Cornell University)|2019. 09. 12.

Stochastic Gradient Optimization Techniques | Citations: 51

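One-line summary

This paper presents decentralized optimization algorithms (Network-DANE, Network-SVRG, Network-SARAH) that use gradient tracking and, for Network-SVRG/SARAH, variance reduction to achieve communication- and computation-efficient convergence over networks. It proves linear convergence for quadratic and strongly convex losses and demonstrates practical benefits through experiments.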

ABSTRACT

There is growing interest in large-scale machine learning and optimization over decentralized networks, e.g. in the context of multi-agent learning and federated learning. Due to the imminent need to alleviate the communication burden, the investigation of communication-efficient distributed optimization algorithms - particularly for empirical risk minimization - has flourished in recent years. A large fraction of these algorithms have been developed for the master/slave setting, relying on a central parameter server that can communicate with all agents. This paper focuses on distributed optimization over networks, or decentralized optimization, where each agent is only allowed to aggregate information from its neighbors. By properly adjusting the global gradient estimate via local averaging in conjunction with proper correction, we develop a communication-efficient approximate Newton-type method Network-DANE, which generalizes DANE to the decentralized scenarios. Our key ideas can be applied in a systematic manner to obtain decentralized versions of other master/slave distributed algorithms. A notable development is Network-SVRG/SARAH, which employs variance reduction to further accelerate local computation. We establish linear convergence of Network-DANE and Network-SVRG for strongly convex losses, and Network-SARAH for quadratic losses, which shed light on the impacts of data homogeneity, network connectivity, and local averaging upon the rate of convergence. We further extend Network-DANE to composite optimization by allowing a nonsmooth penalty term. Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency. Our work suggests that performing a certain amount of local communications and computations per iteration can substantially improve the overall efficiency.

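Motivation and objectives

Enable efficient empirical risk minimization over networks without a central server.
Develop a decentralized version of DANE and variance-reduced methods suited to the network setting.
Provide convergence guarantees that quantify how data homogeneity (beta) and network connectivity (alpha) affect the convergence rate.
Extend Network-DANE to composite (nonsmooth) optimization and validate performance empirically.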
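
For orientation, the objectives above can be written in standard notation. This is a sketch under stated assumptions: the review only names alpha and beta, so the symbols n (number of agents), m (samples per agent), \mathcal{M}_i (agent i's local data), \ell (sample loss), and W (mixing matrix) are introduced here for illustration, and the exact definitions in the paper may differ.

```latex
% Decentralized empirical risk minimization (assumed notation)
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
f_i(x) = \frac{1}{m} \sum_{z \in \mathcal{M}_i} \ell(x; z)

% One common way to quantify connectivity (alpha) and homogeneity (beta)
\alpha = \Big\| W - \tfrac{1}{n} \mathbf{1}\mathbf{1}^{\top} \Big\|_2 \in [0, 1),
\qquad
\big\| \nabla^2 f_i(x) - \nabla^2 f(x) \big\|_2 \le \beta
\quad \text{for all } i \text{ and } x
```

Under these assumed definitions, a smaller alpha corresponds to a better-connected network and a smaller beta to more homogeneous local data.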

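Proposed method

Adapt DANE to the decentralized setting with gradient tracking, yielding Network-DANE.
Use dynamic average consensus so that each agent tracks the global gradient without a central coordinator.
Introduce multiple rounds of local averaging (K) to improve network mixing and accelerate convergence.
Replace the global gradient in each local subproblem with a consensus-based surrogate gradient.
Develop Network-SVRG and Network-SARAH to bring variance reduction to the network setting.
Extend Network-DANE to proximal (nonsmooth) composite optimization and analyze its convergence.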
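
The gradient-tracking idea in the list above can be illustrated with a minimal sketch. It is not Network-DANE itself: it is plain decentralized gradient descent with a dynamic-average-consensus tracker, run on a hypothetical ring of agents with toy scalar quadratic losses. The function names, the ring topology, the equal 1/3 mixing weights, and the step size are all assumptions made for illustration.

```python
def ring_mix(values, rounds=1):
    """Average each agent's value with its two ring neighbours, `rounds` times.

    Assumed topology: a ring with equal weights 1/3 (self, left, right).
    """
    n = len(values)
    for _ in range(rounds):
        values = [
            (values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)
        ]
    return values


def gradient_tracking(a, c, step, iters, rounds=1):
    """Decentralized gradient descent with gradient tracking on a ring.

    Agent i holds the toy local loss f_i(x) = 0.5 * a[i] * (x - c[i]) ** 2,
    so the global minimizer is sum(a[i] * c[i]) / sum(a).
    Returns the local iterates x, the trackers s, and the local gradients g.
    """
    n = len(a)

    def grad(i, x_i):
        return a[i] * (x_i - c[i])

    x = [0.0] * n
    g = [grad(i, x[i]) for i in range(n)]
    s = list(g)  # each tracker starts at the agent's own local gradient
    for _ in range(iters):
        # Consensus step on the iterates, then move along the tracked gradient.
        mixed_x = ring_mix(x, rounds)
        x = [mixed_x[i] - step * s[i] for i in range(n)]
        # Dynamic average consensus: mix the trackers, add the gradient change.
        g_new = [grad(i, x[i]) for i in range(n)]
        mixed_s = ring_mix(s, rounds)
        s = [mixed_s[i] + g_new[i] - g[i] for i in range(n)]
        g = g_new
    return x, s, g


if __name__ == "__main__":
    a = [1.0, 2.0, 0.5, 1.5, 1.0]
    c = [3.0, -1.0, 4.0, 0.0, 2.0]
    x, s, g = gradient_tracking(a, c, step=0.05, iters=2000)
    print("local iterates:", x)
    print("global minimizer:", sum(ai * ci for ai, ci in zip(a, c)) / sum(a))
```

Because the mixing weights are doubly stochastic, the average of the trackers always equals the average of the current local gradients. That invariant is what lets each agent follow the global gradient using only messages from its neighbours.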

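Experimental results

Research questions

RQ1: Can combining gradient tracking with local averaging yield communication-efficient decentralized optimization with convergence guarantees?
RQ2: How do data homogeneity (beta) and network connectivity (alpha) affect the convergence rates of Network-DANE, Network-SVRG, and Network-SARAH?
RQ3: What are the trade-offs among local computation, communication rounds, and convergence rate in these decentralized algorithms?
RQ4: Do variance reduction techniques preserve linear convergence in network-based approximate Newton-type methods?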

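Key findings

Network-DANE achieves linear convergence for quadratic losses under suitable settings (and, per the abstract, for strongly convex losses), and the rate improves as the data become more homogeneous and the network better connected.
Network-SVRG and Network-SARAH achieve linear convergence with extra averaging, for strongly convex and quadratic losses respectively per the abstract, while reducing local computation.
With gradient tracking, communication efficiency is comparable to or better than centralized parameter-server baselines when data and topology conditions are favorable.
Extra local averaging (multiple mixing rounds) improves the effective network mixing rate and can substantially reduce the total number of communication rounds.
The proximal extension enables nonsmooth composite optimization within the same network-efficient framework.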
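
The effect of extra local averaging can be seen in a small, self-contained sketch. It assumes a hypothetical ring of agents with equal 1/3 mixing weights and measures how far the agents' values are from their common average after K mixing rounds. The function names and the topology are illustrative and are not taken from the paper.

```python
def ring_average(values, rounds):
    """Apply `rounds` rounds of neighbour averaging on a ring (weights 1/3 each)."""
    n = len(values)
    for _ in range(rounds):
        values = [
            (values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)
        ]
    return values


def disagreement(values):
    """Largest deviation of any agent's value from the network-wide mean."""
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)


if __name__ == "__main__":
    start = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for k in (0, 1, 2, 4, 8):
        print(k, disagreement(ring_average(start, k)))
```

Each round preserves the network-wide mean while shrinking the disagreement, so running K rounds per iteration behaves like mixing over a better-connected network, at the cost of K times as many messages per iteration.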


This review was generated by AI and reviewed by a human editor.