QUICK REVIEW

[Paper Review] On the Convergence of Local Descent Methods in Federated Learning

Farzin Haddadpour, Mehrdad Mahdavi|arXiv (Cornell University)|2019. 10. 31.

Stochastic Gradient Optimization Techniques | References: 38 | Citations: 169

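One-Line Summary

This paper analyzes the convergence of local GD/SGD with periodic averaging in federated learning under heterogeneous data, proves convergence rates, and shows how a bound on gradient diversity enables variance reduction and linear speedup. It covers both centralized and networked settings, for general nonconvex objectives and for objectives satisfying the PL condition.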

ABSTRACT

In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a. heterogeneous or non-i.i.d. data samples). In this paper, we generalize the local stochastic and full gradient descent with periodic averaging, originally designed for homogeneous distributed optimization, to solve nonconvex optimization problems in federated learning. Although scant research is available on the effectiveness of local SGD in reducing the number of communication rounds in the homogeneous setting, its convergence and communication complexity in the heterogeneous setting is mostly demonstrated empirically and lacks thorough theoretical understanding. To bridge this gap, we demonstrate that by properly analyzing the effect of unbiased gradients and sampling schema in the federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting both in the general nonconvex case and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity). Our theoretical results complement the recent empirical studies that demonstrate the applicability of local GD/SGD to federated learning. We also specialize the proposed local method for networked distributed optimization. To the best of our knowledge, the obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.

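Motivation and Objectives

Motivates communication-efficient federated optimization under heterogeneous data distributions.
Generalizes local GD/SGD with periodic averaging to nonconvex objectives in the federated setting.
Establishes convergence rates under bounded gradient diversity and the PL condition.
Specializes the analysis to centralized, decentralized (networked), and sampled-device configurations.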
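
For reference, the PL (Polyak-Łojasiewicz) condition named in the list above is usually stated as the inequality below. The symbols are assumptions introduced here for illustration and are not taken from this review: f is the global objective, f^* its minimum value, and μ > 0 the PL constant. The condition does not require convexity, which is why it can be combined with nonconvex objectives.

```latex
\frac{1}{2}\,\bigl\|\nabla f(w)\bigr\|_2^2 \;\ge\; \mu\,\bigl(f(w) - f^*\bigr)
\qquad \text{for all } w .
```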

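Proposed Method

Proposes Local Federated Descent (LFD) with periodic averaging, parameterized by E (local updates), K (sampled devices), and q (device weights).
Specializes LFD to Local Federated GD (LFGD) and Local Federated SGD (LFSGD), covering both the full-gradient and stochastic-gradient settings.
Introduces the Weighted Gradient Diversity Λ(w, q) to quantify heterogeneity and derives conditions on the learning rate and E for convergence.
Derives convergence guarantees for general nonconvex objectives and for nonconvex objectives under the PL condition.
Extends the analysis to networked distributed optimization, in which devices communicate with their direct neighbors.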
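
The overall shape of local descent with periodic averaging can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `local_federated_descent`, the choice to sample K devices with replacement using q as sampling probabilities, the uniform average of the sampled local models, and the toy quadratic objectives are all hypothetical choices made here. Only the roles of E, K, and q come from the summary above.

```python
import numpy as np

def local_federated_descent(grad_fns, q, w0, lr=0.05, E=5, K=2, rounds=100, seed=0):
    """Hypothetical sketch of local descent with periodic averaging.

    grad_fns[j](w, rng) returns a (full or stochastic) gradient of device j's
    local objective at w. q holds device weights (nonnegative, summing to 1),
    used here as sampling probabilities (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(rounds):
        # Sample K devices with replacement according to q (assumed scheme).
        sampled = rng.choice(len(grad_fns), size=K, replace=True, p=q)
        local_models = []
        for j in sampled:
            w_j = w.copy()
            # E local gradient steps with no communication in between.
            for _ in range(E):
                w_j = w_j - lr * grad_fns[j](w_j, rng)
            local_models.append(w_j)
        # Periodic averaging: the server averages the sampled local models.
        w = np.mean(local_models, axis=0)
    return w

# Toy heterogeneous problem: device j minimizes 0.5 * ||w - c_j||^2,
# so each device has a different local minimizer c_j.
centers = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
grad_fns = [lambda w, rng, c=c: w - c for c in centers]  # full gradients
q = np.full(len(centers), 1.0 / len(centers))

w_star = q @ centers  # minimizer of the q-weighted global objective
w_final = local_federated_descent(grad_fns, q, w0=[10.0, 10.0])
print("distance to global minimizer:", np.linalg.norm(w_final - w_star))
```

Passing full gradients, as in the toy example, corresponds to the LFGD flavor; a `grad_fns[j]` that uses `rng` to draw a mini-batch would correspond to the LFSGD flavor. Because only K of the devices participate in each round, the iterate in this sketch keeps fluctuating around the global minimizer rather than settling exactly on it.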

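Experimental Results

Research Questions

RQ1: How do heterogeneous local data shards affect the convergence of local descent with periodic averaging?
RQ2: Under what conditions on the learning rate, the number of local updates, and device sampling can local GD/SGD converge in nonconvex FL settings?
RQ3: What convergence rates are attainable for nonconvex and PL-condition objectives under bounded gradient diversity?
RQ4: Can the results be extended to networked optimization over neighboring devices and to sampled-device settings?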

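Key Results

Local descent with periodic averaging converges when gradient diversity is bounded, with rates that match or improve on prior work across several regimes.
For nonconvex objectives under the PL condition, it achieves improved rates, such as an O(1/(KT)) dependence, compared with some prior bounds.
The convergence rates hold for both centralized (parameter-server) and decentralized networked FL, in both the full-gradient and stochastic-gradient settings.
Choosing the learning rate and the number of local updates as functions of gradient diversity enables linear speedup, provided the diversity is controlled.
With appropriate hyperparameter tuning, the method behaves similarly to distributed variance reduction, and the analysis matches it without any explicit variance-reduction technique.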
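
Since the results above hinge on gradient diversity being bounded, a small numerical illustration of such a quantity may help. The sketch below assumes the weighted ratio Λ(w, q) = Σ_j q_j ||∇f_j(w)||² / ||Σ_j q_j ∇f_j(w)||², which mirrors the common gradient-diversity ratio. This exact form is an assumption of the sketch and should be checked against the paper's definition; the function name `weighted_gradient_diversity` is hypothetical.

```python
import numpy as np

def weighted_gradient_diversity(grads, q):
    """Assumed form: sum_j q_j ||g_j||^2 / ||sum_j q_j g_j||^2.

    grads has shape (p, d): one local gradient per device, all evaluated
    at the same point w. q has shape (p,) and sums to 1.
    """
    grads = np.asarray(grads, dtype=float)
    q = np.asarray(q, dtype=float)
    numerator = np.sum(q * np.sum(grads ** 2, axis=1))
    weighted_mean = (q[:, None] * grads).sum(axis=0)
    denominator = np.sum(weighted_mean ** 2)
    return numerator / denominator

q = np.array([0.5, 0.5])

# Homogeneous case: identical local gradients give the smallest value.
same = weighted_gradient_diversity([[1.0, 0.0], [1.0, 0.0]], q)

# Heterogeneous case: orthogonal local gradients give a larger value.
orthogonal = weighted_gradient_diversity([[1.0, 0.0], [0.0, 1.0]], q)

print(same, orthogonal)
```

Under this assumed form the ratio is at least 1 whenever q sums to 1 (by Jensen's inequality), and it grows as the local gradients point in more conflicting directions, which is the sense in which it quantifies heterogeneity.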


This review was generated by AI and reviewed by a human editor.