QUICK REVIEW

[Paper Review] Revisiting Weighted Aggregation in Federated Learning with Neural Networks

Zexi Li, Tao Lin|arXiv (Cornell University)|Feb 14, 2023
Privacy-Preserving Technologies in Data|Cited by 19
One-Sentence Summary

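This paper revisits weighted aggregation in federated learning, shows that a global weight shrinking effect can improve generalization, analyzes client coherence, and proposes FedLAW (Federated Learning with Learnable Aggregation Weights) to improve the generalization of the global model.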

ABSTRACT

In federated learning (FL), weighted aggregation of local models is conducted to generate a global model, and the aggregation weights are normalized (the sum of weights is 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of weights can be smaller than 1, causing global weight shrinking effect (analogous to weight decay) and improving generalization. We explore how the optimal shrinking factor is affected by clients' data heterogeneity and local epochs. Second, we dive into the relative aggregation weights among clients to depict the clients' importance. We develop client coherence to study the learning dynamics and find a critical point that exists. Before entering the critical point, more coherent clients play more essential roles in generalization. Based on the above insights, we propose an effective method for Federated Learning with Learnable Aggregation Weights, named as FedLAW. Extensive experiments verify that our method can improve the generalization of the global model by a large margin on different datasets and models.
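
The aggregation rule described in the abstract can be written out in symbols. This is a sketch only: the names $w_i$ (local model of client $i$), $n_i$ (local data size), $K$ (number of clients), $\eta$ and $\alpha$ (learning rate and weight-decay coefficient) are notation assumed here for illustration, and $\gamma$, $\lambda_i$ follow the gamma/lambda naming used later in this review; the paper's own notation may differ.

```latex
\begin{align}
  % Standard aggregation: weights proportional to local data sizes, summing to 1
  w_g &= \sum_{i=1}^{K} \frac{n_i}{\sum_{j=1}^{K} n_j}\, w_i \\
  % Decomposed form: gamma is the sum of the weights, lambda_i the relative weights
  w_g &= \gamma \sum_{i=1}^{K} \lambda_i\, w_i,
        \qquad \lambda_i \ge 0,\quad \sum_{i=1}^{K} \lambda_i = 1,\quad \gamma > 0 \\
  % With gamma < 1 the aggregate is shrunk toward zero
  \gamma\,\bar{w} &= \bar{w} - (1-\gamma)\,\bar{w},
        \qquad \bar{w} = \sum_{i=1}^{K} \lambda_i\, w_i \\
  % Compare with one weight-decay step, where (1 - \eta\alpha) plays the role of gamma
  w &\leftarrow (1-\eta\alpha)\, w - \eta\, \nabla L(w)
\end{align}
```

Setting $\gamma = 1$ and $\lambda_i = n_i / \sum_j n_j$ recovers the standard rule, which is why shrinking the sum of weights below 1 reads as a weight-decay-like regularizer applied once per round.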

Motivation and Objectives

  • Investigate how aggregation weights whose sum (the l1 norm, gamma) differs from 1 affect FL training dynamics and generalization.
  • Study the relative aggregation weights among clients (lambda) and define client coherence.
  • Develop a method that learns the aggregation weights on a proxy dataset to optimize the global objective.
  • Propose and evaluate FedLAW (Federated Learning with Learnable Aggregation Weights) to boost generalization across datasets/models.

Proposed Method

  • Decompose aggregation weights into gamma (l1 norm of weights) and lambda (relative weights) to study global shrinking and client importance.
  • Learn gamma by gradient descent on a proxy dataset to observe its effect as a regularization factor (global weight shrinking).
  • Learn lambda on a proxy dataset to capture client coherence and determine which clients contribute more before a critical point.
  • Define and measure local gradient coherence and heterogeneity coherence to understand training dynamics.
  • Introduce attentive LAW to learn the optimal lambda on a proxy dataset (with gamma either fixed or tuned), which sets each client's weight in the aggregation.
  • Propose the FedLAW algorithm, which trains the clients locally, then optimizes the aggregation weights on a proxy dataset, and finally applies the learned weights when aggregating.
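
The server-side step in the list above (learn gamma and lambda on a proxy dataset, then aggregate with them) can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: it assumes a linear regression model so the gradients can be written by hand, a softmax parameterization to keep lambda on the simplex, and plain gradient descent. The function names, the synthetic data, and the 1.5x inflation of the client weights are all hypothetical choices made for the demo.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())          # subtract max for numerical stability
    return e / e.sum()

def aggregate(client_weights, gamma, lam):
    """Global model = gamma * sum_i lam_i * w_i (rows of client_weights are w_i)."""
    return gamma * (lam @ client_weights)

def proxy_loss(w, X, y):
    """Mean squared error (halved) of a linear model on the proxy data."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def learn_aggregation_weights(client_weights, X, y, steps=300, lr=0.05):
    """Gradient descent on (gamma, lambda) to minimize the proxy loss."""
    a = np.zeros(client_weights.shape[0])  # logits; softmax(0) = uniform lambda
    gamma = 1.0                            # start from the normalized setting
    for _ in range(steps):
        lam = softmax(a)
        mean_w = lam @ client_weights
        w_g = gamma * mean_w
        grad_w = X.T @ (X @ w_g - y) / len(y)   # dLoss / dw_g
        s = gamma * (client_weights @ grad_w)   # dLoss / dlam_i
        grad_a = lam * (s - lam @ s)            # chain rule through softmax
        grad_gamma = grad_w @ mean_w            # dLoss / dgamma
        a -= lr * grad_a
        gamma -= lr * grad_gamma
    return gamma, softmax(a)

# Toy setup (assumption): clients hold noisy, norm-inflated copies of w_true.
rng = np.random.default_rng(0)
d = 5
w_true = np.ones(d) / np.sqrt(d)
X_proxy = rng.normal(size=(200, d))
y_proxy = X_proxy @ w_true
noise_scales = np.array([0.05, 0.1, 0.2, 0.4])
client_weights = 1.5 * w_true + noise_scales[:, None] * rng.normal(size=(4, d))

gamma, lam = learn_aggregation_weights(client_weights, X_proxy, y_proxy)
uniform = np.full(len(client_weights), 1.0 / len(client_weights))
print("gamma:", gamma, "lambda:", lam)
print("loss, uniform weights with gamma=1:",
      proxy_loss(aggregate(client_weights, 1.0, uniform), X_proxy, y_proxy))
print("loss, learned weights:",
      proxy_loss(aggregate(client_weights, gamma, lam), X_proxy, y_proxy))
```

In a full FL loop this step would run once per communication round, after the clients return their locally trained models. A real neural network would need automatic differentiation in place of the hand-written gradients.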

Experimental Results

Research Questions

  • RQ1: What is the impact of using gamma < 1 on FL generalization and training dynamics?
  • RQ2: How do relative client weights lambda influence training dynamics and generalization via client coherence?
  • RQ3: Can learnable aggregation weights on a proxy dataset improve global model performance across IID and Non-IID settings?
  • RQ4: How do global weight shrinking and client coherence interact with local epochs and data heterogeneity?
  • RQ5: Is FedLAW robust to small or shifted proxy datasets and corrupted clients?

Key Findings

  • Global weight shrinking (gamma < 1) can improve generalization, with an optimal gamma balancing regularization and optimization.
  • The norm of the global gradient governs the optimal shrinking factor; larger global gradients require stronger regularization.
  • A critical point exists in local gradient coherence; before this point, more coherent clients, especially clients with balanced data, contribute more to generalization.
  • Attentive LAW learns aggregation weights that favor more coherent or balanced clients in early rounds, improving early generalization and heterogeneity coherence.
  • Adaptive global weight shrinking (adaptive GWS) maintains positive local gradient coherence after the critical point, yielding further gains over FedAvg.
  • FedLAW substantially improves generalization across CIFAR-10/100 and FashionMNIST with various models, and shows robustness to proxy dataset shifts and corrupted clients.
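
The findings above lean on local gradient coherence, but this review does not give its formula. One plausible way to measure it, assumed here purely for illustration, is the cosine similarity between clients' local updates (each client's model minus the global model it started from). The sketch below scores each client by its mean similarity to the others; the function names and the three-client example are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np

def pairwise_cosine(updates):
    """Cosine similarity matrix between the rows of `updates` (shape K x d)."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    unit = updates / np.clip(norms, 1e-12, None)   # guard against zero updates
    return unit @ unit.T

def coherence_scores(global_w, client_weights):
    """Mean cosine similarity of each client's update to the other clients' updates."""
    updates = client_weights - global_w   # local updates, treated as "gradients"
    cos = pairwise_cosine(updates)
    k = len(updates)
    return (cos.sum(axis=1) - np.diag(cos)) / (k - 1)   # drop self-similarity

# Hypothetical example: clients 0 and 1 move in similar directions, client 2 opposes them.
global_w = np.zeros(3)
client_weights = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [-1.0, 0.0, 0.0],
])
scores = coherence_scores(global_w, client_weights)
print("coherence scores:", scores)
print("least coherent client:", int(np.argmin(scores)))
```

Tracking the mean of these scores across rounds is one way to look for the sign change that the findings call the critical point.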


This review was generated by AI and reviewed by human editors.