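[Paper Digest] Personalized Federated Learning: A Meta-Learning Approach
This paper proposes Per-FedAvg, a MAML-inspired personalized federated learning method. It learns a shared initialization that each user can quickly adapt to their local data, and it comes with theoretical convergence guarantees for non-convex losses.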
In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common output for all the users, and, therefore, it does not adapt the model to each user. This is an important missing feature, especially given the heterogeneity of the underlying data distribution for various users. In this paper, we study a personalized variant of the federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data. This approach keeps all the benefits of the federated learning architecture, and, by structure, leads to a more personalized model for each user. We show this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we study a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of underlying distributions of user data, measured in terms of distribution distances such as Total Variation and 1-Wasserstein metric.
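Motivation and Objectives
- Address data heterogeneity in federated learning by enabling user-specific adaptation.
- Adapt the ideas of Model-Agnostic Meta-Learning (MAML) to Federated Averaging to obtain personalized models.
- Develop Per-FedAvg and analyze its convergence for non-convex losses.
- Characterize how distributional differences between users, measured by the total variation (TV) distance and the 1-Wasserstein metric, affect performance.
Proposed Method
- Formulate personalized FL as minimizing F(w) = (1/n) Σ_{i=1}^{n} f_i(w − α∇f_i(w)), an objective inspired by MAML.
- Introduce Per-FedAvg, a FedAvg-like algorithm whose local updates optimize F_i(w) = f_i(w − α∇f_i(w)).
- Perform the local updates with unbiased stochastic estimates of the gradient and the Hessian.
- Analyze smoothness, together with the bias and variance of the gradient and Hessian estimates, to establish convergence.
- Provide guidance on parameter choices (τ, K, β) and on how data similarity (γ_G, γ_H) and distribution distances affect performance.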
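The method above can be illustrated with a minimal sketch. By the chain rule, the gradient of the local objective is ∇F_i(w) = (I − α∇²f_i(w)) ∇f_i(w − α∇f_i(w)), so each local step needs both a gradient and a Hessian term. The code below is an assumption-laden toy, not the paper's implementation: the users, their scalar quadratic losses f_i(w) = 0.5 · a_i · (w − c_i)², and all function names are hypothetical; it uses exact gradients and full user participation instead of stochastic estimates and user sampling.

```python
# Toy users (hypothetical): f_i(w) = 0.5 * a_i * (w - c_i)**2 with a scalar model w.
USERS = [(1.0, 1.0), (2.0, 2.0), (0.5, 4.0)]  # (curvature a_i, local optimum c_i)


def grad_f(w, user):
    """Gradient of the local loss f_i at w."""
    a, c = user
    return a * (w - c)


def grad_F(w, user, alpha):
    """Exact gradient of F_i(w) = f_i(w - alpha * f_i'(w)).

    For this quadratic toy loss the Hessian is the constant a_i.
    """
    a, _ = user
    adapted = w - alpha * grad_f(w, user)  # one personalization step
    return (1.0 - alpha * a) * grad_f(adapted, user)


def per_fedavg(users, rounds, tau, alpha, beta, w0=0.0):
    """FedAvg-style loop in which every local step descends F_i instead of f_i."""
    w = w0
    for _ in range(rounds):
        local_models = []
        for user in users:  # assumption: every user participates in every round
            w_i = w
            for _ in range(tau):  # tau local updates with local step size beta
                w_i -= beta * grad_F(w_i, user, alpha)
            local_models.append(w_i)
        w = sum(local_models) / len(local_models)  # server averages the local models
    return w


def meta_grad(w, users, alpha):
    """Gradient of the global objective F(w) = (1/n) * sum_i F_i(w)."""
    return sum(grad_F(w, u, alpha) for u in users) / len(users)


def personalize(w, user, alpha):
    """Adapt the shared initialization to one user with a single gradient step."""
    return w - alpha * grad_f(w, user)


if __name__ == "__main__":
    w_shared = per_fedavg(USERS, rounds=50, tau=5, alpha=0.1, beta=0.1)
    print("|grad F| at w = 0:     ", abs(meta_grad(0.0, USERS, 0.1)))
    print("|grad F| after training:", abs(meta_grad(w_shared, USERS, 0.1)))
    print("personalized models:    ", [personalize(w_shared, u, 0.1) for u in USERS])
```

With τ = 1 and full participation the loop reduces to plain gradient descent on F, so the gradient norm can be driven to zero. With τ > 1 the averaged model settles near, but in general not exactly at, a stationary point of F when users are heterogeneous. This is in line with the role that τ and the similarity parameters play in the analysis.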
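Experimental Results
Research Questions
- RQ1: Can federated learning learn a shared initialization from which a few local gradient steps yield strong personalization for heterogeneous users?
- RQ2: How does data heterogeneity, measured by distribution distances (TV, Wasserstein), affect the convergence and performance of personalized FL algorithms?
- RQ3: What convergence guarantees does Per-FedAvg have under non-convex objectives and stochastic gradients?
- RQ4: How should the meta step size α, the number of local updates τ, and the number of communication rounds K be chosen to reach an ε-approximate first-order stationary point?
Key Findings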
| Dataset | Parameters | FedAvg + update | Per-FedAvg (FO) | Per-FedAvg (HF) |
|---|---|---|---|---|
| MNIST | τ=10, α=0.01 | 75.96% ± 0.02% | 78.00% ± 0.02% | 79.85% ± 0.02% |
| MNIST | τ=4, α=0.01 | 60.18% ± 0.02% | 64.55% ± 0.02% | 70.94% ± 0.03% |
| CIFAR-10 | τ=10, α=0.001 | 40.49% ± 0.07% | 46.98% ± 0.10% | 50.44% ± 0.15% |
| CIFAR-10 | τ=4, α=0.001 | 38.38% ± 0.07% | 34.04% ± 0.08% | 43.73% ± 0.11% |
| CIFAR-10 | τ=4, α=0.01 | 35.97% ± 0.17% | 25.32% ± 0.18% | 46.32% ± 0.12% |
| CIFAR-10 | τ=4, α=0.01, | 58.59% ± 0.11% | 37.71% ± 0.23% | 71.25% ± 0.05% |
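- Per-FedAvg outperforms standard FedAvg under heterogeneity, most clearly with the Hessian-aware (HF) updates: HF is the best method in every row of the table, whereas FO falls below FedAvg + update in the CIFAR-10 rows with τ=4.
- The convergence analysis quantifies how heterogeneity and distribution closeness (γ_G, γ_H) affect the convergence rate under non-convex objectives.
- With suitable parameters, Per-FedAvg reaches an ε-approximate first-order stationary point with K = O(ε^(-3/2)) communication rounds and τ = O(ε^(-1/2)) local updates.
- The HF-MAML variant (second-order-aware) generally performs better than FO-MAML (first-order) on heterogeneous data.
- Numerical experiments on MNIST and CIFAR-10 show that Per-FedAvg (HF) consistently outperforms FedAvg, with larger gains on more diverse datasets.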
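The gap between the FO and HF variants can be made concrete with a small sketch. It assumes that FO means dropping the Hessian term of ∇F_i altogether, and that HF means the Hessian-free approach of approximating the Hessian-vector product by a central difference of gradients, ∇²f(w)·v ≈ [∇f(w + δv) − ∇f(w − δv)] / (2δ). The scalar loss f(w) = log(cosh(w − c)), the value of δ, and the function names are hypothetical choices made only for illustration.

```python
import math


def grad_f(w, c):
    """Gradient of the hypothetical loss f(w) = log(cosh(w - c))."""
    return math.tanh(w - c)


def hess_f(w, c):
    """Second derivative of the same loss, used only for the exact reference."""
    return 1.0 - math.tanh(w - c) ** 2


def exact_meta_grad(w, c, alpha):
    """Exact gradient of F(w) = f(w - alpha * f'(w))."""
    g = grad_f(w - alpha * grad_f(w, c), c)
    return (1.0 - alpha * hess_f(w, c)) * g


def fo_meta_grad(w, c, alpha):
    """First-order (FO) approximation: the Hessian term is dropped."""
    return grad_f(w - alpha * grad_f(w, c), c)


def hf_meta_grad(w, c, alpha, delta=1e-3):
    """Hessian-free (HF) approximation: Hessian-vector product from two extra gradients."""
    g = grad_f(w - alpha * grad_f(w, c), c)
    hvp = (grad_f(w + delta * g, c) - grad_f(w - delta * g, c)) / (2.0 * delta)
    return g - alpha * hvp


if __name__ == "__main__":
    w, c, alpha = 0.5, 0.0, 0.5
    exact = exact_meta_grad(w, c, alpha)
    print("FO error:", abs(fo_meta_grad(w, c, alpha) - exact))
    print("HF error:", abs(hf_meta_grad(w, c, alpha) - exact))
```

In this toy setting the FO error scales with α times the curvature, while the HF error shrinks with δ². That may be one reason the FO variant is more sensitive to the choice of α in the reported results, although the sketch itself does not establish this for the paper's experiments.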
This digest was generated by AI and reviewed by a human editor.