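[Paper Walkthrough] Salvaging Federated Learning by Local Adaptation
This paper shows that differential privacy and robust aggregation reduce per-user accuracy in federated learning, but that local adaptation techniques (fine-tuning, multi-task learning, and knowledge distillation) can recover or even improve individual participants' accuracy without changing the global FL framework.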
Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the federated model, the latest FL approaches use differential privacy or robust aggregation. We look at FL from the local viewpoint of an individual participant and ask: (1) do participants have an incentive to participate in FL? (2) how can participants individually improve the quality of their local models, without re-designing the FL framework and/or involving other participants? First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model (and who have no incentive to participate in FL today) improve less, but sufficiently to make the adapted federated model better than their local models.
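Motivation and Goals
- Assess whether individual participants benefit from standard federated learning when data are non-iid and privacy and robustness protections are in place.
- Measure how far local adaptation can improve a participant's model without changing the FL aggregation framework.
- Identify which adaptation methods work best under different participant data characteristics and privacy regimes.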
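Proposed Method
- Evaluate BASIC-FED, DP-FED, and ROBUST-FED on next-word prediction (Reddit) and CIFAR-10 image classification (non-iid split drawn from a Dirichlet distribution).
- Test three local adaptation techniques: fine-tuning, either of all parameters (FT) or with the base parameters frozen (FB); multi-task learning (MTL) with elastic weight consolidation; and knowledge distillation (KD) from the federated model as teacher to a local student.
- Compare each adapted model against the model the participant trains locally and against the unadapted federated model evaluated on that participant's data.
- Use standard NLP and computer vision architectures: a 2-layer LSTM with 200 hidden units for word prediction and ResNet-18 for CIFAR-10.
- Report per-participant accuracy and aggregate trends to understand how the incentive to participate changes.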
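The MTL and KD objectives listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard elastic weight consolidation penalty (a Fisher-weighted quadratic pull toward the federated weights) and the common temperature-scaled distillation loss. The function names and the hyperparameters `lam`, `alpha`, and `temperature` are hypothetical, and plain Python lists stand in for real tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def ewc_loss(task_loss, params, fed_params, fisher, lam):
    """MTL-style objective (assumed form): local task loss plus an EWC
    penalty that discourages drifting from the federated weights on
    parameters with high Fisher information."""
    penalty = sum(f * (p - q) ** 2
                  for f, p, q in zip(fisher, params, fed_params))
    return task_loss + 0.5 * lam * penalty


def kd_loss(student_logits, teacher_logits, label, alpha, temperature):
    """KD objective (assumed form): a weighted sum of the soft-label loss
    against the federated teacher and the hard-label loss on local data."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = cross_entropy(soft_teacher, soft_student)
    hard = cross_entropy(one_hot(label, len(student_logits)),
                         softmax(student_logits))
    return alpha * temperature ** 2 * distill + (1 - alpha) * hard
```

FT and FB need no new objective: both minimize the ordinary task loss on the participant's local data, with FB leaving the frozen base parameters untouched.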
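Experimental Results
Research Questions
- RQ1: Do federated models with privacy or robustness protections outperform participants' local models on the participants' own data?
- RQ2: Can participants adapt the federated model locally to improve accuracy without changing FL aggregation?
- RQ3: Which local adaptation techniques best recover or improve accuracy under different participant data distributions and privacy settings?
- RQ4: How do data characteristics (vocabulary size, total word count) affect how well local adaptation works?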
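Key Findings
- Privacy and robustness protections in federated learning reduce per-participant accuracy for many users.
- Local adaptation usually recovers the lost accuracy and in many cases makes a participant's federated model more accurate than their local model.
- On word prediction, the mean accuracy improvement from adaptation is 2.32% (BASIC-FED), 2.12% (DP-FED), and 2.12% (ROBUST-FED).
- On image classification, the mean accuracy improvement from adaptation is 2.98% (BASIC-FED), 6.83% (DP-FED), and 6.34% (ROBUST-FED).
- Adapted models outperform local models for most participants, with the largest gains for participants whose initial local models are poor.
- Adaptation also improves the federated model for participants whose local models are already good, so that in many cases the adapted federated model matches or beats the local model.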
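Summary statistics like the ones above (mean gain from adaptation, share of participants for whom a model beats local training) can be computed from per-participant accuracies as sketched below. The function name and the three accuracy lists are hypothetical, and the numbers are made up purely for illustration; they are not results from the paper.

```python
from statistics import mean


def summarize_adaptation(local_acc, fed_acc, adapted_acc):
    """Summarize parallel lists of per-participant accuracies (percent)."""
    n = len(local_acc)
    # Gain of the adapted model over the unadapted federated model.
    gains = [a - f for a, f in zip(adapted_acc, fed_acc)]
    fed_wins = sum(f > l for f, l in zip(fed_acc, local_acc))
    adapted_wins = sum(a > l for a, l in zip(adapted_acc, local_acc))
    return {
        "mean_gain": mean(gains),
        "frac_fed_beats_local": fed_wins / n,
        "frac_adapted_beats_local": adapted_wins / n,
    }


# Made-up accuracies for four hypothetical participants (NOT paper data).
local = [10.0, 18.0, 22.0, 15.0]
fed = [16.0, 17.0, 19.0, 18.0]
adapted = [18.0, 19.5, 22.5, 20.0]
summary = summarize_adaptation(local, fed, adapted)
print(summary)
```

In this toy input, two of the four participants have no incentive to join FL before adaptation, which is the situation the participation-incentive question targets.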
This walkthrough was generated by AI and reviewed by a human editor.