QUICK REVIEW

[Paper Review] Federated Learning from Pre-Trained Models: A Contrastive Learning Approach

Yue Tan, Guodong Long | arXiv (Cornell University) | Sep 21, 2022
Privacy-Preserving Technologies in Data | Cited by 67

One-sentence summary

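The paper introduces FedPCL, a lightweight federated learning framework that fuses fixed pre-trained backbone networks through prototype-wise contrastive learning to obtain personalized, communication-efficient models.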

ABSTRACT

Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. In this work, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.

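Motivation and objectives

  • Reduce computation and communication costs in FL by leveraging off-the-shelf pre-trained models.
  • Achieve personalized representation learning without training a large global model from scratch.
  • Develop a lightweight framework that fuses representations from multiple backbones through a learnable projection.
  • Propose a prototype-based communication scheme that shares class-relevant knowledge efficiently.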

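Proposed method

  • Use multiple fixed pre-trained backbones as encoders and concatenate their outputs into a single representation.
  • Give each client a projection network that fuses the backbone representations into a latent representation z(x).
  • Share class prototypes (global and local) between the server and the clients to enable contrastive learning.
  • Train with a prototype-wise supervised contrastive loss made of two terms: a global prototype loss and a local prototype loss.
  • Aggregate local prototypes on the server to form global prototypes, and handle missing classes through prototype padding.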
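
The steps in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the single linear layer in `fuse`, the cosine-similarity scoring, the temperature `tau=0.07`, the padding rule in `pad`, and every function name are hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse(backbone_outputs, weight):
    """Concatenate the fixed backbones' features and project them to z(x).

    Assumption: one linear layer stands in for the client's projection network.
    """
    concat = [x for out in backbone_outputs for x in out]
    return normalize([sum(w * x for w, x in zip(row, concat)) for row in weight])


def local_prototypes(latents, labels):
    """Client side: a class prototype is the mean latent vector of that class."""
    by_class = defaultdict(list)
    for z, y in zip(latents, labels):
        by_class[y].append(z)
    return {y: mean_vector(zs) for y, zs in by_class.items()}


def aggregate(all_client_protos):
    """Server side: average each class's prototypes over the clients that hold it."""
    by_class = defaultdict(list)
    for protos in all_client_protos:
        for y, p in protos.items():
            by_class[y].append(p)
    return {y: mean_vector(ps) for y, ps in by_class.items()}


def pad(client_protos, global_protos):
    """Assumed padding rule: use the global prototype for classes a client lacks."""
    return {y: client_protos.get(y, g) for y, g in global_protos.items()}


def proto_contrastive(z, y, protos, tau):
    """Negative log-softmax of the true class over prototype similarities."""
    logits = {k: sum(a * b for a, b in zip(z, normalize(p))) / tau
              for k, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return log_denom - logits[y]


def two_term_loss(z, y, global_protos, padded_protos_per_client, tau=0.07):
    """Global-prototype term plus the mean local-prototype term over clients."""
    loss_global = proto_contrastive(z, y, global_protos, tau)
    loss_local = sum(proto_contrastive(z, y, p, tau)
                     for p in padded_protos_per_client) / len(padded_protos_per_client)
    return loss_global + loss_local


# Toy round: client A holds only class 0, client B holds classes 0 and 1.
protos_a = local_prototypes([[1.0, 0.0]], [0])
protos_b = local_prototypes([[0.8, 0.2], [0.0, 1.0]], [0, 1])
global_protos = aggregate([protos_a, protos_b])
padded = [pad(p, global_protos) for p in (protos_a, protos_b)]

# Two 2-dimensional "backbone" outputs fused by a hypothetical 2x4 projection.
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
z = fuse([[1.0, 0.0], [0.5, 0.5]], weight)
print(two_term_loss(z, 0, global_protos, padded))
```

In this toy round the loss for the matching label (class 0) is much smaller than for the wrong label, which is the behaviour a prototype-wise contrastive objective is meant to produce. Only prototypes cross the network here; the projection weights stay on the client.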

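Experimental results

Research questions

  • RQ1: Can fixed pre-trained backbones be fused effectively in FL to reduce computation and communication?
  • RQ2: Does prototype-based communication support better personalization and knowledge sharing across clients?
  • RQ3: How do global and local prototypes each contribute to the contrastive learning objective and to performance?
  • RQ4: Is FedPCL robust to non-IID data, and does it scale to large numbers of clients and to different architectures?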

Key findings

| Backbone | Method | MNIST | SVHN | USPS | Synth | MNIST-M | Avg | # communicated parameters |
|---|---|---|---|---|---|---|---|---|
| s | FedAvg | 70.65 (1.15) | 17.10 (0.20) | 70.24 (1.62) | 32.90 (0.75) | 29.33 (1.18) | 44.04 (0.98) | 133,632 |
| s | pFedMe | 71.13 (3.63) | 13.18 (1.78) | 69.20 (0.30) | 36.25 (3.35) | 25.25 (2.25) | 43.00 (2.26) | 133,632 |
| s | PerFedAvg | 52.68 (7.03) | 16.28 (1.23) | 53.66 (6.58) | 29.05 (3.45) | 24.38 (2.38) | 35.21 (4.13) | 133,632 |
| s | FedRep | 64.00 (2.20) | 17.88 (1.08) | 70.44 (1.27) | 36.50 (1.55) | 31.90 (0.05) | 44.14 (2.03) | 131,072 |
| s | FedProto | 80.40 (2.75) | 17.03 (0.38) | 88.47 (0.91) | 40.90 (1.10) | 32.85 (0.75) | 51.93 (1.18) | 2,560 |
| s | Solo | 60.40 (2.25) | 15.60 (0.20) | 75.28 (4.48) | 34.65 (0.05) | 28.48 (0.53) | 42.88 (1.50) | - |
| s | Ours | 82.75 (0.40) | 18.12 (0.42) | 88.82 (0.15) | 41.40 (0.60) | 33.05 (0.95) | 52.83 (0.21) | 2,560 |
| m | FedAvg | 71.68 (2.93) | 18.45 (0.45) | 72.95 (0.86) | 37.35 (1.35) | 33.70 (2.55) | 46.83 (1.63) | 395,776 |
| m | pFedMe | 67.45 (2.70) | 15.43 (0.38) | 65.66 (7.20) | 33.55 (4.60) | 31.80 (0.20) | 42.78 (3.01) | 395,776 |
| m | PerFedAvg | 56.03 (2.73) | 17.03 (0.63) | 57.55 (0.27) | 34.90 (2.80) | 30.98 (1.53) | 39.30 (1.59) | 395,776 |
| m | FedRep | 77.25 (1.75) | 16.40 (0.50) | 80.25 (0.32) | 37.63 (2.18) | 36.53 (0.28) | 49.61 (1.05) | 393,216 |
| m | FedProto | 83.78 (0.83) | 17.90 (0.10) | 91.74 (0.00) | 43.70 (2.45) | 36.43 (1.58) | 54.71 (0.99) | 2,560 |
| m | Solo | 70.43 (4.63) | 15.00 (0.40) | 84.90 (0.24) | 37.18 (2.73) | 34.35 (2.20) | 48.37 (2.04) | - |
| m | Ours | 84.65 (0.15) | 19.38 (0.63) | 90.74 (0.53) | 44.73 (0.37) | 37.25 (0.28) | 55.34 (0.34) | 2,560 |

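  • FedPCL achieves higher accuracy than several baselines across a range of non-IID settings and backbone networks.
  • Using multiple fixed backbones generally improves performance and reduces variation across runs.
  • Compared with methods that exchange model parameters, prototype-based communication sharply reduces the number of parameters transmitted per round (2,560 versus 131,072 to 395,776 in the results table).
  • Using global and local prototypes together benefits local training more than using either one alone.
  • Combining diverse architectures (such as ViT and CNN) as fixed backbones can improve performance in the lightweight FL setting.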
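
The results table reports the communicated parameter counts but not how they arise. The sketch below shows one decomposition that happens to reproduce them exactly. The dimensions are assumptions and are not stated in this review: a 512-dimensional output per backbone, a 256-dimensional latent space, bias-free linear layers, and three backbones in the "m" rows. The function names are hypothetical too.

```python
# Assumed dimensions (not stated in this review); chosen because they
# reproduce the parameter counts reported in the results table.
FEATURE_DIM = 512   # assumed output width of one backbone
LATENT_DIM = 256    # assumed width of the projected representation z(x)
NUM_CLASSES = 10    # the digit benchmarks have 10 classes


def prototype_payload(num_classes=NUM_CLASSES, latent_dim=LATENT_DIM):
    """One prototype vector per class."""
    return num_classes * latent_dim


def projection_payload(num_backbones, feature_dim=FEATURE_DIM, latent_dim=LATENT_DIM):
    """Bias-free linear projection over the concatenated backbone features."""
    return num_backbones * feature_dim * latent_dim


def projection_plus_head_payload(num_backbones):
    """Projection plus a bias-free linear classification head."""
    return projection_payload(num_backbones) + LATENT_DIM * NUM_CLASSES


for name, n in (("single backbone", 1), ("three backbones", 3)):
    full = projection_plus_head_payload(n)
    print(f"{name}: {full:,} vs {prototype_payload():,} "
          f"({full / prototype_payload():.1f}x)")
```

Under these assumptions the prototype payload is 10 × 256 = 2,560 values, matching the FedProto and FedPCL rows, while the projection alone gives 131,072 and 393,216 (the FedRep rows) and the projection plus head gives 133,632 and 395,776 (the FedAvg, pFedMe, and PerFedAvg rows). That is roughly 52 and 155 times the prototype payload, respectively.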

This review was generated by AI and checked by a human editor.