QUICK REVIEW

[Paper Review] Visual Prompt Based Personalized Federated Learning

Guanghao Li, Wansen Wu|arXiv (Cornell University)|Mar 15, 2023

Privacy-Preserving Technologies in Data | Cited by 8

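One-Sentence Summary

The paper introduces pFedPT, a visual-prompt-based personalized federated learning framework for image classification. Client-specific prompts implicitly encode each client's local data distribution and guide a shared backbone, yielding personalization and accuracy gains over state-of-the-art PFL methods on CIFAR-10/100.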

ABSTRACT

As a popular paradigm of distributed learning, personalized federated learning (PFL) allows personalized models to improve generalization ability and robustness by utilizing knowledge from all distributed clients. Most existing PFL algorithms tackle personalization in a model-centric way, such as personalized layer partition, model regularization, and model interpolation, which all fail to take into account the data characteristics of distributed clients. In this paper, we propose a novel PFL framework for image classification tasks, dubbed pFedPT, that leverages personalized visual prompts to implicitly represent local data distribution information of clients and provides that information to the aggregation model to help with classification tasks. Specifically, in each round of pFedPT training, each client generates a local personalized prompt related to local data distribution. Then, the local model is trained on the input composed of raw data and a visual prompt to learn the distribution information contained in the prompt. During model testing, the aggregated model obtains prior knowledge of the data distributions based on the prompts, which can be seen as an adaptive fine-tuning of the aggregation model to improve model performances on different clients. Furthermore, the visual prompt can be added as an orthogonal method to implement personalization on the client for existing FL methods to boost their performance. Experiments on the CIFAR10 and CIFAR100 datasets show that pFedPT outperforms several state-of-the-art (SOTA) PFL algorithms by a large margin in various settings.

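Motivation and Objectives

Move beyond purely model-centric approaches in federated learning, motivated by the need for personalization that is aware of each client's data distribution.
Propose a new framework that uses client-specific visual prompts to encode local data distribution information.
Train the prompt generator and the shared backbone alternately, so that the backbone is effectively fine-tuned for each client.
Show that prompts can act as a plug-in that improves other FL/PFL methods, and improve results on standard benchmarks.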

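Proposed Method

Each client maintains a local prompt generator and a backbone.
A personalized visual prompt is generated for each client and added to the local inputs during training.
Alternating optimization: the prompt generator is updated while the backbone is frozen, and the backbone is updated while the prompt is frozen.
In each communication round, the server aggregates the clients' backbones with federated averaging (FedAvg).
Prompt size and type are varied; in the CIFAR-10 experiments, a padding-based prompt of size 4 performs best.
The objective minimizes the loss jointly over the backbone parameters and the client-specific prompt: L(w, δ_i) = E_{(x,y)~D_i}[ℓ_i(w; (x+δ_i, y))].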
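The steps above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: a linear model with squared loss stands in for the backbone so the gradients can be written by hand, the prompt is a raw border-shaped array rather than the output of a generator network, and the helper names (border_mask, loss_and_grads, local_round, fedavg), image size, learning rate, and step counts are all hypothetical.

```python
import numpy as np

def border_mask(size, pad):
    """Return 1 on a border of width `pad` and 0 inside (padding-style prompt region)."""
    mask = np.ones((size, size))
    mask[pad:size - pad, pad:size - pad] = 0.0
    return mask

def loss_and_grads(w, delta, x, y):
    """Squared loss of a toy linear 'backbone' w on the prompted input x + delta."""
    z = x + delta                                   # prompted inputs, shape (n, H, W)
    err = np.einsum("nhw,hw->n", z, w) - y          # residuals, shape (n,)
    loss = 0.5 * np.mean(err ** 2)
    grad_w = np.einsum("n,nhw->hw", err, z) / len(y)  # dL/dw
    grad_delta = np.mean(err) * w                     # dL/d(delta)
    return loss, grad_w, grad_delta

def local_round(w, delta, x, y, mask, lr=0.01, steps=5):
    """One client round: update the prompt with w frozen, then w with the prompt frozen."""
    for _ in range(steps):                          # step 1: backbone frozen
        _, _, g_delta = loss_and_grads(w, delta, x, y)
        delta = delta - lr * mask * g_delta         # only border pixels change
    for _ in range(steps):                          # step 2: prompt frozen
        _, g_w, _ = loss_and_grads(w, delta, x, y)
        w = w - lr * g_w
    return w, delta

def fedavg(weights, num_samples):
    """Sample-size-weighted average of the clients' backbone parameters."""
    total = sum(num_samples)
    return sum((n / total) * w for w, n in zip(weights, num_samples))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    size, pad, n = 8, 2, 32                         # toy sizes chosen for illustration
    mask = border_mask(size, pad)
    w_true = 0.1 * rng.normal(size=(size, size))
    clients = []
    for shift in (-1.0, 1.0):                       # each client has a different label shift
        x = rng.uniform(0, 1, size=(n, size, size))
        y = np.einsum("nhw,hw->n", x, w_true) + shift
        clients.append({"x": x, "y": y, "delta": np.zeros((size, size))})

    w_global = np.zeros((size, size))
    for _ in range(20):                             # communication rounds
        local_ws = []
        for c in clients:
            # The prompt stays on the client; only the backbone is sent back.
            w_i, c["delta"] = local_round(w_global, c["delta"], c["x"], c["y"], mask)
            local_ws.append(w_i)
        w_global = fedavg(local_ws, [len(c["y"]) for c in clients])

    for i, c in enumerate(clients):
        loss, _, _ = loss_and_grads(w_global, c["delta"], c["x"], c["y"])
        print(f"client {i}: loss with its own prompt = {loss:.4f}")
```

The design point this sketch tries to mirror is the split of state: the backbone w is shared and averaged by the server, while each delta never leaves its client. A real setup would replace the hand-written gradients with an autograd framework and a CNN or ViT backbone.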

Figure 1: Differences in local update and aggregation phases between FedAvg and pFedPT. In the figure, the lines represent the decision boundaries defined by the backbone. Assume that each client has two classes represented by different shapes. (a) In FedAvg, due to the heterogeneity of data in each

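Experimental Results

Research Questions

RQ1: Can client-specific visual prompts encode local data distributions and guide a shared backbone toward better personalized performance?
RQ2: How does pFedPT compare with existing PFL baselines under non-IID settings on standard image classification benchmarks?
RQ3: Can prompts provide plug-in improvements to other FL/PFL methods?
RQ4: Which prompt designs (position and size) give the best performance in practice?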

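Key Findings

pFedPT consistently achieves higher test accuracy than the baselines across multiple non-IID settings on CIFAR-10/100.
In the Dirichlet CIFAR-10 setting with a CNN, pFedPT reaches 80.83%, versus 61.92% for FedAvg and 77.98% for FedPer (gains of 18.91 and 2.85 percentage points, respectively).
pFedPT is robust across both ViT and CNN backbones, and its gains appear to grow as data heterogeneity increases.
Adding prompts improves other FL methods (e.g., FedProx, MOON, FedRep) by giving the backbone distribution-aware hints.
In the CIFAR-10 ablation study, a padding prompt of size 4 performs best; the other prompt designs are slightly worse.
Visual analyses (Grad-CAM, t-SNE) suggest that prompts steer attention and embeddings toward encoding client-specific information, which helps classification.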
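For readers unfamiliar with the "Dirichlet" non-IID setting mentioned above, the following sketch shows one common way such a split is built: for each class, client shares are drawn from a Dirichlet distribution, so a smaller concentration parameter gives more skewed clients. This summary does not state the exact partitioning code or concentration value used in the paper, so the function name dirichlet_partition, the alpha value, and the client count below are assumptions for illustration only.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += gammas[c] / total
            # The last client takes the remainder so every index is assigned once.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

if __name__ == "__main__":
    # Hypothetical toy labels: 10 classes, 100 samples each.
    labels = [i % 10 for i in range(1000)]
    parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
    for c, idxs in enumerate(parts):
        classes = sorted({labels[i] for i in idxs})
        print(f"client {c}: {len(idxs)} samples, classes present: {classes}")
```

With a small alpha, most clients end up holding only a few classes, which is the kind of heterogeneity under which the summary reports the largest gains for pFedPT.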

Figure 2: The pipeline of the pFedPT. $\hat{y}$ stands for the predicted logits of all classes. The dashed lines in steps 1 and 2 represent the loss backward for the model update. Each client contains a Prompt Generator, a set of personalized learnable parameters preserved locally, and a Backbone, w

This review was generated by AI and reviewed by a human editor.