QUICK REVIEW

[Paper Review] Federated Unsupervised Representation Learning

Fengda Zhang, Kun Kuang|arXiv (Cornell University)|Oct 18, 2020

Privacy-Preserving Technologies in Data | Cited by 38

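One-Sentence Summary

FedCA learns a shared unsupervised representation across distributed clients by coordinating a shared representation dictionary with an alignment mechanism, addressing non-IID data and representation misalignment in federated learning.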

ABSTRACT

To leverage enormous unlabeled data on distributed edge devices, we formulate a new problem in federated learning called Federated Unsupervised Representation Learning (FURL) to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) data distribution shift (Non-IID distribution) among clients would make local models focus on different categories, leading to the inconsistency of representation spaces. (2) without the unified information among clients in FURL, the representations across clients would be misaligned. To address these challenges, we propose the Federated Contrastive Averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules: (1) dictionary module to aggregate the representations of samples from each client and share with all clients for consistency of representation space and (2) alignment module to align the representation of each client on a base model trained on public data. We adopt the contrastive loss for local model training. Through extensive experiments with three evaluation protocols in IID and Non-IID settings, we demonstrate that FedCA outperforms all baselines with significant margins.
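The abstract says local models are trained with a contrastive loss. As a hedged illustration, the sketch below computes an InfoNCE-style loss for a single anchor embedding, assuming the negatives are representations taken from the server-shared dictionary. The function names and the temperature of 0.5 are illustrative assumptions, not details from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for a single anchor embedding.

    anchor, positive: embeddings of two augmented views of the same sample.
    negatives: embeddings treated as negatives; here they stand in for
    representations from the server-shared dictionary (an assumption).
    """
    a = normalize(anchor)
    pos_logit = dot(a, normalize(positive)) / temperature
    neg_logits = [dot(a, normalize(n)) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # Equivalent to -log(exp(pos) / sum(exp(all logits))).
    return log_denominator - pos_logit


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [0.9, 0.1]  # second augmented view, close to the anchor
    dictionary_negatives = [[0.0, 1.0], [-1.0, 0.0]]
    print(info_nce_loss(anchor, positive, dictionary_negatives))
```

The loss shrinks as the anchor moves closer to its positive view than to the negatives. If every client draws its negatives from the same shared dictionary, all clients contrast against a common reference set, which is the consistency argument the dictionary module relies on.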

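Motivation and Objectives

Enable learning a common unsupervised representation model from decentralized, unlabeled data while preserving client privacy.
Identify and address two key FURL challenges: inconsistent representation spaces caused by non-IID data, and misalignment of representations across clients.
Propose the FedCA framework, which combines a dictionary module and an alignment module to stabilize and align representations.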

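Proposed Method

A dictionary module maintained by the server aggregates representations from all clients and shares them (as negative samples) across clients, so that contrastive learning is consistent from client to client.
An alignment module uses a base model trained on a public dataset and regularizes each local model so that its outputs mimic those of this alignment model.
Each client applies contrastive learning locally to augmented data, incorporating the dictionary-based negative samples and the alignment regularization.
Temporal ensembling stabilizes the dictionary by maintaining an evolving ensemble of projections, yielding more robust negative samples.
The contrastive loss and the alignment loss are combined, weighted by β (beta), to train the local encoder and projection head.
Following a FedAvg-like protocol, local client updates and server-side aggregation into the global model are performed iteratively.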
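The local objective and the server-side steps listed above can be sketched in plain Python. This is a minimal, non-authoritative sketch under stated assumptions: the alignment term is taken to be the mean squared error between local-model and base-model representations of the same public samples; temporal ensembling is modeled as an exponential moving average with momentum `alpha` plus bias correction; and aggregation is standard FedAvg, weighted by client dataset size. All function and parameter names (`alignment_loss`, `total_loss`, `temporal_ensemble`, `fedavg`, `alpha`) are hypothetical.

```python
def alignment_loss(local_reprs, base_reprs):
    """Mean squared distance between local-model and base-model representations
    of the same public samples (MSE is an assumed choice of distance)."""
    total = 0.0
    for z_local, z_base in zip(local_reprs, base_reprs):
        total += sum((a - b) ** 2 for a, b in zip(z_local, z_base))
    return total / len(local_reprs)


def total_loss(contrastive, align, beta):
    """Local objective: contrastive term plus beta-weighted alignment term."""
    return contrastive + beta * align


def temporal_ensemble(accumulator, new_repr, alpha, step):
    """Update one dictionary entry with an exponential moving average.

    accumulator: running average for this sample (starts as all zeros).
    new_repr: representation of the sample uploaded in the current round.
    step: 1-based round counter used for bias correction (assumed scheme).
    Returns (updated accumulator, bias-corrected entry to share with clients).
    """
    updated = [alpha * acc + (1 - alpha) * z for acc, z in zip(accumulator, new_repr)]
    correction = 1 - alpha ** step
    return updated, [u / correction for u in updated]


def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average parameter vectors weighted by local dataset size."""
    n_total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / n_total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical numbers for one client's loss terms.
    align = alignment_loss([[1.0, 0.0]], [[0.8, 0.1]])
    print(total_loss(contrastive=1.2, align=align, beta=0.5))

    # First round: the bias-corrected entry equals the uploaded representation.
    acc, entry = temporal_ensemble([0.0, 0.0], [0.6, 0.8], alpha=0.5, step=1)
    print(entry)

    # Two clients holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Without the bias correction, a zero-initialized moving average would shrink dictionary entries toward the origin in early rounds. Weighting by dataset size follows the original FedAvg rule; an unweighted mean is a common simplification.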

Experimental Results

Research Questions

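RQ1: How can federated learning be extended to unsupervised representation learning while preserving privacy?
RQ2: Can a shared representation space be obtained without labels across heterogeneous (non-IID) client data?
RQ3: Do dictionary-based representations and cross-client alignment improve federated unsupervised learning over naively combining federated learning with unsupervised methods?
RQ4: How do temporal ensembling and alignment regularization affect the consistency of representations across clients?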

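Key Findings

In both IID and Non-IID settings, FedCA outperforms baselines that simply combine federated learning with unsupervised methods.
The dictionary module improves the consistency of the representation space by providing shared negative samples.
The alignment module reduces representation misalignment among local models, bringing their outputs closer to those of the base model trained on public data.
Temporal ensembling stabilizes representations across rounds, making the dictionary more effective in Non-IID settings.
FedCA, combining the dictionary and alignment modules, performs best under the most challenging Non-IID conditions.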

This review was generated by AI and reviewed by human editors.