QUICK REVIEW

[Paper Review] Distilled One-Shot Federated Learning

Yanlin Zhou, George Pu | arXiv (Cornell University) | Sep 17, 2020

Privacy-Preserving Technologies in Data | References: 42 | Cited by: 73

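One-Sentence Summary

DOSFL distills each client's private data into synthetic samples within a single round and sends only the distilled data to the server to train the global model, cutting communication by up to 1000x while reaching 93%–99% of centralized performance.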

ABSTRACT

Current federated learning algorithms take tens of communication rounds transmitting unwieldy model weights under ideal circumstances and hundreds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance. In just one round, each client distills their private dataset, sends the synthetic data (e.g. images or sentences) to the server, and collectively trains a global model. The distilled data look like noise and are only useful to the specific model weights, i.e., become useless after the model updates. With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than FedAvg while preserving between 93% to 99% performance of a centralized counterpart. Afterwards, clients could switch to traditional methods such as FedAvg to finetune the last few percent to fit personalized local models with local datasets. Through comprehensive experiments, we show the accuracy and communication performance of DOSFL on both vision and language tasks with different models including CNN, LSTM, Transformer, etc. We demonstrate that an eavesdropping attacker cannot properly train a good model using the leaked distilled data, without knowing the initial model weights. DOSFL serves as an inexpensive method to quickly converge on a performant pre-trained model with less than 0.1% communication cost of traditional methods.

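Motivation and Objectives

Reduce the number of communication rounds and the volume of data transmitted in federated learning without sacrificing accuracy.
Use dataset distillation to generate, within a single round, synthetic data for training the global model.
Address the challenges of non-IID data with soft labels, soft resets, and random masking.
Demonstrate that DOSFL applies to vision and language tasks and to different model types.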

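Proposed Method

The server initializes the global model θ0 and broadcasts it to the clients.
Each client distills its private data into a small synthetic dataset that includes labels and learning rates.
The server merges the distilled data from all clients and updates the global model by taking multiple gradient steps over the distilled sequence.
Soft labels are used to improve robustness to non-IID data.
Two techniques, soft resets and random masking, are introduced to reduce interference among non-IID distilled data.
The final model is distributed back to the clients for optional fine-tuning.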
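
The server-side step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear softmax classifier, a cross-entropy loss against soft labels, and that client payloads are applied one after another in the order received. The names `server_update` and `client_payloads` are hypothetical, and the client-side distillation that produces the payloads is not shown.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def server_update(theta0, client_payloads):
    """Apply every distilled step from every client, starting from theta0.

    theta0: (n_features, n_classes) weights of a linear softmax model
        (an assumed model, chosen only to keep the sketch short).
    client_payloads: one list per client; each element is a tuple
        (x, y_soft, lr) where x has shape (m, n_features), y_soft has
        shape (m, n_classes) with rows summing to 1, and lr is the
        step size shipped alongside the distilled samples.
    """
    theta = theta0.copy()  # leave the broadcast weights untouched
    for payload in client_payloads:      # assumed order: as received
        for x, y_soft, lr in payload:    # one gradient step per tuple
            probs = softmax(x @ theta)
            # Gradient of mean cross-entropy w.r.t. theta for soft targets.
            grad = x.T @ (probs - y_soft) / x.shape[0]
            theta = theta - lr * grad
    return theta
```

Every step in this sketch is taken from θ0 onward, so the same payloads applied to a different starting point would generally yield a different model. That is in line with the abstract's remark that distilled data are only useful to specific model weights.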

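Experimental Results

Research Questions

RQ1: Can federated learning cut communication by an order of magnitude or more by transmitting distilled data instead of model weights or gradients?
RQ2: How do soft labels, soft resets, and random masking affect performance on IID and non-IID data across vision and language tasks?
RQ3: How closely can one round of training on distilled data approach centralized training across different model types?
RQ4: Is the distilled-data approach robust to eavesdropping and to uncertainty about the initial weights?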

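Key Findings

In the reported settings, DOSFL reduces communication by up to roughly 1000x relative to FedAvg.
On IID data, DOSFL retains 93%–99% of centralized training performance across tasks.
On non-IID data, soft resets yield the largest gain among the proposed techniques and markedly improve robustness.
DOSFL supports multiple model types (CNN, LSTM, Transformer) and tasks (vision and language), with accuracy comparable to the centralized baselines.
Without knowledge of the server's initial weights, an eavesdropper cannot reproduce the global model from leaked distilled data alone, which suggests a privacy advantage over traditional FL.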
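
To make the communication finding concrete, the arithmetic behind a reduction factor can be sketched as below. This is only an illustration under stated assumptions: it counts client-to-server uploads alone, measures cost in scalar values sent, and treats every FedAvg round as one full weight upload per client. The review does not say how the paper accounts for traffic, so the function name `upload_reduction` and all of its parameters are hypothetical.

```python
def upload_reduction(n_params, fedavg_rounds, distilled_values):
    """Ratio of per-client FedAvg upload volume to one-shot upload volume.

    n_params: scalar values in the model (sent once per FedAvg round).
    fedavg_rounds: number of FedAvg communication rounds.
    distilled_values: scalar values in one client's distilled payload
        (synthetic samples, soft labels, and learning rates combined).
    Assumes upload-only accounting; downloads are ignored.
    """
    fedavg_upload = n_params * fedavg_rounds
    one_shot_upload = distilled_values
    return fedavg_upload / one_shot_upload
```

With made-up numbers that are not taken from the paper, a 50,000-parameter model trained for 20 FedAvg rounds against a 2,000-value distilled payload gives `upload_reduction(50_000, 20, 2_000)`, that is, a factor of 500. The ratio grows with model size and round count and shrinks as the distilled payload grows.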

This review was generated by AI and checked by a human editor.