QUICK REVIEW

[Paper Review] Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data

Sota Sugawara, Yuji Kawamata|arXiv (Cornell University)|Jan 14, 2026

Privacy-Preserving Technologies in Data|Cited by: 0

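One-Sentence Summary

This paper proposes DC-CFL, a single-round clustered federated learning framework that uses data collaboration (DC) analysis to cluster clients and train cluster-wise models under non-IID data, requiring only one communication round.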

ABSTRACT

Federated Learning (FL) enables distributed learning across multiple clients without sharing raw data. When statistical heterogeneity across clients is severe, Clustered Federated Learning (CFL) can improve performance by grouping similar clients and training cluster-wise models. However, most CFL approaches rely on multiple communication rounds for cluster estimation and model updates, which limits their practicality under tight constraints on communication rounds. We propose Data Collaboration-based Clustered Federated Learning (DC-CFL), a single-round framework that completes both client clustering and cluster-wise learning, using only the information shared in DC analysis. DC-CFL quantifies inter-client similarity via total variation distance between label distributions, estimates clusters using hierarchical clustering, and performs cluster-wise learning via DC analysis. Experiments on multiple open datasets under representative non-IID conditions show that DC-CFL achieves accuracy comparable to multi-round baselines while requiring only one communication round. These results indicate that DC-CFL is a practical alternative for collaborative AI model development when multiple communication rounds are impractical.
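The first two steps named in the abstract (total variation distance between label distributions, then hierarchical clustering) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code. It assumes the server knows each client's per-class label counts (in standard DC analysis, labels are typically shared alongside the intermediate representations). It also assumes average linkage with a distance threshold, because this summary does not state the linkage criterion or how the number of clusters is chosen. All function names are hypothetical.

```python
from itertools import combinations


def label_distribution(counts):
    """Normalize one client's per-class label counts into a probability vector."""
    total = sum(counts)
    if total == 0:
        raise ValueError("client has no labeled samples")
    return [c / total for c in counts]


def tv_distance(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tv_matrix(all_counts):
    """Symmetric client-by-client TV distance matrix (zeros on the diagonal)."""
    dists = [label_distribution(c) for c in all_counts]
    n = len(dists)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        D[i][j] = D[j][i] = tv_distance(dists[i], dists[j])
    return D


def average_linkage_clusters(D, threshold):
    """Agglomerative clustering with average linkage (an assumed criterion).

    Repeatedly merges the closest pair of clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[i] for i in range(len(D))]

    def linkage(a, b):
        return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Four clients, three classes: clients 0/1 skew to class 0, clients 2/3 to class 2.
    counts = [[90, 10, 0], [80, 20, 0], [0, 10, 90], [0, 20, 80]]
    print(average_linkage_clusters(tv_matrix(counts), threshold=0.5))  # → [[0, 1], [2, 3]]
```

Average linkage is used here only as a common default; single or complete linkage would slot into `linkage` the same way, and a fixed target number of clusters could replace the threshold.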

Motivation and Objectives

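Address the challenge of statistical heterogeneity across clients in federated learning (FL).
Develop a single-round framework that both clusters clients and learns cluster-specific models.
Use data collaboration (DC) analysis to quantify inter-client similarity and to guide clustering and learning.
Show that single-round DC-CFL is competitive in accuracy with multi-round baselines.
Release open-source code to support adoption and reproducibility.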

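Proposed Method

Quantify inter-client similarity via the total variation distance between label distributions.
Estimate clusters with hierarchical clustering based on this similarity measure.
Perform cluster-wise learning via DC analysis.
Complete both clustering and learning within a single communication round.
Evaluate on multiple open datasets under representative non-IID conditions.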
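Cluster-wise learning via DC analysis can be sketched with NumPy as below, for the clients of one cluster. It follows the generic DC analysis pipeline rather than the paper's exact configuration: the private map (PCA here), the random anchor data, the SVD-based common target, and the linear least-squares classifier are all assumptions, and every function name is hypothetical. The client-side work happens once, which is why the pipeline needs only one upload per client: its mapped data, its mapped anchor, and its labels.

```python
import numpy as np


def fit_private_pca(X, dim):
    """Hypothetical private map f_i: PCA fitted on one client's data only.

    Returns a function projecting any matrix with the same number of features.
    The projection W is never shared with the server.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:dim].T  # d x dim
    return lambda A: (A - mean) @ W


def dc_collaboration_reps(X_list, anchor, dim):
    """Map each client's data into a common collaboration space (generic DC analysis)."""
    # Client side (the single upload): intermediate reps of own data and of the shared anchor.
    maps = [fit_private_pca(X, dim) for X in X_list]
    X_tilde = [f(X) for f, X in zip(maps, X_list)]
    A_tilde = [f(anchor) for f in maps]
    # Server side: common target Z = top-`dim` left singular vectors of the stacked anchor reps.
    U, _, _ = np.linalg.svd(np.hstack(A_tilde), full_matrices=False)
    Z = U[:, :dim]
    # Per-client alignment G_i = argmin ||Z - A_tilde_i G_i||_F via ordinary least squares.
    G = [np.linalg.lstsq(A, Z, rcond=None)[0] for A in A_tilde]
    return [Xt @ Gi for Xt, Gi in zip(X_tilde, G)]


def fit_cluster_model(reps, labels, n_classes):
    """Hypothetical cluster-wise model: linear least-squares classifier on the
    concatenated collaboration representations of one cluster's clients."""
    H = np.vstack(reps)
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[np.concatenate(labels)]  # one-hot targets
    B = np.linalg.lstsq(H1, Y, rcond=None)[0]
    return lambda R: np.argmax(np.hstack([R, np.ones((R.shape[0], 1))]) @ B, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.uniform(-1, 1, size=(200, 8))  # shared random anchor data
    X1, X2 = rng.normal(size=(100, 8)), rng.normal(size=(120, 8))
    y1, y2 = (X1[:, 0] > 0).astype(int), (X2[:, 0] > 0).astype(int)
    reps = dc_collaboration_reps([X1, X2], anchor, dim=4)
    model = fit_cluster_model(reps, [y1, y2], n_classes=2)
    print(model(reps[0])[:10])
```

In a DC-CFL-style pipeline, this would run once per estimated cluster, so each cluster gets its own model while raw data and the private maps stay on the clients.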

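Experimental Results

Research Questions

RQ1: Under non-IID data, is a single communication round sufficient both to cluster clients in CFL and to train cluster-specific models?
RQ2: To what extent does the total variation distance between label distributions capture the inter-client similarity needed for clustering?
RQ3: Under non-IID settings, is the accuracy of DC-CFL comparable to that of multi-round CFL baselines?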

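Key Findings

DC-CFL achieves accuracy comparable to multi-round baselines while using only one communication round.
The similarity measure derived from the information shared in DC analysis effectively guides client clustering in non-IID scenarios.
Hierarchical clustering with the proposed similarity measure successfully groups similar clients for cluster-wise learning.
The method offers a practical alternative when multiple communication rounds are infeasible.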

This summary was generated by AI and reviewed by human editors.