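[Paper Summary] Federated Graph Classification over Non-IID Graphs
This paper proposes GCFL, a gradient-based clustered federated learning framework for graph classification over non-IID graph data. Its extension, GCFL+, clusters clients by applying dynamic time warping (DTW) to gradient sequences so that it handles heterogeneity better. Experiments across multiple datasets and domains show consistent improvements over the baselines.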
Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe the gradients of GNNs to be rather fluctuating in GCFL which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.
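Research Motivation and Goals
- Show that real-world graphs share properties that can support cross-dataset federated learning for graph classification.
- Quantify structural and feature heterogeneity across graph datasets and domains.
- Develop GCFL, which dynamically clusters clients by gradient similarity and trains a GNN within each cluster.
- Improve the clustering in GCFL+ by using gradient sequences and dynamic time warping.
- Demonstrate empirical gains over FedAvg and FedProx in both single-dataset and multi-dataset settings.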
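Proposed Method
- Uses the Graph Isomorphism Network (GIN) as the backbone model for graph classification.
- Dynamically clusters clients by analyzing the gradients they transmit, forming homogeneous groups.
- Trains a cluster-specific GNN within each cluster via FedAvg.
- GCFL introduces two gradient-norm-based stopping/partitioning criteria that trigger clustering.
- GCFL+ maintains a time-series matrix of gradient norms and uses dynamic time warping to refine the clustering over multiple rounds.
- Provides theoretical justification that GNN gradients reflect differences in graph structure and features.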
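The gradient-norm criteria and the similarity-based split listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the two criteria take the form commonly used in clustered federated learning: the norm of the mean client update falls below a threshold `eps1` while the largest single client update norm stays above `eps2`. It also swaps in a simple seed-based heuristic on cosine similarity for the partitioning step. The function names, the thresholds, and the flattened update vectors are all hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a flat update vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def should_split(updates, eps1, eps2):
    """Assumed form of the two criteria: the averaged update is nearly
    stationary (norm < eps1) while at least one client is still moving
    (max norm > eps2), which suggests the clients disagree."""
    n = len(updates)
    mean = [sum(col) / n for col in zip(*updates)]
    return norm(mean) < eps1 and max(norm(u) for u in updates) > eps2


def bipartition(updates):
    """Stand-in heuristic (not the paper's procedure): take the two least
    similar clients as seeds and assign every client to the closer seed."""
    n = len(updates)
    if n < 2:
        raise ValueError("need at least two clients to split a cluster")
    sim = [[cosine(updates[i], updates[j]) for j in range(n)] for i in range(n)]
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    s1, s2 = min(pairs, key=lambda p: sim[p[0]][p[1]])
    left, right = [], []
    for k in range(n):
        (left if sim[k][s1] >= sim[k][s2] else right).append(k)
    return left, right


# Hypothetical flattened updates from four clients in one cluster.
updates = [[1.0, 0.1], [0.9, -0.1], [-1.0, 0.05], [-0.95, 0.0]]
if should_split(updates, eps1=0.1, eps2=0.5):
    print(bipartition(updates))
```

In this toy input the first two clients push the weights in roughly the opposite direction from the last two, so the averaged update nearly cancels out and the cluster is split into the two agreeing pairs. Each resulting cluster would then run its own FedAvg aggregation.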
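Experimental Results
Research Questions
- RQ1: Can gradient-based clustering reduce structural and feature heterogeneity in federated learning over non-IID graphs?
- RQ2: For graph classification, do the cluster-wise GNNs trained by GCFL outperform vanilla FedAvg and FedProx?
- RQ3: Does incorporating gradient-sequence information through DTW (GCFL+) yield more stable and better clusters than using only the latest gradient?
- RQ4: Does federated learning across datasets and domains help graph classification on heterogeneous sources?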
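Key Findings
- GCFL and GCFL+ improve graph classification accuracy over self-training and standard FL baselines (FedAvg, FedProx) on multiple datasets.
- In the single-dataset setting, GCFL/GCFL+ improve on self-training by about 14.75 percentage points on average on some datasets.
- Across multiple datasets and domains, GCFL/GCFL+ consistently improve performance for most clients, and GCFL+ usually outperforms GCFL.
- GCFL+ clusters clients with DTW over gradient sequences, which better captures long-term training dynamics and improves both clustering quality and performance.
- The theoretical results bound differences in GNN weight updates by differences in graph structure and features, which supports the validity of gradient-based clustering.
- Through sequence-based clustering, GCFL+ keeps clients that benefit little from dragging down their cluster, which preserves robustness.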
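The sequence-based comparison behind GCFL+ can be illustrated with a textbook dynamic time warping distance over per-client gradient-norm sequences. This is a sketch under stated assumptions rather than the paper's code: the absolute difference is assumed as the local cost, no warping window is applied, and the function names and example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two non-empty numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays on the same element
                cost[i][j - 1],      # b advances, a stays on the same element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_matrix(sequences):
    """Symmetric pairwise DTW distance matrix with a zero diagonal."""
    k = len(sequences)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(sequences[i], sequences[j])
    return dist


# Hypothetical gradient-norm histories: one row per client, one value per round.
grad_norms = [
    [1.0, 0.8, 0.6, 0.5],
    [1.0, 1.0, 0.8, 0.6],  # same trend as client 0, delayed by one round
    [0.2, 0.9, 0.1, 1.1],  # erratic client
]
for row in dtw_matrix(grad_norms):
    print([round(d, 2) for d in row])
```

Because DTW allows one sequence to lag the other, the first two clients end up close to each other even though their norms differ round by round, while the erratic client stays far from both. A server could feed such a distance matrix into the cluster-splitting step in place of a similarity computed from the latest gradient alone.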
This summary was generated by AI and reviewed by human editors.