QUICK REVIEW

[Paper Review] FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks

Chaoyang He, Keshav Balasubramanian|arXiv (Cornell University)|Apr 14, 2021
Topic: Advanced Graph Neural Networks | References: 102 | Cited by: 103
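One-Sentence Summary

FedGraphNN introduces an open federated learning benchmark system for graph neural networks that covers diverse datasets, GNN models, and FL algorithms, and provides an efficient, stable, and modular system for cross-institution federated graph learning.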

ABSTRACT

Graph Neural Network (GNN) research is rapidly growing thanks to the capacity of GNNs in learning distributed representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to privacy concerns, regulation restrictions, and commercial competitions. Federated learning (FL), a trending distributed learning paradigm, provides possibilities to solve this challenge while preserving data privacy. Despite recent advances in vision and language domains, there is no suitable platform for the FL of GNNs. To this end, we introduce FedGraphNN, an open FL benchmark system that can facilitate research on federated GNNs. FedGraphNN is built on a unified formulation of graph FL and contains a wide range of datasets from different domains, popular GNN models, and FL algorithms, with secure and efficient system support. Particularly for the datasets, we collect, preprocess, and partition 36 datasets from 7 domains, including both publicly available ones and specifically obtained ones such as hERG and Tencent. Our empirical analysis showcases the utility of our benchmark system, while exposing significant challenges in graph FL: federated GNNs perform worse in most datasets with a non-IID split than centralized GNNs; the GNN model that attains the best result in the centralized setting may not maintain its advantage in the FL setting. These results imply that more research efforts are needed to unravel the mystery behind federated GNNs. Moreover, our system performance analysis demonstrates that the FedGraphNN system is computationally efficient and secure to large-scale graphs datasets. We maintain the source code at https://github.com/FedML-AI/FedGraphNN.

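Research Motivation and Goals

  • Provide a unified formulation of federated graph learning (graph FL) with multiple task settings (graph-level, subgraph-level, and node-level).
  • Collect and preprocess 36 graph datasets from 7 domains to simulate realistic non-IID federated scenarios.
  • Provide an efficient, secure, and modular benchmark system, FedGraphNN, that supports reproducible experiments.
  • Evaluate federated GNNs against centralized baselines and expose the key challenges in graph FL.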
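One common way to simulate non-IID clients is to skew each class's samples across clients using Dirichlet-distributed proportions. The sketch below is a minimal, stdlib-only illustration of that idea. The function name `dirichlet_partition`, the `alpha` parameter, and the toy labels are assumptions made for illustration; this is not claimed to be the partitioning code that FedGraphNN itself uses.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper for illustration: smaller `alpha` gives more
    skewed (more non-IID) clients; larger `alpha` approaches an even split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a set of Gamma(alpha, 1) draws, normalized.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        props = [d / total for d in draws]
        # Turn cumulative proportions into cut points over this class.
        start = 0
        cumulative = 0.0
        for k in range(num_clients):
            cumulative += props[k]
            if k == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = max(start, round(cumulative * len(indices)))
            clients[k].extend(indices[start:end])
            start = end
    return clients


# Toy data: 60 graphs with 3 class labels (made-up values).
toy_labels = [i % 3 for i in range(60)]
parts = dirichlet_partition(toy_labels, num_clients=4, alpha=0.5)
for k, part in enumerate(parts):
    print(f"client {k}: {len(part)} samples")
```

Every index lands on exactly one client, so the client datasets are disjoint and together cover the full dataset; only the per-client class mix changes with `alpha`.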

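Proposed Method

  • Formulates graph FL as a distributed optimization problem over K clients, with each local objective weighted by its share of the data: $F(W) = \sum_{k=1}^{K} \frac{N^{(k)}}{N} f^{(k)}(W)$.
  • Adopts an inductive GNN framework (MPNN) with two phases, message passing and readout, which supports multiple GNN architectures (GCN, GAT, GraphSAGE, SGC, GIN).
  • Supports FL algorithms (FedAvg, FedOPT, and others) together with secure aggregation (LightSecAgg) for privacy-preserving joint training.
  • Divides graph FL into graph-level, subgraph-level, and node-level settings, each with a typical task (graph classification, link prediction, and node classification, respectively).
  • Provides modular APIs and data loaders that ease experimentation, benchmarking, and deployment in cross-institution environments.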
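The data-weighted objective in the first bullet can be made concrete with a small sketch. It assumes model parameters are flat lists of floats and that local losses have already been evaluated; the names `fedavg_aggregate` and `global_objective` and all numbers are hypothetical and do not come from the FedGraphNN code base.

```python
def fedavg_aggregate(client_weights, client_sizes):
    """Average client parameter vectors, weighted by N^(k) / N."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all samples
        for j in range(dim):
            aggregated[j] += share * weights[j]
    return aggregated


def global_objective(local_losses, client_sizes):
    """F(W) = sum_k (N^(k) / N) * f^(k)(W), given evaluated local losses."""
    total = sum(client_sizes)
    return sum((n / total) * loss for loss, n in zip(local_losses, client_sizes))


# Made-up example: three clients holding 10, 30, and 60 samples.
sizes = [10, 30, 60]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_weights = fedavg_aggregate(weights, sizes)
objective = global_objective([0.9, 0.5, 0.2], sizes)
print(global_weights, objective)
```

The client with 60 of the 100 samples contributes 60% of both the averaged parameters and the global loss, which is the sense in which local objectives are weighted by data share.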
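The two MPNN phases named above, message passing and readout, can be illustrated without any learnable parameters. The sketch assumes a mean aggregator, a sum update, and a mean readout; real GCN, GAT, GraphSAGE, SGC, and GIN layers differ in exactly these choices and add trained weights. The function names and the toy graph are hypothetical.

```python
def message_passing_step(features, neighbors):
    """One round: each node adds the mean of its neighbors' features to its own."""
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean_msg = [
                sum(features[n][j] for n in nbrs) / len(nbrs)
                for j in range(len(own))
            ]
        else:
            mean_msg = [0.0] * len(own)  # isolated node receives no message
        updated[node] = [o + m for o, m in zip(own, mean_msg)]
    return updated


def readout(features):
    """Pool node embeddings into one graph-level vector by averaging."""
    nodes = list(features)
    dim = len(features[nodes[0]])
    return [sum(features[n][j] for n in nodes) / len(nodes) for j in range(dim)]


# Toy path graph 0 - 1 - 2 with made-up 2-dimensional node features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

hidden = features
for _ in range(2):  # two message-passing rounds
    hidden = message_passing_step(hidden, neighbors)
graph_embedding = readout(hidden)
print(graph_embedding)
```

For node-level or link-level tasks the per-node vectors in `hidden` would be used directly; the readout step is only needed when a single embedding per graph is required, as in graph classification.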

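Experimental Results

Research Questions

  • RQ1: How does federated learning affect GNN performance in the graph-level, subgraph-level, and node-level settings?
  • RQ2: How do non-IID data partitions affect the accuracy of federated GNNs compared with centralized training?
  • RQ3: Which GNN architectures and FL algorithms are most robust, or lose the least accuracy, under graph FL?
  • RQ4: What are the system efficiency and security properties of FedGraphNN on large-scale graph datasets?
  • RQ5: Which challenges remain in graph FL and call for further methodological and benchmark improvements?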

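Key Findings

  • On larger, non-IID graph datasets, federated GNNs often underperform centralized GNNs, while results on smaller datasets can be comparable.
  • The best centralized model is not necessarily the best FL model, which points to FL dynamics specific to graphs.
  • In graph-level FL, GAT often shows a larger accuracy gap, while on some datasets (for example CIAO, CORA, and PubMed) subgraph-level or node-level FL can in some cases outperform centralized training.
  • With LightSecAgg, FedGraphNN is computationally efficient and secure, aggregating faster than some baselines while preserving privacy.
  • Training time depends on graph size and ranges from a few minutes to about an hour, and the secure aggregation offers privacy guarantees comparable to SecAgg variants.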
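To show what secure aggregation buys, the sketch below uses pairwise additive masking in the style of SecAgg. It is deliberately not LightSecAgg, whose mask encoding and one-shot reconstruction are more involved. A single seeded generator stands in for the pairwise key agreement a real protocol would need, and the modulus, function names, and integer updates are all assumptions for illustration.

```python
import random

MODULUS = 2**31 - 1  # all arithmetic is done modulo a fixed prime


def masked_updates(updates, seed=0):
    """Add pairwise masks that cancel when all clients' uploads are summed."""
    rng = random.Random(seed)
    num_clients = len(updates)
    dim = len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            # Clients i and j share this mask; i adds it, j subtracts it.
            mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] = (masked[i][d] + mask[d]) % MODULUS
                masked[j][d] = (masked[j][d] - mask[d]) % MODULUS
    return masked


def server_sum(masked):
    """The server sees only masked vectors, yet their sum is the true sum."""
    dim = len(masked[0])
    return [sum(m[d] for m in masked) % MODULUS for d in range(dim)]


# Made-up integer-quantized updates from three clients.
updates = [[3, 1], [4, 1], [5, 9]]
uploads = masked_updates(updates)
aggregate = server_sum(uploads)
print(aggregate)
```

Each upload looks random on its own, but the masks cancel in the sum, so the server recovers the element-wise total of the raw updates without seeing any single client's contribution. This simple version does not tolerate dropped clients, which is one of the problems that protocols such as SecAgg and LightSecAgg address.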


This review was generated by AI and reviewed by a human editor.