QUICK REVIEW

[Paper Review] GraphFL: A Federated Learning Framework for Semi-Supervised Node Classification on Graphs

Binghui Wang, Ang Li|arXiv (Cornell University)|Dec 8, 2020
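Privacy-Preserving Technologies in Data|44 references|34 citations
One-Sentence Summary

GraphFL is the first federated learning framework for semi-supervised node classification on graphs. It uses meta-learning-inspired methods and self-training to handle non-IID client data, new label domains, and unlabeled data, and it outperforms the standard FL baseline.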

ABSTRACT

Graph-based semi-supervised node classification (GraphSSC) has wide applications, ranging from networking and security to data mining and machine learning, etc. However, existing centralized GraphSSC methods are impractical to solve many real-world graph-based problems, as collecting the entire graph and labeling a reasonable number of labels is time-consuming and costly, and data privacy may be also violated. Federated learning (FL) is an emerging learning paradigm that enables collaborative learning among multiple clients, which can mitigate the issue of label scarcity and protect data privacy as well. Therefore, performing GraphSSC under the FL setting is a promising solution to solve real-world graph-based problems. However, existing FL methods 1) perform poorly when data across clients are non-IID, 2) cannot handle data with new label domains, and 3) cannot leverage unlabeled data, while all these issues naturally happen in real-world graph-based problems. To address the above issues, we propose the first FL framework, namely GraphFL, for semi-supervised node classification on graphs. Our framework is motivated by meta-learning methods. Specifically, we propose two GraphFL methods to respectively address the non-IID issue in graph data and handle the tasks with new label domains. Furthermore, we design a self-training method to leverage unlabeled graph data. We adopt representative graph neural networks as GraphSSC methods and evaluate GraphFL on multiple graph datasets. Experimental results demonstrate that GraphFL significantly outperforms the compared FL baseline and GraphFL with self-training can obtain better performance.

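Research Motivation and Objectives

  • Bring federated learning to graph-based semi-supervised node classification (GraphSSC) in order to protect privacy and reduce labeling costs.
  • Address non-IID data across clients in graph-structured settings.
  • Generalize to test nodes that come from new label domains.
  • Use self-training to exploit unlabeled nodes and improve performance.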

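Proposed Method

  • Brings model-agnostic meta-learning (MAML) into federated learning to produce a global model that generalizes across non-IID graph data.
  • Stage I (MAML-like): the server learns a global initialization by simulating task-specific updates and evaluating them on each client's query set.
  • Stage II (FL fine-tuning): clients fine-tune the global initialization, and the server aggregates their updates with FedAvg to produce a robust global model.
  • For new label domains, reformulates the objective within FL to learn a shared initialization that adapts quickly to a new label domain from only a few labeled samples.
  • Self-training: each client trains on its labeled data, predicts labels for its unlabeled nodes, selects high-confidence pseudo-labels, and augments its training data for further federated learning.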
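
The three mechanics named in the list above (a MAML-like client step, FedAvg aggregation, and confidence-based pseudo-label selection) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a flat parameter vector with a toy quadratic loss in place of a graph neural network, the meta-gradient uses the first-order approximation, and every function name and the 0.9 threshold are hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence


def first_order_maml_grad(
    weights: Sequence[float],
    support_target: Sequence[float],
    query_target: Sequence[float],
    inner_lr: float,
) -> List[float]:
    """Client-side meta-gradient for a toy loss L(w) = sum((w_i - t_i)^2).

    Assumption: first-order approximation, so the gradient of the query
    loss is taken at the adapted weights and second-order terms are ignored.
    """
    # Inner step: adapt the shared initialization on the support set.
    adapted = [
        w - inner_lr * 2.0 * (w - t) for w, t in zip(weights, support_target)
    ]
    # Outer signal: gradient of the query loss at the adapted weights.
    return [2.0 * (a - t) for a, t in zip(adapted, query_target)]


def fedavg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def select_pseudo_labels(
    probs: Dict[int, Sequence[float]],
    threshold: float = 0.9,  # illustrative value, not taken from the paper
) -> Dict[int, int]:
    """Keep unlabeled nodes whose top class probability reaches the threshold."""
    selected: Dict[int, int] = {}
    for node, p in probs.items():
        best = max(range(len(p)), key=lambda c: p[c])
        if p[best] >= threshold:
            selected[node] = best
    return selected
```

In a fuller pipeline, one would expect each client to call something like `first_order_maml_grad` during Stage I, the server to combine client results with `fedavg` during Stage II, and `select_pseudo_labels` to run on each client's softmax outputs before the next round of training.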

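Experimental Results

Research Questions

  • RQ1: Can GraphFL mitigate the non-IID problem for graph data in federated GraphSSC?
  • RQ2: Can GraphFL generalize to test nodes with new label domains without retraining from scratch?
  • RQ3: Does self-training on unlabeled nodes improve performance in federated semi-supervised learning on graphs?
  • RQ4: How does GraphFL compare with the standard FL baseline on benchmark graph datasets under non-IID and label-domain-shift scenarios?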

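Key Findings

  • GraphFL consistently outperforms the standard FL baseline when client labels are highly non-IID.
  • GraphFL generalizes better than conventional FL methods to test nodes with new label domains.
  • GraphFL with self-training achieves further gains over the variant without self-training.
  • Experiments on multiple graph datasets show that the proposed framework improves node classification accuracy with both GCN and SGC backbones.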

This review was generated by AI and reviewed by human editors.