QUICK REVIEW

[Paper Review] GraphFL: A Federated Learning Framework for Semi-Supervised Node Classification on Graphs

Binghui Wang, Ang Li | arXiv (Cornell University) | Dec 8, 2020

Privacy-Preserving Technologies in Data | 44 references | 34 citations

TL;DR

GraphFL is the first federated semi-supervised node classification framework for graphs that addresses non-IID client data, new label domains, and unlabeled data via a meta-learning–inspired approach and self-training, improving over standard FL baselines.

ABSTRACT

Graph-based semi-supervised node classification (GraphSSC) has wide applications, ranging from networking and security to data mining and machine learning, etc. However, existing centralized GraphSSC methods are impractical to solve many real-world graph-based problems, as collecting the entire graph and labeling a reasonable number of labels is time-consuming and costly, and data privacy may be also violated. Federated learning (FL) is an emerging learning paradigm that enables collaborative learning among multiple clients, which can mitigate the issue of label scarcity and protect data privacy as well. Therefore, performing GraphSSC under the FL setting is a promising solution to solve real-world graph-based problems. However, existing FL methods 1) perform poorly when data across clients are non-IID, 2) cannot handle data with new label domains, and 3) cannot leverage unlabeled data, while all these issues naturally happen in real-world graph-based problems. To address the above issues, we propose the first FL framework, namely GraphFL, for semi-supervised node classification on graphs. Our framework is motivated by meta-learning methods. Specifically, we propose two GraphFL methods to respectively address the non-IID issue in graph data and handle the tasks with new label domains. Furthermore, we design a self-training method to leverage unlabeled graph data. We adopt representative graph neural networks as GraphSSC methods and evaluate GraphFL on multiple graph datasets. Experimental results demonstrate that GraphFL significantly outperforms the compared FL baseline and GraphFL with self-training can obtain better performance.

Motivation & Objective

Motivate federated learning for graph-based semi-supervised node classification (GraphSSC) to protect privacy and reduce labeling costs.
Address non-IID data across clients in graph-structured data.
Enable generalization to testing nodes with new label domains.
Leverage unlabeled nodes through self-training to improve performance.

Proposed method

Incorporate model-agnostic meta-learning (MAML) into federated learning to create a global model that generalizes across non-IID graph data.
Stage I (MAML-like): learn a global initialization on the server by simulating task-specific updates and evaluating on client query sets.
Stage II (FL fine-tuning): clients fine-tune the global initialization on their local data, and the server aggregates their updates via FedAvg to produce a robust global model.
For new label domains, reformulate the objective within FL to learn a shared initialization that fast-adapts to new label domains with a few labeled examples.
Self-training: each client trains on its labeled data, predicts unlabeled nodes, selects high-confidence pseudo-labels, and augments training data for further federated learning.
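
The aggregation and self-training steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names `fedavg` and `select_pseudo_labels`, the flat-list representation of model parameters, the weighting by client data size, and the 0.9 confidence threshold are all assumptions introduced here for the example.

```python
from typing import Dict, List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average flat client parameter vectors, weighted by client data size.

    Assumption: each client's model is flattened into a list of floats and
    clients are weighted by their number of labeled nodes.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def select_pseudo_labels(probs: Dict[int, Sequence[float]],
                         threshold: float = 0.9) -> Dict[int, int]:
    """Keep unlabeled nodes whose top class probability meets the threshold.

    `probs` maps a node id to its predicted class probabilities.
    The threshold value is hypothetical, chosen only for illustration.
    """
    selected: Dict[int, int] = {}
    for node, p in probs.items():
        best = max(range(len(p)), key=lambda c: p[c])
        if p[best] >= threshold:
            selected[node] = best
    return selected


# Toy usage: two clients with 1 and 3 labeled nodes respectively.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])

# Toy usage: three unlabeled nodes with two-class predictions.
pseudo = select_pseudo_labels({
    10: [0.95, 0.05],  # confident in class 0, kept
    11: [0.60, 0.40],  # below threshold, dropped
    12: [0.02, 0.98],  # confident in class 1, kept
})
```

In a real setup the parameter vectors would come from a GNN such as GCN or SGC, and the selected pseudo-labeled nodes would be added to each client's training set before the next federated round.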

Experimental results

Research questions

RQ1: Can GraphFL mitigate non-IID issues in graph data within federated GraphSSC?
RQ2: Can GraphFL generalize to testing nodes with new label domains without retraining from scratch?
RQ3: Does self-training leveraging unlabeled nodes improve performance in federated graph semi-supervised learning?
RQ4: How does GraphFL compare to standard FL baselines on benchmark graph datasets under non-IID and label-domain shift scenarios?

Key findings

GraphFL consistently outperforms standard FL baselines when client labels are highly non-IID.
GraphFL demonstrates better generalization to testing nodes with new label domains than traditional FL methods.
GraphFL with self-training yields further performance gains over its non-self-training variant.
Experimental results on multiple graph datasets show the proposed framework improves node classification accuracy across GCN and SGC backbones.

This review was created by AI and reviewed by human editors.