QUICK REVIEW

[Paper Review] GraphFL: A Federated Learning Framework for Semi-Supervised Node Classification on Graphs

Binghui Wang, Ang Li | arXiv (Cornell University) | 2020-12-08

Privacy-Preserving Technologies in Data | References: 44 | Citations: 34

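One-Line Summary

GraphFL is the first federated learning (FL) framework for semi-supervised node classification on graphs. It tackles non-IID client data, new label domains, and unlabeled data through a meta-learning-inspired approach combined with self-training, and it outperforms the standard FL baseline.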

ABSTRACT

Graph-based semi-supervised node classification (GraphSSC) has wide applications, ranging from networking and security to data mining and machine learning, etc. However, existing centralized GraphSSC methods are impractical to solve many real-world graph-based problems, as collecting the entire graph and labeling a reasonable number of labels is time-consuming and costly, and data privacy may be also violated. Federated learning (FL) is an emerging learning paradigm that enables collaborative learning among multiple clients, which can mitigate the issue of label scarcity and protect data privacy as well. Therefore, performing GraphSSC under the FL setting is a promising solution to solve real-world graph-based problems. However, existing FL methods 1) perform poorly when data across clients are non-IID, 2) cannot handle data with new label domains, and 3) cannot leverage unlabeled data, while all these issues naturally happen in real-world graph-based problems. To address the above issues, we propose the first FL framework, namely GraphFL, for semi-supervised node classification on graphs. Our framework is motivated by meta-learning methods. Specifically, we propose two GraphFL methods to respectively address the non-IID issue in graph data and handle the tasks with new label domains. Furthermore, we design a self-training method to leverage unlabeled graph data. We adopt representative graph neural networks as GraphSSC methods and evaluate GraphFL on multiple graph datasets. Experimental results demonstrate that GraphFL significantly outperforms the compared FL baseline and GraphFL with self-training can obtain better performance.

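Motivation and Objectives

Motivate applying federated learning to graph-based semi-supervised node classification (GraphSSC) in order to protect privacy and reduce labeling costs.
Handle non-IID data across clients in graph-structured data.
Enable generalization to test nodes that have new label domains.
Exploit unlabeled nodes through self-training to improve performance.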

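Proposed Method

Bring model-agnostic meta-learning (MAML) into federated learning to build a global model that generalizes across non-IID graph data.
Stage 1 (MAML-like): the server simulates task-specific updates and evaluates them on client query sets to learn a global initialization.
Stage 2 (FL fine-tuning): clients fine-tune the global initialization, and the server aggregates their models with FedAvg to produce a robust global model.
For new label domains, the objective is reformulated within FL to learn a shared initialization that adapts quickly to a new label domain from only a few labeled examples.
Self-training: each client trains on its labeled data, predicts labels for its unlabeled nodes, and selects high-confidence pseudo-labels to expand the training data for further federated learning.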
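
Two of the steps above, FedAvg aggregation and confidence-based pseudo-label selection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the weighting of clients by local sample count (the usual FedAvg convention), and the 0.9 confidence threshold are all assumptions made for illustration, since this review does not state the paper's exact selection rule or hyperparameters.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample count.

    Hypothetical helper: assumes every client sends a flat parameter
    vector of the same length.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()                 # normalize to sum to 1
    stacked = np.stack(client_params)        # shape: (num_clients, num_params)
    return weights @ stacked                 # weighted average per parameter


def select_pseudo_labels(probs, threshold=0.9):
    """Pick unlabeled nodes whose top predicted class probability is high.

    `probs` has shape (num_unlabeled_nodes, num_classes). The threshold
    value is an assumption, not a setting taken from the paper.
    """
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


# Toy usage: two clients holding 1 and 3 labeled nodes respectively.
client_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
global_params = fedavg(client_params, client_sizes=[1, 3])

# Toy predictions for three unlabeled nodes over two classes.
probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]])
idx, labels = select_pseudo_labels(probs, threshold=0.9)
```

In this toy run the second client contributes three quarters of the average, and only the first and third unlabeled nodes pass the threshold. A client would then add those nodes, with their predicted labels, to its training set before the next federated round.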

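Experimental Results

Research Questions

RQ1: Can GraphFL mitigate the non-IID problem of graph data in federated GraphSSC?
RQ2: Can GraphFL generalize to test nodes with new label domains without retraining from scratch?
RQ3: Does self-training on unlabeled nodes improve the performance of federated graph semi-supervised learning?
RQ4: How does GraphFL compare with the standard FL baseline on benchmark graph datasets under non-IID and label-domain-shift scenarios?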

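Key Results

GraphFL consistently outperforms the standard FL baseline when client labels are highly non-IID.
GraphFL generalizes to test nodes with new label domains better than conventional FL methods.
GraphFL with self-training yields additional gains over the variant without self-training.
Across multiple graph datasets, the proposed framework improves node classification accuracy with both GCN and SGC backbones.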

This review was generated by AI and reviewed by a human editor.