QUICK REVIEW

[Paper Review] Self-supervised Graph-level Representation Learning with Local and Global Structure

Minghao Xu, Hang Wang|arXiv (Cornell University)|Jun 8, 2021
Computational Drug Discovery Methods|References: 59|Cited by: 44
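One-Sentence Summary

GraphLoG is a self-supervised framework that uses an online EM algorithm and hierarchical prototypes to learn both local instance similarity and global hierarchical semantic structure in whole-graph representations, achieving strong downstream performance on chemistry and biology tasks.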

ABSTRACT

This paper studies unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks such as molecule properties prediction in drug and material discovery. Existing methods mainly focus on preserving the local similarity structure between different graph instances but fail to discover the global semantic structure of the entire data set. In this paper, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Specifically, besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters. An efficient online expectation-maximization (EM) algorithm is further developed for learning the model. We evaluate GraphLoG by pre-training it on massive unlabeled graphs followed by fine-tuning on downstream tasks. Extensive experiments on both chemical and biological benchmark data sets demonstrate the effectiveness of the proposed approach.

Research Motivation and Objectives

  • Motivate learning informative whole-graph representations in an unsupervised setting for domains like chemistry and biology.
  • Address a limitation of prior methods, which capture only local structure, by adding global semantic clustering via hierarchical prototypes.
  • Propose GraphLoG, which jointly optimizes local and global objectives to learn robust graph embeddings.
  • Pre-train on massive unlabeled graphs and fine-tune on downstream tasks with scarce labels.

Proposed Method

  • Use a GNN to obtain graph and subgraph embeddings from original and correlated (masked) graphs.
  • Define local-instance learning objectives that maximize the similarity of correlated pairs and minimize the similarity of non-correlated (negative) pairs, at both the graph and subgraph level.
  • Introduce hierarchical prototypes organized as trees to capture global semantic structure in latent space.
  • Apply an online EM algorithm to jointly learn GNN parameters and prototypes by alternating E-steps (latent variable inference) and M-steps (maximizing expected complete-data likelihood).
  • Model the global objective with a Noise-Contrastive Estimation style unnormalized likelihood over (graph, prototype) pairs and negatives sampled from a noise distribution.
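
The two objectives above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it assumes cosine similarity, a hypothetical temperature of 0.2, and an InfoNCE-style loss as a stand-in for both the local and the global objective. The E-step is approximated by a greedy top-down walk through a toy prototype tree, and sibling prototypes are used as negatives in place of samples from a noise distribution. No GNN or gradient update is included; the embeddings are plain lists.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.2):
    """InfoNCE-style loss (an assumption, not the paper's exact form).
    It is small when the anchor is closer to the positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def local_loss(embeddings, correlated, temperature=0.2):
    """Local-instance objective: each graph is paired with its own correlated
    view; the correlated views of the other graphs serve as negatives."""
    total = 0.0
    for i, h in enumerate(embeddings):
        negatives = [c for j, c in enumerate(correlated) if j != i]
        total += contrastive_loss(h, correlated[i], negatives, temperature)
    return total / len(embeddings)

def assign_path(h, tree):
    """E-step stand-in: walk the prototype tree from the top layer down,
    picking the most similar prototype at each layer.
    Returns a list of (chosen_prototype, sibling_prototypes) pairs."""
    path = []
    nodes = tree
    while nodes:
        best = max(nodes, key=lambda n: cosine(h, n["proto"]))
        siblings = [n["proto"] for n in nodes if n is not best]
        path.append((best["proto"], siblings))
        nodes = best["children"]
    return path

def global_loss(h, tree, temperature=0.2):
    """M-step objective stand-in: contrast each assigned prototype with its
    siblings (a simplification of sampling negatives from a noise distribution)."""
    losses = [contrastive_loss(h, proto, sibs, temperature)
              for proto, sibs in assign_path(h, tree) if sibs]
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical 2-D embeddings and a two-layer prototype tree.
embeddings = [[1.0, 0.1], [0.1, 1.0]]
correlated = [[0.9, 0.2], [0.2, 0.9]]
tree = [
    {"proto": [1.0, 0.0], "children": [
        {"proto": [0.9, 0.1], "children": []},
        {"proto": [0.9, -0.1], "children": []}]},
    {"proto": [0.0, 1.0], "children": [
        {"proto": [0.1, 0.9], "children": []},
        {"proto": [-0.1, 0.9], "children": []}]},
]
h = [0.8, 0.2]

print("local loss:", local_loss(embeddings, correlated))
print("assigned prototypes:", [p for p, _ in assign_path(h, tree)])
print("global loss:", global_loss(h, tree))
```

In a full training loop, the two losses would presumably be summed and minimized with respect to both the encoder parameters and the prototypes on each mini-batch, which is what makes the EM procedure "online".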

Experimental Results

Research Questions

  • RQ1: Can GraphLoG effectively capture both local-instance structure and global semantic clusters in unlabeled graph collections?
  • RQ2: Do hierarchical prototypes improve the quality of global-structure representations and downstream task performance compared to existing self-supervised methods?
  • RQ3: Is online EM a practical and effective optimization strategy for jointly learning GNN parameters and hierarchical prototypes on large graph datasets?

Key Findings

  • GraphLoG achieves strong downstream performance: a Graph Isomorphism Network (GIN) pre-trained with GraphLoG outperforms prior self-supervised methods on six of eight chemistry tasks, with a 2.1% gain in average ROC-AUC.
  • In chemistry benchmarks, GraphLoG attains an average ROC-AUC of 73.4%, and outperforms several baselines on multiple tasks (e.g., HIV, BACE) as shown in Table 1.
  • In biology benchmarks, GraphLoG achieves 72.9% ROC-AUC, outperforming several baselines listed in Table 2.
  • Ablation studies and embedding visualizations corroborate the benefits of incorporating global hierarchical structure in addition to local similarity preservation.
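
For readers unfamiliar with the metric, the following sketch shows one way an average ROC-AUC over several binary tasks can be computed. It uses the pairwise-ranking definition of ROC-AUC (the fraction of positive/negative pairs that the scores order correctly, with ties counted as half). The task names and scores are hypothetical toy values, not results from the paper, and the paper's own evaluation code may differ.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise ranking: the probability that a random positive
    example receives a higher score than a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_roc_auc(tasks):
    """Unweighted mean of per-task ROC-AUC; tasks maps name -> (labels, scores)."""
    return sum(roc_auc(y, s) for y, s in tasks.values()) / len(tasks)

# Hypothetical toy predictions for two made-up tasks.
toy_tasks = {
    "task_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "task_b": ([0, 1], [0.2, 0.9]),
}
print(average_roc_auc(toy_tasks))  # 0.875
```

The quadratic pairwise loop is fine for small examples; for large test sets a rank-based implementation from an established library would be the more practical choice.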


This review was generated by AI and reviewed by human editors.