QUICK REVIEW

[Paper Review] Self-Supervised Graph Transformer on Large-Scale Molecular Data

Yu Rong, Yatao Bian|arXiv (Cornell University)|Jun 18, 2020

Computational Drug Discovery Methods|References 62|Cited by 415

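One-Sentence Summary

GROVER pre-trains a graph Transformer with self-supervised tasks on 10 million unlabeled molecules and, after fine-tuning, improves results on 11 MoleculeNet benchmarks by more than 6% on average.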

ABSTRACT

How to obtain informative representations of molecules is a crucial prerequisite in AI-driven drug design and discovery. Recent researches abstract molecules as graphs and employ Graph Neural Networks (GNNs) for molecular representation learning. Nevertheless, two issues impede the usage of GNNs in real scenarios: (1) insufficient labeled molecules for supervised training; (2) poor generalization capability to new-synthesized molecules. To address them both, we propose a novel framework, GROVER, which stands for Graph Representation frOm self-superVised mEssage passing tRansformer. With carefully designed self-supervised tasks in node-, edge- and graph-level, GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. Rather, to encode such complex information, GROVER integrates Message Passing Networks into the Transformer-style architecture to deliver a class of more expressive encoders of molecules. The flexibility of GROVER allows it to be trained efficiently on large-scale molecular dataset without requiring any supervision, thus being immunized to the two issues mentioned above. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) from current state-of-the-art methods on 11 challenging benchmarks. The insights we gained are that well-designed self-supervision losses and largely-expressive pre-trained models enjoy the significant potential on performance boosting.

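Motivation and Objectives

Limited labeled data and a vast chemical space motivate the need for robust molecular representations.
Propose a self-supervised pre-training framework for molecular graphs to improve generalization.
Design a Transformer-based encoder with graph-aware attention and dynamic message passing.
Demonstrate that pre-training on large-scale unlabeled data improves downstream molecular property prediction.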

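Proposed Method

Introduce GROVER (Graph Representation frOm self-superVised mEssage passing tRansformer), built from GNN Transformers for nodes and edges.
Use bi-level information extraction: GNN-derived queries, keys, and values are fed into a Transformer encoder that attends over all nodes.
Implement dynamic message passing (dyMPN) with a randomly chosen number of hops to improve generalization.
Design contextual property prediction for nodes and edges as node- and edge-level self-supervision.
Add graph-level motif prediction, using motifs detected by RDKit as multi-label targets for the graph representation.
Pre-train a 100M-parameter model on 11M unlabeled molecules from ZINC15 and ChEMBL (the abstract cites 10 million) across 250 GPUs.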
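
The bi-level extraction and dyMPN steps listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: there are no learned weights, attention heads, or residual connections, and the helper names (`mean_aggregate`, `dynamic_message_passing`, `toy_gtransformer_layer`) are hypothetical. Two further assumptions are labeled in the code: the hop count is sampled uniformly, and a fixed hop count is used outside training.

```python
import math
import random


def mean_aggregate(features, adjacency):
    """One message passing hop: average each node's vector with its neighbors'."""
    updated = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)
        dim = len(features[node])
        updated.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return updated


def dynamic_message_passing(features, adjacency, max_hops=3, training=True, rng=random):
    """Run a random number of hops in training and max_hops otherwise."""
    # ASSUMPTION: uniform sampling and a fixed inference-time hop count.
    hops = rng.randint(1, max_hops) if training else max_hops
    hidden = features
    for _ in range(hops):
        hidden = mean_aggregate(hidden, adjacency)
    return hidden


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention in which every node attends to every node."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    output = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        output.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return output


def toy_gtransformer_layer(features, adjacency, max_hops=3, training=True, rng=random):
    """Graph-derived Q, K, V (local structure) followed by global attention."""
    q = dynamic_message_passing(features, adjacency, max_hops, training, rng)
    k = dynamic_message_passing(features, adjacency, max_hops, training, rng)
    v = dynamic_message_passing(features, adjacency, max_hops, training, rng)
    return attention(q, k, v)


# Toy 3-node path graph (0-1-2) with one-hot node features.
adjacency = [[1], [0, 2], [1]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
embeddings = toy_gtransformer_layer(features, adjacency, rng=random.Random(0))
```

Message passing captures local structure and the attention step mixes information across the whole graph, which is the two-level idea described above. Because each training pass samples its own hop count, the receptive field varies between passes. That variation is the property the summary credits with better generalization.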

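Experimental Results

Research Questions

RQ1: Does self-supervised pre-training on large-scale unlabeled molecular graphs improve downstream property prediction after fine-tuning?
RQ2: Do a graph-aware Transformer encoder and dynamic message passing yield better representations than conventional GNNs?
RQ3: How do the context-aware node/edge tasks and the motif-driven graph-level pretext task affect performance and generalization?
RQ4: How does GROVER scale with model size and training data on the MoleculeNet benchmarks?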

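Key Findings

GROVER models achieve the best performance on all 11 datasets, with an average relative improvement of 6.1% (2.2% on classification, 10.8% on regression).
GROVER_large outperforms the state-of-the-art baselines on all datasets; GROVER_base does so on 8 of 11.
Self-supervised pre-training raises average AUC on classification tasks by 3.8% over training without pre-training, and it helps small datasets in particular.
In ablations, the GTransformer backbone used by GROVER outperforms GIN and MPNN backbones, supporting its greater expressive power; dyMPN slightly affects the training loss but improves generalization.
GROVER shows a marked gain in low-label settings such as FreeSolv, where its relative improvement over the SOTA is 23.9%.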
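
As a quick consistency check on the averages above, the sketch below recombines the per-group gains into an overall figure. It rests on two assumptions this summary does not state: that the 11 datasets split into 6 classification and 5 regression tasks, and that relative improvement follows the usual definition (gain divided by the baseline value, with the sign flipped for lower-is-better metrics). The sample baseline and model scores in the last lines are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline, model, higher_is_better=True):
    """Percent improvement of model over baseline, respecting metric direction."""
    diff = model - baseline if higher_is_better else baseline - model
    return 100.0 * diff / abs(baseline)


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Per-group average gains as reported in the findings (percent).
classification_gain = 2.2
regression_gain = 10.8

# ASSUMPTION: 6 classification and 5 regression datasets (not stated above).
n_classification, n_regression = 6, 5

overall = weighted_mean(
    [classification_gain, regression_gain],
    [n_classification, n_regression],
)
print(f"overall average gain: {overall:.1f}%")

# HYPOTHETICAL scores, only to show the direction handling.
auc_gain = relative_improvement(0.80, 0.84)                            # higher is better
rmse_gain = relative_improvement(2.0, 1.5, higher_is_better=False)     # lower is better
```

Under the assumed 6/5 split, the weighted mean comes to about 6.1%, which matches the reported overall figure.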


This review was generated by AI and reviewed by a human editor.