QUICK REVIEW

[Paper Review] Do Transformers Really Perform Bad for Graph Representation?

Chengxuan Ying, Tianle Cai | arXiv (Cornell University) | Jun 9, 2021

Advanced Graph Neural Networks | References: 62 | Cited by: 126

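One-Sentence Summary

Graphormer shows that a standard Transformer architecture, combined with graph-specific structural encodings, achieves state-of-the-art results on major graph representation benchmarks, including OGB-LSC.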

ABSTRACT

The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. Besides, we mathematically characterize the expressive power of Graphormer and exhibit that with our ways of encoding the structural information of graphs, many popular GNN variants could be covered as the special cases of Graphormer.

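Motivation and Objectives

Demonstrate that Transformers can be competitive for graph representation learning.
Introduce structural encodings that inject graph structure into Transformer attention.
Provide a theoretical analysis showing that Graphormer's expressive power covers common GNNs.
Validate empirically on large-scale and standard graph benchmarks such as OGB-LSC, MolPCBA, MolHIV, and ZINC.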

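Proposed Method

Proposes Graphormer, a standard Transformer adapted to graphs with three structural encodings: Centrality Encoding, Spatial Encoding, and Edge Encoding.
Centrality Encoding adds degree-based embeddings to the node features at the input, so that attention is informed by node importance.
Spatial Encoding biases attention with a learnable scalar b_{φ(v_i,v_j)} indexed by a graph-based relation between the node pair, such as the shortest-path distance.
Edge Encoding brings edge features into attention through a bias term c_{ij} aggregated over the edges along the shortest path between the two nodes.
Introduces a special [VNode] token connected to every node to support graph-level readout, analogous to [CLS] in NLP models.
Uses Transformer encoder blocks in the pre-LN configuration, with readout taken from [VNode].
Gives theoretical results showing that Graphormer can simulate the aggregate and combine steps of GNNs and, with the proposed encodings, exceeds the expressive power of the 1-WL test.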
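To make the encodings above more concrete, here is a minimal NumPy sketch of one attention head that uses a Centrality Encoding and a Spatial Encoding. It is an illustration under stated assumptions, not the authors' implementation: the function names, the bucket layout of `spd_bias`, the use of a single degree table for an undirected graph, and the random weights are all hypothetical. The Edge Encoding term c_{ij} and the [VNode] token are left out for brevity; in this sketch c_{ij} would simply be one more additive term on the attention scores.

```python
from collections import deque

import numpy as np


def shortest_path_distances(adj):
    """All-pairs hop distances via BFS; unreachable pairs are marked -1."""
    n = adj.shape[0]
    dist = np.full((n, n), -1, dtype=int)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[source, v] < 0:
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist


def softmax_rows(scores):
    """Numerically stable softmax over each row."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def biased_self_attention(x, adj, degree_emb, spd_bias, w_q, w_k, w_v):
    """One attention head with centrality and spatial encodings (sketch)."""
    # Centrality encoding: add an embedding looked up by node degree.
    # Assumption: undirected graph, so a single degree table is used here.
    degree = adj.sum(axis=1).astype(int)
    h = x + degree_emb[degree]

    # Spatial encoding: one scalar bias per shortest-path-distance bucket.
    # Assumed layout: indices 0..K are distances (clipped at K) and the
    # last index is reserved for unreachable pairs.
    dist = shortest_path_distances(adj)
    far = len(spd_bias) - 2
    unreachable = len(spd_bias) - 1
    bucket = np.where(dist < 0, unreachable, np.minimum(dist, far))

    # Scaled dot-product scores plus the structural bias.
    scores = (h @ w_q) @ (h @ w_k).T / np.sqrt(w_q.shape[1]) + spd_bias[bucket]
    attn = softmax_rows(scores)
    return attn @ (h @ w_v), attn


# Toy usage on a 4-node path graph 0-1-2-3 with random (untrained) weights.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
degree_emb = rng.normal(size=(3, d))               # degrees 0, 1, 2
spd_bias = np.array([0.0, 0.0, -1.0, -2.0, -5.0])  # distances 0..3, unreachable
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = biased_self_attention(x, adj, degree_emb, spd_bias, w_q, w_k, w_v)
```

Because the bias is added before the softmax, every node can still attend to every other node; the structure only shifts how much weight each pair receives. In a trained model `degree_emb` and `spd_bias` would be learned parameters rather than fixed arrays.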

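Experimental Results

Research Questions

RQ1: Can a standard Transformer augmented with graph-aware structural encodings match or surpass conventional GNNs on graph-level prediction tasks?
RQ2: How do the Centrality, Spatial, and Edge encodings affect Graphormer's performance and representational power?
RQ3: Is Graphormer expressive enough to cover common GNN variants (such as GCN and GIN) as special cases?
RQ4: How does Graphormer perform, including under ablation, on the large-scale benchmark (OGB-LSC PCQM4M-LSC) and the standard benchmarks (MolPCBA, MolHIV, ZINC)?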

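Key Findings

With the proposed encodings, Graphormer achieves state-of-the-art or competitive results on large-scale and standard graph benchmarks.
On PCQM4M-LSC, the full Graphormer reaches a validation MAE of 0.1234, clearly better than many GNN baselines; Graphormer Small also performs well, with a validation MAE of 0.1264.
With FLAG augmentation, Graphormer surpasses the previous SOTA GNNs on MolPCBA (AP 31.39 ±0.32) and MolHIV (AUC 80.51 ±0.53).
On ZINC, Graphormer-SLIM reaches an MAE of 0.122 ±0.006, outperforming several conventional GNNs and Transformer-based competitors.
Ablation studies show that Spatial Encoding and Centrality Encoding substantially improve performance; Edge Encoding applied as an attention bias brings further gains.
The analysis shows that, with suitable encodings and weights, many popular GNN variants can be recovered as special cases of Graphormer.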
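The special-case claim can be illustrated with a small NumPy sketch of one such construction: mean aggregation over neighbors. The assumptions are that the query and key weights are zero, the value projection is the identity, and the spatial bias is 0 for node pairs at shortest-path distance 1 and negative infinity otherwise. The function name and the toy graph are hypothetical, and the sketch assumes every node has at least one neighbor.

```python
import numpy as np


def mean_aggregate_via_attention(x, dist):
    """Attention that reduces to averaging the features of direct neighbors."""
    n, d = x.shape
    w_q = np.zeros((d, d))  # zero query weights: content scores vanish
    w_k = np.zeros((d, d))  # zero key weights
    # Spatial bias: 0 for neighbors (distance 1), -inf for everything else.
    bias = np.where(dist == 1, 0.0, -np.inf)
    scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(d) + bias
    # Softmax over each row; exp(-inf) = 0 removes all non-neighbors.
    shifted = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(shifted)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ x  # identity value projection


# Toy 4-node path graph 0-1-2-3 and its hop-distance matrix.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
dist = np.array([[0, 1, 2, 3],
                 [1, 0, 1, 2],
                 [2, 1, 0, 1],
                 [3, 2, 1, 0]])
x = np.random.default_rng(0).normal(size=(4, 5))

via_attention = mean_aggregate_via_attention(x, dist)
direct_mean = (adj @ x) / adj.sum(axis=1, keepdims=True)
print(np.allclose(via_attention, direct_mean))
```

With all content scores equal, the softmax spreads its weight uniformly over the entries the bias leaves finite, which are exactly the neighbors, so the output matches the neighbor average computed directly from the adjacency matrix.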

This review was generated by AI and reviewed by a human editor.