QUICK REVIEW

[Paper Digest] A Generalization of Transformer Networks to Graphs

Vijay Prakash Dwivedi, Xavier Bresson | arXiv (Cornell University) | Dec 17, 2020

Advanced Graph Neural Networks | References: 37 | Cited by: 321

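One-Sentence Summary

This paper generalizes the Transformer architecture to arbitrary graphs by combining graph sparsity, Laplacian-eigenvector positional encodings, batch normalization, and optional edge-feature processing, and it shows competitive performance on graph benchmark datasets.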

ABSTRACT

We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representation, which can be critical to tasks s.a. chemistry (bond type) or link prediction (entity relationship in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, that can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to consider transformer and graphs.

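Motivation and Objectives

Motivate adapting the Transformer so that it exploits graph structure and its inductive bias on arbitrary graphs.
Introduce a Graph Transformer layer that attends to local graph neighborhoods rather than to a fully connected graph.
Integrate Laplacian-eigenvector positional encodings into the input node features to capture each node's position in the graph.
Provide an architecture variant that supports edge features, so that pairwise edge information can be used.
Demonstrate competitive performance against GNN baselines on standard graph benchmarks.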
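
As an illustration of the Laplacian-eigenvector positional encodings mentioned above, here is a minimal NumPy sketch. It assumes an undirected graph given as a dense adjacency matrix and uses the symmetric normalized Laplacian; the function name `laplacian_positional_encoding` and the parameter `k` are hypothetical names chosen for this example, not taken from the authors' code.

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    """Return the k eigenvectors that follow the first (trivial) one.

    Assumes `adj` is a symmetric (undirected) dense adjacency matrix and
    uses the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # D^{-1/2}; isolated nodes (degree 0) keep a zero entry to avoid dividing by zero
    inv_sqrt_deg = np.zeros(n)
    nonzero = deg > 0
    inv_sqrt_deg[nonzero] = deg[nonzero] ** -0.5
    lap = np.eye(n) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(lap)
    # Skip column 0 (eigenvalue 0 on a connected graph) and keep the next k
    return eigvecs[:, 1:k + 1]

# Illustrative input: a path graph on 5 nodes (0-1-2-3-4)
path_adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
pe = laplacian_positional_encoding(path_adj, k=2)  # one 2-dimensional encoding per node
```

Each row of `pe` would then be linearly projected and added to the corresponding node's input features. Eigenvectors are only defined up to sign, so a training pipeline has to handle that ambiguity in some way, for example by flipping signs at random during training.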

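Proposed Method

Embed node and edge features into a common hidden dimension via linear projections.
Add Laplacian-eigenvector positional encodings to the input node features.
Compute multi-head attention in which each head applies its softmax only over a node's local neighbors.
Include residual connections and normalization (BatchNorm or LayerNorm) around the FFN.
Provide a Graph Transformer variant that jointly updates node and edge representations, with dedicated FFNs.
Evaluate on the ZINC, PATTERN, and CLUSTER datasets, in both sparse-graph and full-graph settings.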
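
The neighborhood-restricted attention step can be sketched as follows. This is a simplified single-head version under stated assumptions: it uses a dense adjacency mask for readability (a practical implementation would more likely use sparse message passing), it omits edge features, and the names `neighborhood_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import numpy as np

def neighborhood_attention(h, adj, w_q, w_k, w_v):
    """Single-head attention where node i attends only to nodes j with adj[i, j] != 0.

    h:   (n, d) node features
    adj: (n, n) adjacency matrix (nonzero entry = edge)
    w_q, w_k, w_v: (d, d_k) projection matrices (assumed shapes)
    Returns the updated features and the attention weights.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    d_k = q.shape[1]
    scores = (q @ k.T) / np.sqrt(d_k)
    # Non-neighbors get -inf so they receive exactly zero weight after the softmax
    scores = np.where(np.asarray(adj) != 0, scores, -np.inf)
    # Numerically stable row-wise softmax; a node without neighbors gets an all-zero row
    row_max = np.max(scores, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(scores - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ v, weights

# Illustrative input: path graph 0-1-2 with random features and projections
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
out, weights = neighborhood_attention(h, adj, w_q, w_k, w_v)
```

In the full-graph setting the mask would be all ones, which recovers standard Transformer attention over every pair of nodes. A multi-head layer would run several such heads in parallel, concatenate their outputs, and apply an output projection before the residual connection and normalization.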

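Experimental Results

Research Questions

RQ1: Can Transformer-style attention be effectively localized to graph neighborhoods to exploit sparsity?
RQ2: Do Laplacian-eigenvector positional encodings improve performance on graph tasks compared with other positional encodings?
RQ3: Does replacing layer normalization with batch normalization improve training and generalization in graph Transformers?
RQ4: Does incorporating edge features into the Graph Transformer improve performance on datasets with rich edge information?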

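Key Findings

The Graph Transformer with Laplacian PE and BatchNorm outperforms the baseline isotropic and anisotropic GNNs on all three datasets.
Sparse-graph configurations outperform full-graph ones, which supports the sparsity inductive bias.
With edge features, the Graph Transformer comes close to the best GNN (GatedGCN) on ZINC.
Laplacian positional encodings (LapPE) outperform WL-PE and proximity-based encodings on these tasks.
Using BatchNorm instead of LayerNorm generally improves training efficiency and generalization.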
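
To make the BatchNorm versus LayerNorm comparison concrete, the sketch below shows the axis each one normalizes over, for a matrix of node features with one row per node. It is a simplified illustration that assumes training-mode statistics and omits the learnable scale and shift as well as BatchNorm's running averages; the function names are hypothetical.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize each node's feature vector on its own (statistics per row)."""
    mean = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def batch_norm(h, eps=1e-5):
    """Normalize each feature channel across all nodes in the batch (statistics per column)."""
    mean = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

# Illustrative input: 6 nodes with 4 features each
rng = np.random.default_rng(0)
node_feats = rng.normal(loc=2.0, scale=3.0, size=(6, 4))
ln_out = layer_norm(node_feats)   # every row has mean close to 0
bn_out = batch_norm(node_feats)   # every column has mean close to 0
```

The difference is only in which axis supplies the statistics: LayerNorm treats each node independently, while BatchNorm shares statistics across all nodes in the batch.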

This digest was generated by AI and reviewed by a human editor.