QUICK REVIEW

[Paper Review] Less is More: on the Over-Globalizing Problem in Graph Transformers

Yujie Xing, Xiao Wang|arXiv (Cornell University)|May 2, 2024

Graph Theory and Algorithms · Cited by 9

One-Sentence Summary

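The paper studies the over-globalizing problem in Graph Transformers, analyzes why global attention can harm node representations, and proposes CoBFormer, a Bi-Level Global Graph Transformer with collaborative training that decouples intra-cluster and inter-cluster information and combines it with a local GCN to improve generalization.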

ABSTRACT

Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property always benefit Graph Transformers? We reveal the over-globalizing problem in Graph Transformer by presenting both empirical evidence and theoretical analysis, i.e., the current attention mechanism overly focuses on those distant nodes, while the near nodes, which actually contain most of the useful information, are relatively weakened. Then we propose a novel Bi-Level Global Graph Transformer with Collaborative Training (CoBFormer), including the inter-cluster and intra-cluster Transformers, to prevent the over-globalizing problem while keeping the ability to extract valuable information from distant nodes. Moreover, the collaborative training is proposed to improve the model's generalization ability with a theoretical guarantee. Extensive experiments on various graphs well validate the effectiveness of our proposed CoBFormer.

Motivation and Objectives

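Reveal the over-globalizing phenomenon in Graph Transformers and its impact on node classification.
Theoretically connect the attention distribution, the usefulness of each neighborhood, and the smoothness of the learned embeddings.
Propose a bi-level global attention (BGA) architecture that decouples intra-cluster and inter-cluster information and mitigates over-globalizing.
Introduce collaborative training between a GCN and the BGA module to improve generalization.
Empirically validate CoBFormer on homophilic and heterophilic graphs and analyze its efficiency.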
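To make "embedding smoothness" concrete, the sketch below computes the Frobenius norm of Z − ÂZ for a small graph, i.e. how far each node's embedding is from the normalized aggregate of its neighborhood. It assumes Â is the GCN-style symmetric normalization D^{-1/2}(A + I)D^{-1/2}; the paper's exact normalization and the helper names `normalized_adjacency` and `smoothness_gap` are hypothetical choices for illustration.

```python
import math

def normalized_adjacency(adj, n):
    """Assumed GCN-style normalization: Â = D^{-1/2} (A + I) D^{-1/2}.

    `adj` maps each node index to its neighbor indices (undirected graph assumed).
    """
    neighbors = [set(adj.get(i, [])) | {i} for i in range(n)]  # add self-loops
    deg = [len(nb) for nb in neighbors]
    return [[1.0 / math.sqrt(deg[i] * deg[j]) if j in neighbors[i] else 0.0
             for j in range(n)]
            for i in range(n)]

def smoothness_gap(adj, Z):
    """Frobenius norm ||Z - ÂZ||_F; smaller values mean smoother embeddings."""
    n, d = len(Z), len(Z[0])
    a_hat = normalized_adjacency(adj, n)
    total = 0.0
    for i in range(n):
        # Row i of ÂZ: normalized aggregate over node i's neighborhood (itself included).
        agg = [sum(a_hat[i][j] * Z[j][t] for j in range(n)) for t in range(d)]
        total += sum((z - a) ** 2 for z, a in zip(Z[i], agg))
    return math.sqrt(total)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(smoothness_gap(triangle, [[1.0], [1.0], [1.0]]))   # neighbors agree
print(smoothness_gap(triangle, [[1.0], [-1.0], [0.0]]))  # neighbors disagree
```

On the triangle, identical embeddings give a gap of (numerically) zero, while disagreeing neighbors give a positive gap.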

Proposed Method

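Partition the graph into clusters with METIS to enable intra-cluster and inter-cluster processing.
Apply an intra-cluster Transformer to capture local, cluster-level information.
Apply an inter-cluster Transformer to capture global cluster-to-cluster information and approximate global attention.
Fuse each node's representation with the representation of its cluster through a fusion layer.
Introduce a GCN as the local module, and supervise both modules through collaborative training of two linear heads so that each module refines the other.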
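The bi-level steps above can be sketched as a single forward pass. This is a minimal illustration under several assumptions, not CoBFormer's implementation: a fixed list of clusters stands in for the METIS partition, attention uses identity query/key/value projections instead of learned weights, cluster representations are mean-pooled, and the fusion layer is replaced by a simple sum. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(xs[0])
    out = []
    for q in xs:
        w = softmax([dot(q, k) / math.sqrt(d) for k in xs])
        out.append([sum(wj * v[t] for wj, v in zip(w, xs)) for t in range(d)])
    return out

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bi_level_attention(features, clusters):
    """Hypothetical bi-level pass; `clusters` are disjoint node-index lists (METIS stand-in)."""
    n = len(features)
    intra, cluster_of = [None] * n, [None] * n
    for c, members in enumerate(clusters):
        # Intra-cluster attention: a node attends only to nodes in its own cluster.
        for i, h in zip(members, self_attention([features[i] for i in members])):
            intra[i], cluster_of[i] = h, c
    # One representation per cluster (mean pooling is an assumption).
    reps = [mean([intra[i] for i in members]) for members in clusters]
    # Inter-cluster attention: clusters attend to each other, approximating global attention.
    inter = self_attention(reps)
    # Fuse node-level and cluster-level outputs (a plain sum replaces the fusion layer).
    return [[a + b for a, b in zip(intra[i], inter[cluster_of[i]])] for i in range(n)]

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out = bi_level_attention(feats, clusters=[[0, 1], [2, 3]])
```

One motivation for this design: with n nodes split into P clusters of roughly n/P nodes, intra-cluster attention costs about O(n²/P) and inter-cluster attention O(P²), instead of O(n²) for full global attention.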


Experimental Results

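Research Questions

RQ1: Does the global attention mechanism focus sufficiently on informative (nearby) nodes, or does it overemphasize distant nodes?
RQ2: How can over-globalizing be avoided while still retaining useful information from distant nodes?
RQ3: Does combining the bi-level attention scheme with collaborative training improve generalization and efficiency across diverse graph types (homophilic and heterophilic)?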
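RQ1 can be probed with a simple diagnostic: sum a node's global-attention weights by the hop distance of each target node. The sketch below is an illustrative measure under that assumption, not necessarily the paper's exact definition; `attention_mass_by_hop` and the other names are hypothetical, and the attention logits are supplied by the caller.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass_by_hop(adj, scores, source):
    """Total attention weight that `source` assigns to nodes at each hop distance.

    `scores[j]` is the pre-softmax attention logit from `source` to node j;
    unreachable nodes are grouped under hop -1.
    """
    weights = softmax(scores)
    dist = hop_distances(adj, source)
    mass = {}
    for j, w in enumerate(weights):
        k = dist.get(j, -1)
        mass[k] = mass.get(k, 0.0) + w
    return dict(sorted(mass.items()))

# A 7-node binary tree rooted at node 0: 2 nodes at hop 1, 4 nodes at hop 2.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
print(attention_mass_by_hop(tree, [0.0] * 7, source=0))
```

Even with uniform logits, the root places 4/7 of its attention on the farthest hop simply because that hop holds the most nodes. This toy effect illustrates why fully global attention can drift toward distant nodes.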

Key Findings

Node classification results (%). Mi-F1 = Micro-F1 (mean ± std); Ma-F1 = Macro-F1. CoB-G and CoB-T denote the GCN and BGA modules of CoBFormer.

Mi-F1
Dataset	GCN	GAT	NodeFormer	NAGphormer	SGFormer	CoB-G	CoB-T
Cora	81.44 ± 0.78	81.88 ± 0.99	80.30 ± 0.66	79.62 ± 0.25	81.48 ± 0.94	84.96 ± 0.34	85.28 ± 0.16
CiteSeer	71.84 ± 0.22	72.26 ± 0.97	71.58 ± 1.74	67.46 ± 1.33	71.96 ± 0.13	74.68 ± 0.33	74.52 ± 0.48

Ma-F1
Dataset	GCN	GAT	NodeFormer	NAGphormer	SGFormer	CoB-G	CoB-T
Cora	81.44	83.78	83.82	81.54	83.68	84.96	85.28
CiteSeer	69.87	70.44	70.90	69.60	71.20	74.68	74.52

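Empirical and theoretical evidence shows that standard Graph Transformers pay excessive attention to distant nodes, weakening nearby nodes that are more likely to be informative.
Enlarging the receptive field increases the embedding misalignment Z − ÂZ (the difference between each node's embedding and its neighborhood-aggregated embedding ÂZ) and degrades node classification performance across a variety of settings.
The proposed CoBFormer, with intra-cluster and inter-cluster attention, reduces over-globalizing while retaining global information.
Collaborative training between the GCN (local) and BGA (global) modules improves generalization and node classification, with theoretical support from a KL-divergence decomposition.
Compared with vanilla global-attention methods, CoBFormer achieves better results on several datasets and substantially reduces GPU memory usage.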
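The collaborative training described above can be sketched as a two-head objective: both heads are trained with cross-entropy on labeled nodes, and each head is pulled toward the other's predictions on unlabeled nodes through a KL term. This is a generic mutual-distillation loss written for illustration; the weighting `alpha`, the temperature `tau`, the symmetric KL, and all function names are assumptions, not the paper's exact formulation.

```python
import math

def log_softmax(logits, tau=1.0):
    scaled = [z / tau for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]

def cross_entropy(logits, label):
    return -log_softmax(logits)[label]

def kl_div(p_logits, q_logits, tau=1.0):
    """KL(p || q) between temperature-softened class distributions."""
    log_p = log_softmax(p_logits, tau)
    log_q = log_softmax(q_logits, tau)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def co_training_loss(gcn_logits, bga_logits, labels, alpha=0.8, tau=1.0):
    """Hypothetical two-head objective.

    labels[i] is a class index for labeled nodes and None for unlabeled nodes.
    Supervised term: cross-entropy of both heads on labeled nodes.
    Collaborative term: symmetric KL between the heads on unlabeled nodes.
    """
    sup, co = [], []
    for g, t, y in zip(gcn_logits, bga_logits, labels):
        if y is not None:
            sup.append(cross_entropy(g, y) + cross_entropy(t, y))
        else:
            co.append(kl_div(g, t, tau) + kl_div(t, g, tau))
    sup_term = sum(sup) / len(sup) if sup else 0.0
    co_term = sum(co) / len(co) if co else 0.0
    return alpha * sup_term + (1 - alpha) * co_term

gcn = [[2.0, 0.0], [0.5, 0.5]]
agree = co_training_loss(gcn, [[2.0, 0.0], [0.5, 0.5]], labels=[0, None])
disagree = co_training_loss(gcn, [[2.0, 0.0], [0.0, 3.0]], labels=[0, None])
```

When the heads agree on the unlabeled node, only the supervised term remains. Disagreement adds a positive collaborative penalty. In an autograd framework, one would typically detach the target side of each KL term so that each head learns from the other's current predictions.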


This review was generated by AI and reviewed by human editors.