QUICK REVIEW

[Paper Review] On the Bottleneck of Graph Neural Networks and its Practical Implications

Uri Alon, Eran Yahav|arXiv (Cornell University)|Jun 9, 2020

Advanced Graph Neural Networks|Cited by 148

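One-Sentence Summary

This paper identifies the over-squashing bottleneck in GNNs, shows that long-range information fails to propagate effectively as the problem radius grows, and demonstrates that breaking the bottleneck with a single fully-adjacent (FA) layer yields significant gains across several domains without additional tuning.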

ABSTRACT

Since the proposal of the graph neural network (GNN) by Gori et al. (2005) and Scarselli et al. (2008), one of the major problems in training GNNs was their struggle to propagate information between distant nodes in the graph. We propose a new explanation for this problem: GNNs are susceptible to a bottleneck when aggregating messages across a long path. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interaction. In this paper, we highlight the inherent problem of over-squashing in GNNs: we demonstrate that the bottleneck hinders popular GNNs from fitting long-range signals in the training data; we further show that GNNs that absorb incoming edges equally, such as GCN and GIN, are more susceptible to over-squashing than GAT and GGNN; finally, we show that prior work, which extensively tuned GNN models of long-range problems, suffers from over-squashing, and that breaking the bottleneck improves their state-of-the-art results without any tuning or additional weights. Our code is available at https://github.com/tech-srl/bottleneck/ .

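Motivation and Objectives

Motivate and formalize the over-squashing bottleneck that arises in graph neural networks when long-range information is needed.
Characterize which GNN architectures are more susceptible to over-squashing (GCN/GIN vs. GAT/GGNN).
Break the bottleneck with a simple fully-adjacent (FA) layer and evaluate its practical impact on benchmarks.
Give a theoretical lower bound on the hidden size as a function of the problem radius, and validate it empirically.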
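The link between problem radius and required capacity comes from how quickly the set of nodes feeding into a single node grows with the number of layers. As a rough illustration (not the paper's code), the Python sketch below counts the nodes within K hops of the root of a full binary tree. The helper names `receptive_field` and `full_binary_tree`, and the choice of a binary tree, are assumptions made for this example.

```python
from collections import deque

def receptive_field(neighbors, v, k):
    """Return the set of nodes within k hops of node v (breadth-first search)."""
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for u in neighbors[node]:
            if u not in seen:
                seen.add(u)
                frontier.append((u, dist + 1))
    return seen

def full_binary_tree(depth):
    """Adjacency lists for a full binary tree with heap-style ids (root = 0)."""
    n = 2 ** (depth + 1) - 1
    neighbors = [[] for _ in range(n)]
    for child in range(1, n):
        parent = (child - 1) // 2
        neighbors[parent].append(child)
        neighbors[child].append(parent)
    return neighbors

tree = full_binary_tree(depth=6)
sizes = [len(receptive_field(tree, 0, k)) for k in range(7)]
print(sizes)  # [1, 3, 7, 15, 31, 63, 127]
```

In this tree the count is 2^(K+1) - 1, so it roughly doubles with every extra hop, while the vector that has to summarize those nodes keeps a fixed size. That mismatch is what the summary refers to as over-squashing.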

Proposed Method

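Define the problem radius r and the receptive field N_v^K used to analyze information flow through GNN layers.
Introduce NeighborsMatch, a controlled synthetic benchmark that quantifies over-squashing and its effect on training accuracy.
Demonstrate the bottleneck experimentally on NeighborsMatch for GCN, GIN, GGNN, and GAT, and analyze why some aggregators squash information more severely.
Break the bottleneck by converting the final GNN layer into a fully-adjacent (FA) layer and retraining without any additional tuning, to assess the impact.
Apply the FA modification to real-world benchmarks (QM9, NCI1, ENZYMES, VarMisuse), using reimplemented baselines, and report the performance gains.
Give combinatorial and empirical lower bounds on the hidden dimension d for a given problem radius r, showing that the required capacity grows exponentially.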
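The FA modification listed above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: node states are scalars, each layer is an unweighted mean over a node and its neighbors, and the names `mean_aggregate` and `run_gnn` are hypothetical. The only change in the FA variant is that the last layer aggregates over every other node instead of over the graph's own edges.

```python
def mean_aggregate(h, neighbors):
    """One simplified message-passing step: each node averages its own
    scalar state with the states of the given neighbors."""
    out = []
    for v, nbrs in enumerate(neighbors):
        vals = [h[v]] + [h[u] for u in nbrs]
        out.append(sum(vals) / len(vals))
    return out

def run_gnn(h, neighbors, num_layers, fully_adjacent_last=False):
    """Apply num_layers aggregation steps; optionally make the last one
    fully adjacent (every node connected to every other node)."""
    n = len(h)
    everyone = [[u for u in range(n) if u != v] for v in range(n)]
    for k in range(num_layers):
        is_last = (k == num_layers - 1)
        layer_nbrs = everyone if (is_last and fully_adjacent_last) else neighbors
        h = mean_aggregate(h, layer_nbrs)
    return h

# Path graph 0-1-2-3-4-5 with a signal placed only on node 0.
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

plain = run_gnn(signal, path, num_layers=2)
with_fa = run_gnn(signal, path, num_layers=2, fully_adjacent_last=True)
print(plain[5], with_fa[5])
```

With two layers on this 6-node path, the signal on node 0 never reaches node 5 through the original edges, but it does once the last layer is fully adjacent. The swap adds no parameters, which is consistent with the abstract's "without any tuning or additional weights".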

Experimental Results

Research Questions

RQ1: Does the information bottleneck of GNNs prevent them from learning long-range patterns in synthetic long-range tasks?
RQ2: Which GNN architectures (GCN, GIN, GGNN, GAT) are more susceptible to over-squashing, and why?
RQ3: Can breaking the bottleneck with a simple FA layer significantly improve performance on long-range tasks without extra tuning?
RQ4: Do empirical results on real datasets (QM9, ENZYMES, NCI1, VarMisuse) corroborate the existence of over-squashing and the efficacy of FA layers?
RQ5: What are the theoretical limits on the hidden dimension required to fit long-range signals as the problem radius grows?

Main Findings

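QM9 error per target property (lower is better) for three GNN types, each shown as base † and + FA; the last row is the average relative change in error.
Property	base †	+ FA	base †	+ FA	base †	+ FA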
mu	2.64 ± 0.11	2.54 ± 0.09	2.68 ± 0.06	2.73 ± 0.07	3.85 ± 0.16	3.53 ± 0.13
alpha	4.67 ± 0.52	2.28 ± 0.04	4.65 ± 0.44	2.32 ± 0.16	5.22 ± 0.86	2.72 ± 0.12
HOMO	1.42 ± 0.01	1.26 ± 0.02	1.48 ± 0.03	1.43 ± 0.02	1.67 ± 0.07	1.45 ± 0.04
LUMO	1.50 ± 0.09	1.34 ± 0.04	1.53 ± 0.07	1.41 ± 0.03	1.74 ± 0.06	1.63 ± 0.06
gap	2.27 ± 0.09	1.96 ± 0.04	2.31 ± 0.06	2.08 ± 0.05	2.60 ± 0.06	2.30 ± 0.05
R2	15.63 ± 1.40	12.61 ± 0.37	52.39 ± 42.5	15.76 ± 1.17	35.94 ± 35.7	14.33 ± 0.47
ZPVE	12.93 ± 1.81	5.03 ± 0.36	14.87 ± 2.88	5.98 ± 0.43	17.84 ± 3.61	5.24 ± 0.30
U0	5.88 ± 1.01	2.21 ± 0.12	7.61 ± 0.46	2.19 ± 0.25	8.65 ± 2.46	3.35 ± 1.68
U	18.71 ± 23.36	2.32 ± 0.18	6.86 ± 0.53	2.11 ± 0.10	9.24 ± 2.26	2.49 ± 0.34
H	5.62 ± 0.81	2.26 ± 0.19	7.64 ± 0.92	2.27 ± 0.29	9.35 ± 0.96	2.31 ± 0.15
G	5.38 ± 0.75	2.04 ± 0.24	6.54 ± 0.36	2.07 ± 0.07	7.14 ± 1.15	2.17 ± 0.29
Cv	3.53 ± 0.37	1.86 ± 0.03	4.11 ± 0.27	2.03 ± 0.14	8.86 ± 9.07	2.25 ± 0.20
Omega	1.05 ± 0.11	0.80 ± 0.04	1.48 ± 0.87	0.73 ± 0.04	1.57 ± 0.53	0.87 ± 0.09
Relative	-39.54%		-44.58%		-47.42%
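The "Relative" row appears to be the mean, over the 13 properties, of the per-property relative change (FA minus base, divided by base). The Python sketch below recomputes it from the mean values in the table, ignoring the ± spreads. The dictionary `qm9` and the function `mean_relative_change` are names introduced for this check only.

```python
# (base, +FA) mean errors per property, copied from the table above,
# one pair per GNN type in column order. The ± spreads are ignored.
qm9 = {
    "mu":    [(2.64, 2.54), (2.68, 2.73), (3.85, 3.53)],
    "alpha": [(4.67, 2.28), (4.65, 2.32), (5.22, 2.72)],
    "HOMO":  [(1.42, 1.26), (1.48, 1.43), (1.67, 1.45)],
    "LUMO":  [(1.50, 1.34), (1.53, 1.41), (1.74, 1.63)],
    "gap":   [(2.27, 1.96), (2.31, 2.08), (2.60, 2.30)],
    "R2":    [(15.63, 12.61), (52.39, 15.76), (35.94, 14.33)],
    "ZPVE":  [(12.93, 5.03), (14.87, 5.98), (17.84, 5.24)],
    "U0":    [(5.88, 2.21), (7.61, 2.19), (8.65, 3.35)],
    "U":     [(18.71, 2.32), (6.86, 2.11), (9.24, 2.49)],
    "H":     [(5.62, 2.26), (7.64, 2.27), (9.35, 2.31)],
    "G":     [(5.38, 2.04), (6.54, 2.07), (7.14, 2.17)],
    "Cv":    [(3.53, 1.86), (4.11, 2.03), (8.86, 2.25)],
    "Omega": [(1.05, 0.80), (1.48, 0.73), (1.57, 0.87)],
}

def mean_relative_change(pairs):
    """Average of (fa - base) / base over all properties, in percent."""
    changes = [(fa - base) / base for base, fa in pairs]
    return 100.0 * sum(changes) / len(changes)

relative = [
    mean_relative_change([row[i] for row in qm9.values()])
    for i in range(3)
]
print([round(r, 2) for r in relative])  # [-39.54, -44.58, -47.42]
```

The rounded values match the "Relative" row, which supports reading it as an unweighted average across properties rather than a change in total error.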

On NeighborsMatch, GCN and GIN fail to fit the training data starting at r=4, and all four models fail at r=5, which is evidence of over-squashing.
GAT and GGNN hold up better than GCN and GIN at larger radii because they can weight or filter incoming edges rather than absorbing them equally, which mitigates compression of the full receptive field.
Breaking the bottleneck with a single FA layer reduces error on QM9 by 42% on average across six GNN types; benefits also appear on ENZYMES, NCI1, and VarMisuse datasets without extra tuning.
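Across QM9, replacing the last layer with an FA layer lowers the error on nearly every target property (mu, alpha, HOMO, LUMO, gap, etc.); for the three GNN types tabulated above, the average relative changes in error are -39.54%, -44.58%, and -47.42%. The one exception in the table is mu for the second model, which rises from 2.68 to 2.73.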
On the biological benchmarks, FA layers reduce error on average (ENZYMES ~8.1% over the base models; NCI1 ~1.5% over the best baselines), with a reported average improvement of ~12% across ENZYMES and NCI1.
VarMisuse shows state-of-the-art gains, with SeenProjTest improving to 88.4% and UnseenProjTest to 83.8% using FA layers.

This review was generated by AI and reviewed by human editors.