QUICK REVIEW

[Paper Review] Benchmarking Graph Neural Networks

Vijay Prakash Dwivedi, Chaitanya K. Joshi|arXiv (Cornell University)|Mar 2, 2020

Advanced Graph Neural Networks | References: 100 | Cited by: 233

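One-Sentence Summary

This paper presents an open-source, modular benchmarking framework for graph neural networks (GNNs) that covers a diverse collection of 12 datasets, fixes the parameter budget so that models are compared fairly, explores graph positional encoding based on Laplacian eigenvectors, and introduces the AQSOL dataset.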

ABSTRACT

In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.

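Research Motivation and Goals

Establish a community-standard, fair benchmarking framework for GNNs on diverse real-world and mathematical graphs.
Provide a modular, reproducible codebase (PyTorch/DGL) that enables fair comparison under a fixed parameter budget.
Extend the dataset collection to cover basic mathematical graphs and AQSOL, a molecular dataset with a real-world target.
Show how the framework can drive insights into GNN design, for example graph positional encoding (PE) using Laplacian eigenvectors.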
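The fixed-parameter-budget idea in the goals above can be illustrated with a small sketch: given a budget, pick the widest hidden dimension that still fits. This is not the benchmark's own code. It uses a plain MLP as a stand-in for a GNN, and the helper names (`mlp_param_count`, `largest_hidden_dim`), the input size of 28, and the 4-layer depth are all hypothetical choices for illustration. The 100k figure is one of the two budgets the paper reports.

```python
def mlp_param_count(in_dim: int, hidden_dim: int, out_dim: int, num_layers: int) -> int:
    """Count weights and biases of an MLP with `num_layers` linear layers.

    Stand-in model (an assumption): a real GNN layer has a different
    parameter formula, but the budgeting logic is the same.
    """
    if num_layers < 2:
        raise ValueError("need at least one hidden layer (num_layers >= 2)")
    dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
    # Each linear layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(dims, dims[1:]))


def largest_hidden_dim(budget: int, in_dim: int, out_dim: int, num_layers: int) -> int:
    """Return the largest hidden width whose parameter count stays within `budget`."""
    if mlp_param_count(in_dim, 1, out_dim, num_layers) > budget:
        raise ValueError("budget too small for even one hidden unit")
    hidden = 1
    # The count grows monotonically with width, so a linear scan suffices.
    while mlp_param_count(in_dim, hidden + 1, out_dim, num_layers) <= budget:
        hidden += 1
    return hidden


BUDGET = 100_000  # one of the two budgets mentioned in the paper
hidden = largest_hidden_dim(BUDGET, in_dim=28, out_dim=1, num_layers=4)
used = mlp_param_count(28, hidden, 1, 4)
print(f"hidden_dim={hidden}, parameters used={used} of {BUDGET}")
```

Tuning the width per architecture in this way means two models are compared at roughly equal capacity, so a gain is less likely to come from one model simply being larger.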

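Proposed Method

Introduce a modular GNN benchmarking framework built on PyTorch and DGL, comprising data pipelines, GNN layers/models, training/evaluation, and reproducibility scripts.
Provide a collection of 12 medium-scale graph datasets spanning real-world and mathematical domains (Table 1).
Implement two parameter budgets (100k and 500k) so that architectures are compared fairly, independent of total parameter count.
Analyze graph positional encoding (PE) by appending Laplacian eigenvectors to node features, demonstrating how the framework is used.
Describe how to extend the framework to test new ideas for data preprocessing, layers, and normalization schemes.
Discuss the design choice of favoring medium-scale datasets for fast, reliable prototyping.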
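The positional-encoding step in the list above (appending Laplacian eigenvectors to node features) can be sketched with NumPy. This is a minimal illustration, not the framework's implementation. It assumes the symmetric normalized Laplacian, a dense adjacency matrix, and a connected graph, and the function names `laplacian_pe` and `append_pe` are hypothetical.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k Laplacian eigenvectors as per-node positional encodings.

    Assumptions (not taken from the source): symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2}, undirected connected graph.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # D^{-1/2}; isolated nodes keep 0 to avoid division by zero.
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(lap)
    # Skip the first eigenvector (eigenvalue 0, trivial for a connected graph).
    return eigvecs[:, 1:k + 1]


def append_pe(node_feats: np.ndarray, adj: np.ndarray, k: int) -> np.ndarray:
    """Concatenate the k-dimensional PE to each node's feature vector."""
    return np.concatenate([node_feats, laplacian_pe(adj, k)], axis=1)


# Toy example: a 4-node cycle graph with 3 features per node.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
feats = np.ones((4, 3))
feats_with_pe = append_pe(feats, adj, k=2)
print(feats_with_pe.shape)
```

Eigenvectors are only defined up to sign (and up to rotation when eigenvalues repeat, as in this cycle graph), so a model consuming these encodings needs some way to cope with that ambiguity.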

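Experimental Results

Research Questions

RQ1: Across different graph tasks, which GNN architectures and building blocks perform best under a fixed parameter budget?
RQ2: In practical benchmarks, how does graph positional encoding affect the performance and expressive power of GNNs?
RQ3: How well does the benchmark discriminate between GNN classes (MP-GCNs vs. WL-GNNs) and across datasets for graph-level, node-level, and edge-level tasks?
RQ4: Can the framework accommodate and accelerate the exploration of new GNN ideas, normalization schemes, and pooling mechanisms?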

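Key Findings

The benchmark framework has been widely used to prototype GNN ideas and to study aggregation, expressive power, pooling, normalization, and robustness.
Graph positional encoding using Laplacian eigenvectors improves the performance of MP-GCNs on synthetic and real-world datasets, including AQSOL.
The framework spurred follow-up research on PE and related GNN enhancements (e.g., Beaini et al., 2021; Wang et al., 2022; Lim et al., 2022; Kreuzer et al., 2021; Ying et al., 2021; Mialon et al., 2021).
The updated framework adds basic mathematical datasets and the AQSOL molecular dataset to broaden the evaluation scenarios.
The GitHub repository has drawn community attention (2,000+ stars, 380+ forks) and is cited in the literature, which illustrates the practical value of the open-source infrastructure.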


This review was generated by AI and reviewed by human editors.