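[Paper Digest] A Fair Comparison of Graph Neural Networks for Graph Classification
This paper presents a large-scale, standardized, and reproducible evaluation of five GNN architectures on graph classification across nine datasets. It includes structure-agnostic baselines to measure how much of the performance truly comes from graph topology, and it examines the effect of node degree features.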
Experimental reproducibility and replicability are critical topics in machine learning. Authors have often raised concerns about their lack in scientific publications to improve the quality of the field. Recently, the graph representation learning field has attracted the attention of a wide research community, which resulted in a large stream of works. As such, several Graph Neural Network models have been developed to effectively tackle graph classification. However, experimental procedures often lack rigorousness and are hardly reproducible. Motivated by this, we provide an overview of common practices that should be avoided to fairly compare with the state of the art. To counter this troubling trend, we ran more than 47000 experiments in a controlled and uniform framework to re-evaluate five popular models across nine common benchmarks. Moreover, by comparing GNNs with structure-agnostic baselines we provide convincing evidence that, on some datasets, structural information has not been exploited yet. We believe that this work can contribute to the development of the graph learning field, by providing a much needed grounding for rigorous evaluations of graph classification models.
Research Motivation and Objectives
- Highlight reproducibility issues in GNN graph classification studies and establish a standardized evaluation framework.
- Re-evaluate five popular GNN architectures under the same data splits and identical node features.
- Assess how much structural information contributes beyond node features by using structure-agnostic baselines.
- Investigate the impact of including node degree features on performance and model depth for social graphs.
- Provide publicly available code and dataset splits to enable rigorous future comparisons.
Proposed Method
- Review common reproducibility pitfalls in graph classification studies and define a rigorous evaluation protocol.
- Use 10-fold cross-validation for model assessment; within each outer training fold, hold out 10% as a validation set (an inner 90/10 split) for model selection.
- Employ identical input features across models and compare against two structure-agnostic baselines.
- Re-implement five GNN models (DGCNN, DiffPool, ECC, GIN, GraphSAGE) in PyTorch Geometric for fair comparison.
- Evaluate on nine datasets (4 chemical, 5 social) and report mean accuracy with standard deviations.
- Release code and data splits to enable replication.
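The evaluation protocol in the list above (an outer 10-fold split for assessment, an inner 90/10 hold-out for model selection, and mean accuracy with standard deviation as the reported figure) can be sketched as follows. This is a minimal standard-library illustration, not the authors' released code: the function names `stratified_folds`, `nested_splits`, and `summarize` are hypothetical, stratifying the outer folds by class label is an assumption, the inner hold-out is a plain random split, and the use of the sample standard deviation is likewise an assumption.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    dealt = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal indices round-robin, continuing across classes so fold sizes stay balanced.
        for idx in idxs:
            folds[dealt % k].append(idx)
            dealt += 1
    return folds


def nested_splits(labels, k=10, val_fraction=0.1, seed=0):
    """Yield (train, val, test) index lists: outer k-fold, inner random hold-out."""
    folds = stratified_folds(labels, k, seed)
    rng = random.Random(seed + 1)
    for i, test in enumerate(folds):
        # Everything outside the current test fold is available for model selection.
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rng.shuffle(rest)
        n_val = max(1, round(val_fraction * len(rest)))
        yield rest[n_val:], rest[:n_val], test


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) over the outer test folds."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Hypothetical labels for 100 graphs in two classes.
labels = [0] * 60 + [1] * 40
splits = list(nested_splits(labels))
```

Hyperparameters would be chosen using only the `val` indices of each outer fold, and the `test` indices would be touched once per fold to produce the accuracies passed to `summarize`. Fixing the seed and saving the resulting index lists is one way to give every model the same splits.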
Experimental Results
Research Questions
- RQ1: To what extent do state-of-the-art GNNs outperform simple structure-agnostic baselines across graph classification benchmarks?
- RQ2: How much of the performance gain comes from graph structure versus node features?
- RQ3: Does including node degree as an input feature consistently improve results on social graphs, and does it affect the required model depth?
- RQ4: Are there datasets where current GNNs fail to beat structure-agnostic baselines, indicating underutilization of topology?
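To make "structure-agnostic" concrete, the sketch below builds a graph-level representation from node features alone. It is an assumption-laden illustration: the digest does not describe how the baselines are built, so the sum readout and the name `sum_readout` are hypothetical stand-ins for any model that never reads the edge list.

```python
def sum_readout(node_features):
    """Sum node feature vectors into one graph-level vector (edges are ignored)."""
    if not node_features:
        raise ValueError("graph has no nodes")
    dim = len(node_features[0])
    return [sum(node[d] for node in node_features) for d in range(dim)]


# Two hypothetical graphs with identical node features but different edges.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
path_edges = [(0, 1), (1, 2)]
triangle_edges = [(0, 1), (1, 2), (0, 2)]

# The readout never receives the edge lists, so both graphs map to the same vector.
path_repr = sum_readout(features)
triangle_repr = sum_readout(features)
```

Because the path and the triangle are indistinguishable to such a model, any accuracy it reaches comes from node features only. A GNN that does not beat it on a dataset gives no evidence there that it benefits from topology.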
Main Findings
Mean test accuracy (%) ± standard deviation on the chemical datasets:

| Model | D&D | NCI1 | PROTEINS | ENZYMES |
|---|---|---|---|---|
| Baseline | 78.4±4.5 | 69.8±2.2 | 75.8±3.7 | 65.2±6.4 |
| DGCNN | 76.6±4.3 | 76.4±1.7 | 72.9±3.5 | 38.9±5.7 |
| DiffPool | 75.0±3.5 | 76.9±1.9 | 73.7±3.5 | 59.5±5.6 |
| ECC | 72.6±4.1 | 76.2±1.4 | 72.3±3.4 | 29.5±8.2 |
| GIN | 75.3±2.9 | 80.0±1.4 | 73.3±4.0 | 59.6±4.5 |
| GraphSAGE | 72.9±2.0 | 76.0±1.8 | 73.0±4.5 | 58.2±6.0 |
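- On three of the four chemical datasets (D&D, PROTEINS, ENZYMES), the structure-agnostic baseline has the highest mean accuracy of any model, ahead of all five GNNs.
- On NCI1, every GNN beats the baseline (69.8%), with GIN highest at 80.0%, which indicates that the models do exploit graph structure there.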
- In social datasets, adding node degree features generally improves performance, and can reduce the needed number of layers for some models.
- GIN performs strongly on social datasets, while on some chemical datasets the baselines remain competitive.
- Including degree features can substantially boost baseline performance, and can alter the relative ranking of models.
- The study emphasizes the importance of baselines for fair assessment and reproducibility in graph classification.
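The degree features discussed in the findings above can be derived directly from a graph's edge list. The sketch below is illustrative only: the helper names `node_degrees` and `one_hot_degrees` are hypothetical, edges are assumed undirected and listed once without self-loops, and one-hot encoding with clamping at `max_degree` is an assumed encoding, since the digest does not say how degree is represented.

```python
def node_degrees(num_nodes, edges):
    """Count incident edges per node for an undirected edge list."""
    degrees = [0] * num_nodes
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def one_hot_degrees(degrees, max_degree):
    """Encode each degree as a one-hot vector of length max_degree + 1."""
    encoded = []
    for d in degrees:
        row = [0] * (max_degree + 1)
        # Degrees above max_degree share the last slot (assumed clamping rule).
        row[min(d, max_degree)] = 1
        encoded.append(row)
    return encoded


# Hypothetical 3-node path graph: 0 - 1 - 2.
degrees = node_degrees(3, [(0, 1), (1, 2)])
features = one_hot_degrees(degrees, max_degree=2)
```

Degree is a structural quantity, so feeding it to a model as a node feature hands over some topology for free. That is consistent with the finding above that degree features can substantially boost the baseline.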
This digest was generated by AI and reviewed by a human editor.