QUICK REVIEW

[Paper Review] Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights

Xingtong Yu, S. Ye|arXiv (Cornell University)|Feb 28, 2026

Advanced Graph Neural Networks | Cited by: 0

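One-Sentence Summary

This paper proposes a two-dimensional benchmark that jointly evaluates topic-domain and format-domain shift for graph foundation models (GFMs), and analyzes eight GFMs on 33 datasets under four evaluation settings.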

ABSTRACT

Graph foundation models (GFM) aim to acquire transferable knowledge by pre-training on diverse graphs, which can be adapted to various downstream tasks. However, domain shift in graphs is inherently two-dimensional: graphs differ not only in what they describe (topic domains) but also in how they are represented (format domains). Most existing GFM benchmarks vary only topic domains, thereby obscuring how knowledge transfers across both dimensions. We present a new benchmark that jointly evaluates topic and format gaps across the full GFM pipeline, including multi-domain self-supervised pre-training and few-shot downstream adaptation, and provides a timely evaluation of recent GFMs in the rapidly evolving landscape. Our protocol enables controlled assessment in four settings: (i) pre-training on diverse topics and formats, while adapting to unseen downstream datasets; (ii) same pre-training as in (i), while adapting to seen datasets; (iii) pre-training on a single topic domain, while adapting to other topics; (iv) pre-training on a base format, while adapting to other formats. This two-axis evaluation disentangles semantic generalization from robustness to representational shifts. We conduct extensive evaluations of eight state-of-the-art GFMs on 33 datasets spanning seven topic domains and six format domains, surfacing new empirical observations and practical insights for future research. Codes/data are available at https://github.com/smufang/GFMBenchmark.
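
The four settings listed in the abstract can be read as four ways of splitting a pool of datasets into a pre-training set and a downstream set. The sketch below is only an illustration of that reading, not the benchmark's actual code: the `GraphDataset` class, the `split_by_setting` function, and the toy dataset, topic, and format names are all hypothetical, and the real splits are defined in the authors' repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphDataset:
    name: str   # hypothetical dataset identifier
    topic: str  # what the graph describes (topic domain)
    fmt: str    # how the graph is represented (format domain)


def split_by_setting(datasets, setting, held_out=(), base_topic=None, base_fmt=None):
    """Return (pretrain, downstream) lists for settings "i" to "iv".

    Assumption: settings (i) and (ii) share one pre-training pool, which is
    every dataset not named in `held_out`.
    """
    if setting in ("i", "ii"):
        pretrain = [d for d in datasets if d.name not in held_out]
        if setting == "i":
            # (i) adapt to datasets never seen during pre-training
            downstream = [d for d in datasets if d.name in held_out]
        else:
            # (ii) adapt to datasets that were part of pre-training
            downstream = list(pretrain)
    elif setting == "iii":
        # (iii) pre-train on one topic domain, adapt to the other topics
        pretrain = [d for d in datasets if d.topic == base_topic]
        downstream = [d for d in datasets if d.topic != base_topic]
    elif setting == "iv":
        # (iv) pre-train on a base format, adapt to the other formats
        pretrain = [d for d in datasets if d.fmt == base_fmt]
        downstream = [d for d in datasets if d.fmt != base_fmt]
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return pretrain, downstream


# Toy pool with placeholder names; these are not the benchmark's datasets.
TOY_DATASETS = [
    GraphDataset("d1", "topic_A", "format_X"),
    GraphDataset("d2", "topic_A", "format_Y"),
    GraphDataset("d3", "topic_B", "format_X"),
    GraphDataset("d4", "topic_B", "format_Y"),
]

if __name__ == "__main__":
    pre, down = split_by_setting(TOY_DATASETS, "iv", base_fmt="format_X")
    print([d.name for d in pre], [d.name for d in down])
```

Keeping topic and format as separate fields is what lets settings (iii) and (iv) vary one axis at a time, which is the disentanglement the abstract describes.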

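Motivation and Goals

Formalize a two-dimensional view of graph domains by separating topic semantics from graph format.
Build a comprehensive benchmark for GFMs that covers both topic domains and format domains.
Provide unified evaluation settings for assessing transfer to both seen and unseen downstream datasets.
Analyze the generalization behavior of state-of-the-art GFMs and offer actionable design insights.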

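Proposed Method

Define topic domains and format domains for graphs, and curate diverse datasets along both axes.
Pre-train GFMs with multi-domain self-supervised objectives on graphs of diverse topics and formats.
Evaluate cross-domain transfer on node-, edge-, and graph-level tasks through few-shot downstream adaptation.
Compare eight GFMs under four evaluation settings.
Standardize data preprocessing and evaluation protocols to enable fair comparison.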
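
Few-shot downstream adaptation means the model sees only a handful of labeled examples per class before being evaluated on the rest. A minimal sketch of such a split follows; the function name `sample_k_shot`, the choice of `k`, and the seeding scheme are assumptions made for illustration, since this review does not state the benchmark's exact sampling protocol.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Split example indices into a k-per-class support set and a query set.

    `labels` is a list of class labels, one per node, edge, or graph.
    Returns (support, query): sorted support indices and the remaining indices.
    """
    rng = random.Random(seed)  # local RNG so repeated trials are reproducible

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    support = []
    for y in sorted(by_class):
        idxs = by_class[y]
        if len(idxs) < k:
            raise ValueError(f"class {y!r} has fewer than {k} examples")
        support.extend(rng.sample(idxs, k))  # k distinct examples per class

    support_set = set(support)
    query = [i for i in range(len(labels)) if i not in support_set]
    return sorted(support), query


if __name__ == "__main__":
    toy_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # made-up labels
    support, query = sample_k_shot(toy_labels, k=1, seed=0)
    print("support:", support)
    print("query:", query)
```

Varying `seed` across runs gives different support sets, which is one common way to report a mean and spread over few-shot trials.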

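Experimental Results

Research Questions

RQ1: Can GFMs pre-trained on multiple topic and format domains adapt to unseen downstream datasets?
RQ2: How well do GFMs adapt to downstream datasets that were seen during multi-domain pre-training?
RQ3: How does semantic (topic) generalization interact with representational (format) shift?
RQ4: After pre-training on a base format, how well do GFMs generalize to other graph formats?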

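Key Findings

No single GFM dominates across all unseen datasets; performance depends on the dataset and the task.
GFMs generally outperform traditional supervised GNNs on unseen targets, but the gains are inconsistent across datasets.
Some GFMs (e.g., SAMGPT, MDGPT, GFT, MDGFM) are frequently competitive across multiple settings, while other models do well when text labels are present.
The evaluation reveals distinct generalization behaviors and limitations, underscoring the need for better multi-domain integration and adaptation strategies.
When text labels are available, certain methods (e.g., G2P2, GraphCLIP) can improve performance on specific tasks.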


This review was generated by AI and checked by a human editor.