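[Paper Explained] A simple yet effective baseline for non-attributed graph classification
This paper proposes Local Degree Profile (LDP), a simple, linear-time graph representation built from local degree statistics. On non-attributed graphs it is competitive with state-of-the-art graph kernels and graph neural networks. On attributed graphs it is slightly weaker because it ignores attributes, but it remains a useful baseline.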
Graphs are complex objects that do not lend themselves easily to typical learning tasks. Recently, a range of approaches based on graph kernels or graph neural networks have been developed for graph classification and for representation learning on graphs in general. As the developed methodologies become more sophisticated, it is important to understand which components of the increasingly complex methods are necessary or most effective. As a first step, we develop a simple yet meaningful graph representation, and explore its effectiveness in graph classification. We test our baseline representation for the graph classification task on a range of graph datasets. Interestingly, this simple representation achieves similar performance as the state-of-the-art graph kernels and graph neural networks for non-attributed graph classification. Its performance on classifying attributed graphs is slightly weaker as it does not incorporate attributes. However, given its simplicity and efficiency, we believe that it still serves as an effective baseline for attributed graph classification. Our graph representation is efficient (linear-time) to compute. We also provide a simple connection with the graph neural networks. Note that these observations are only for the task of graph classification while existing methods are often designed for a broader scope including node embedding and link prediction. The results are also likely biased due to the limited amount of benchmark datasets available. Nevertheless, the good performance of our simple baseline calls for the development of new, more comprehensive benchmark datasets so as to better evaluate and analyze different graph learning methods. Furthermore, given the computational efficiency of our graph summary, we believe that it is a good candidate as a baseline method for future graph classification (or even other graph learning) studies.
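Motivation and goals
- Assess whether a simple graph representation based on local information can perform well on non-attributed graph classification.
- Compare the LDP baseline with state-of-the-art graph kernels and graph neural networks on standard datasets.
- Assess the computational efficiency and scalability of the proposed method as a baseline for graph classification.
Proposed method
- For each node v, compute five features: degree(v), plus the minimum, maximum, mean, and standard deviation of DN(v), the multiset of degrees of v's neighbors.
- Build graph-level features by applying a histogram or an empirical distribution function to each of the five node features, then concatenate the results along the feature dimension.
- Train a linear or nonlinear support vector machine on the aggregated graph features, using ten-fold cross-validation repeated ten times, and report the mean accuracy.
- Analyze the computational complexity: feature extraction takes O(E), and mapping the V values into B bins takes O(V). Compare this with kernel-based methods and neural network baselines.
- Discuss the relation to graph neural networks by showing that LDP captures the basic ingredients of a GNN without any learning, and consider possible extra features such as sum(DN(v)) (not used in the final results).
- Hyperparameters include the bin size, the normalization strategy, the representation (histogram vs. empirical distribution), the scale (linear vs. logarithmic), and the SVM's C and kernel bandwidth.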
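The feature steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ldp_node_features`, `histogram`, `ldp_graph_vector`), the adjacency-dict input format, the use of the population standard deviation, the all-zero convention for isolated nodes, and the shared bin range [0, max_degree] are all choices made here for illustration.

```python
from statistics import mean, pstdev


def ldp_node_features(adj):
    """Return the five per-node LDP features for an undirected simple graph.

    adj maps each node to a list of its neighbors (assumed input format).
    Each node gets (degree, min, max, mean, std) where the last four are
    taken over DN(v), the degrees of v's neighbors.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    feats = {}
    for v, nbrs in adj.items():
        dn = [deg[u] for u in nbrs]
        if dn:
            # Population standard deviation is an assumption of this sketch.
            feats[v] = (deg[v], min(dn), max(dn), mean(dn), pstdev(dn))
        else:
            # Isolated node: DN(v) is empty, so use zeros (assumed convention).
            feats[v] = (0, 0, 0, 0.0, 0.0)
    return feats


def histogram(values, n_bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with n_bins equal bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    for x in values:
        idx = int((x - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp out-of-range values
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


def ldp_graph_vector(adj, n_bins, max_degree):
    """Concatenate one histogram per node feature into a graph-level vector.

    All five features lie in [0, max_degree], so a single shared range is
    used here for simplicity (an assumption of this sketch).
    """
    feats = list(ldp_node_features(adj).values())
    vector = []
    for k in range(5):
        column = [f[k] for f in feats]
        vector.extend(histogram(column, n_bins, 0, max_degree))
    return vector


# Usage: a path graph 0 - 1 - 2
path = {0: [1], 1: [0, 2], 2: [1]}
node_feats = ldp_node_features(path)
graph_vec = ldp_graph_vector(path, n_bins=3, max_degree=3)
```

Each node's features touch only its own neighbor list, so the loop does work proportional to the total number of adjacency entries, which matches the O(E) extraction cost stated above. The resulting fixed-length vector can then be passed to any standard SVM implementation.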
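Experimental results
Research questions
- RQ1: Can a simple, non-learned representation built from local features compete with sophisticated graph kernels and GNNs on non-attributed graph classification?
- RQ2: On standard non-attributed graph datasets, how does the LDP baseline compare with state-of-the-art methods in accuracy and efficiency?
- RQ3: What are the limitations of classifying graphs from local, non-attribute information alone, and in which cases is global or attribute information needed?
Main findings
- The Local Degree Profile (LDP) baseline achieves performance comparable to state-of-the-art graph kernels and many graph neural networks on non-attributed graph classification.
- Even with a linear SVM (and no representation learning), LDP performs well on several datasets, including the Reddit variants.
- Adding extra node or edge features yields only limited improvement across datasets, which suggests that purely local, degree-based features can be surprisingly strong on non-attributed graphs. For datasets with rich labels (for example, some chemical graphs), more global or attribute information may be needed.
- LDP feature extraction runs in linear time, which supports its use as a strong baseline for future graph classification studies and underlines the need for larger, more comprehensive benchmark datasets.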
This explainer was generated by AI and reviewed by a human editor.