QUICK REVIEW

[Paper Review] Distance Encoding -- Design Provably More Powerful GNNs for Structural Representation Learning

Pan Li, Yanbang Wang|arXiv (Cornell University)|Aug 31, 2020

Advanced Graph Neural Networks | References: 47 | Cited by: 23

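One-Sentence Summary

This paper proposes Distance Encoding (DE), a general class of structure-related features that encodes the distance from a target node set to every other node in the graph, giving graph neural networks (GNNs) strictly more expressive power than the 1-Weisfeiler-Lehman (1-WL) test. On node structural role prediction, link prediction, and triangle prediction, DE-based models improve average accuracy and AUC by up to 15% over GNNs without DE, and they also outperform SOTA baselines designed specifically for those tasks.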

ABSTRACT

Learning structural representations of node sets from graph-structured data is crucial for applications ranging from node-role discovery to link prediction and molecule classification. Graph Neural Networks (GNNs) have achieved great success in structural representation learning. However, most GNNs are limited by the 1-Weisfeiler-Lehman (WL) test and thus possible to generate identical representation for structures and graphs that are actually different. More powerful GNNs, proposed recently by mimicking higher-order-WL tests, only focus on entire-graph representations and cannot utilize sparsity of the graph structure to be computationally efficient. Here we propose a general class of structure-related features, termed Distance Encoding (DE), to assist GNNs in representing node sets with arbitrary sizes with strictly more expressive power than the 1-WL test. DE essentially captures the distance between the node set whose representation is to be learnt and each node in the graph, which includes important graph-related measures such as shortest-path-distance and generalized PageRank scores. We propose two general frameworks for GNNs to use DEs (1) as extra node attributes and (2) further as controllers of message aggregation in GNNs. Both frameworks may still utilize the sparse structure to keep scalability to process large graphs. In theory, we prove that these two frameworks can distinguish node sets embedded in almost all regular graphs where traditional GNNs always fail. We also rigorously analyze their limitations. Empirically, we evaluate these two frameworks on node structural roles prediction, link prediction and triangle prediction over six real networks. The results show that our models outperform GNNs without DEs by up-to 15% improvement in average accuracy and AUC. Our models also significantly outperform other SOTA baselines particularly designed for those tasks.

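Research Motivation and Goals

Address a limitation of standard GNNs: because they are bounded by the 1-Weisfeiler-Lehman (1-WL) test, they can produce identical representations for structures and graphs that are actually different.
Develop a general, scalable method that improves GNN performance in structural representation learning through provably greater expressive power.
Enable GNNs to exploit graph sparsity while learning representations of node sets of arbitrary size.
Design a feature-engineering approach that captures global structural context through distance measures without sacrificing computational efficiency.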

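Proposed Method

Introduce Distance Encoding (DE), a class of structure-related features that encodes the distance between each node in the graph and the target node set, using measures such as shortest-path distance and generalized PageRank scores.
Use DE as extra node attributes in a GNN, enriching node representations with global structural context.
Extend the GNN message-passing mechanism by also using DE as a controller of message aggregation, so that information flow becomes distance-aware.
Preserve the sparse structure of the graph in both frameworks, keeping computation efficient and scalable to large graphs.
Prove that both DE frameworks can distinguish node sets embedded in almost all regular graphs, a setting in which traditional, 1-WL-bounded GNNs always fail.
Formalize the expressive power of the proposed frameworks using graph isomorphism theory, showing that they are strictly more expressive than the 1-WL test, and rigorously analyze their limitations.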
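The shortest-path-distance variant of DE described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the one-hot bucketing with a `max_dist` cap, the extra "unreachable" bucket, and the use of a sum to aggregate over the target set are all assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_encoding(adj, target_set, max_dist=3):
    """One-hot SPD encoding of each node w.r.t. each target, summed over the target set.

    Buckets 0..max_dist hold distances (values above max_dist are capped);
    the final bucket counts targets that are unreachable from the node.
    """
    num_buckets = max_dist + 2
    encoding = {u: [0] * num_buckets for u in adj}
    for s in target_set:
        dist = bfs_distances(adj, s)
        for u in adj:
            bucket = min(dist[u], max_dist) if u in dist else max_dist + 1
            encoding[u][bucket] += 1  # sum aggregation over the target set (assumed)
    return encoding

# Toy path graph 0 - 1 - 2 - 3 - 4 with target node set S = {0, 1},
# e.g. a candidate link whose representation is to be learned.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
de = distance_encoding(adj, target_set=[0, 1], max_dist=3)
for node in sorted(de):
    print(node, de[node])
```

Each resulting vector would be concatenated with the node's original attributes before message passing. Because it relies only on BFS over adjacency lists, the sketch works directly on the sparse graph structure.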

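Experimental Results

Research Questions

RQ1: Can Distance Encoding provide a provably more expressive alternative to 1-WL-bounded GNNs for structural representation learning?
RQ2: How can DE be integrated into GNNs while preserving computational efficiency on large, sparse graphs?
RQ3: How much do DE-enhanced GNNs improve performance on node structural role prediction, link prediction, and triangle prediction?
RQ4: What are the theoretical limitations of the proposed DE-based frameworks in distinguishing graph structures?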

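Key Findings

Across node structural role prediction, link prediction, and triangle prediction on six real-world networks, the proposed DE-based GNN frameworks improve average accuracy and AUC by up to 15% over GNNs without DE.
DE-enhanced GNNs significantly outperform standard GNNs that lack structural encoding, indicating that distance-aware features play a key role in improving expressive power.
The models also significantly outperform other SOTA baselines designed specifically for node role, link, and triangle prediction.
Theoretical analysis confirms that both frameworks can distinguish node sets embedded in almost all regular graphs, where 1-WL-bounded GNNs always fail, establishing their greater expressive power.
Whether DE is used as a node attribute or as a controller of message aggregation, scalability is maintained because the sparsity of the graph is preserved.
The empirical results show consistent gains across the evaluated networks and tasks, supporting the generality of the DE approach.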
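The regular-graph result can be illustrated on a toy case. The sketch below is an assumption-laden stand-in rather than the paper's model: plain 1-WL color refinement replaces a trained GNN, the two 2-regular graphs (a 6-cycle and two disjoint triangles) are chosen for illustration, and `-1` is an arbitrary marker for unreachable nodes. It mirrors the first framework, where DE is supplied as an extra initial node attribute.

```python
from collections import deque

def spd_from(adj, source):
    """Shortest-path distance from `source` to every node; -1 marks unreachable nodes."""
    dist = {u: -1 for u in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wl_colors(adj, init, rounds=3):
    """1-WL color refinement: each round pairs a node's color with the sorted
    multiset of its neighbors' colors. Colors are kept as nested tuples so they
    are directly comparable across graphs."""
    colors = dict(init)
    for _ in range(rounds):
        colors = {
            u: (colors[u], tuple(sorted(colors[v] for v in adj[u])))
            for u in adj
        }
    return colors

# Two 2-regular graphs on six nodes: one 6-cycle versus two disjoint triangles.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

def uniform(adj):
    """Identical initial color for every node (no node attributes)."""
    return {u: 0 for u in adj}

# Representation of the target node set S = {0} without DE ...
same_without_de = (
    wl_colors(cycle6, uniform(cycle6))[0]
    == wl_colors(two_triangles, uniform(two_triangles))[0]
)
# ... and with the distance to S used as the initial node attribute.
same_with_de = (
    wl_colors(cycle6, spd_from(cycle6, 0))[0]
    == wl_colors(two_triangles, spd_from(two_triangles, 0))[0]
)
print(same_without_de, same_with_de)
```

Without DE, every node in a regular graph receives the same color in every round, so node 0 looks identical in both graphs. With DE, the neighbors of node 0 see different distance patterns (a distance-2 neighbor in the cycle, a distance-1 neighbor in the triangle), and the two representations diverge after two rounds.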

This review was generated by AI and reviewed by a human editor.