QUICK REVIEW

[Paper Review] Attention-based Graph Neural Network for Semi-supervised Learning

Kiran Koshy Thekumparampil, Chong Wang | arXiv (Cornell University) | Mar 10, 2018

Advanced Graph Neural Networks | References: 42 | Cited by: 250

One-Sentence Summary

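The paper proposes AGNN, an attention-based graph neural network that replaces conventional propagation with adaptive attention over neighbors. It achieves state-of-the-art results on citation networks while reducing model complexity.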

ABSTRACT

Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model, that removes all the intermediate fully-connected layers, is still able to achieve a performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning where the number of labeled examples is small. This in turn leaves room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers, and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation network datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights on how neighbors influence each other.
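The linear model described in the abstract (the Graph Linear Network, GLN) can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' code. It assumes the GCN-style propagation matrix P = D̃^{-1/2}(A + I)D̃^{-1/2} and two propagation steps, which amounts to a two-layer GCN with its ReLU removed: Z = softmax(P P X W^{(0)} W^{(1)}). With no nonlinearity in between, the product W^{(0)}W^{(1)} acts as a single linear map, so the sketch uses one weight matrix `w`. All function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Assumed GCN-style propagation matrix D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]


def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gln_forward(x, adj, w, num_steps=2):
    """Linear model: num_steps rounds of fixed propagation, one linear map, then softmax."""
    p = normalized_adjacency(adj)
    h = x
    for _ in range(num_steps):
        h = p @ h  # no nonlinearity between propagation steps
    return softmax_rows(h @ w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]], dtype=float)
    z = gln_forward(rng.normal(size=(3, 4)), adj, rng.normal(size=(4, 2)))
    print(z.shape)  # (3, 2)
```

Training would fit `w` with a cross-entropy loss on the labeled rows only. The point of the sketch is that all graph information enters through the fixed matrix P, so the only trainable parameters are those in the final linear map.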

Motivation and Goals

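Motivation: exploit both graph structure and node features for semi-supervised learning on graphs with few labels.
Show that a linear propagation baseline (GLN) can match GCN's performance, highlighting the importance of the propagation layers.
Propose AGNN, which improves accuracy and interpretability through adaptive attention over neighbors.
Demonstrate that AGNN is more accurate than state-of-the-art methods on standard citation network datasets.
Provide insight into how the learned attention weights reflect the influence of neighbors.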

Proposed Method

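Analyze graph neural networks and find that the propagation layers account for most of the performance, while the intermediate nonlinear layers contribute little.
Define the Graph Linear Network (GLN), which isolates propagation from the nonlinear components, and show that it matches or comes close to GCN's performance.
Introduce AGNN, in which each propagation layer has only a single scalar parameter β^{(t)} and performs attention-weighted propagation: H^{(t+1)} = P^{(t)} H^{(t)}, where P^{(t)}_{ij} = exp(β^{(t)} cos(H_i^{(t)}, H_j^{(t)})) / Σ_{k∈N(i)} exp(β^{(t)} cos(H_i^{(t)}, H_k^{(t)})) for j ∈ N(i), and P^{(t)}_{ij} = 0 otherwise. Each row therefore sums to 1, because it is a softmax over the neighbors N(i) of node i.
Node representations start from the initial embedding H^{(1)} = ReLU(X W^{(0)}). They then pass through ℓ propagation layers, and a final softmax classifier produces Z = softmax(H^{(ℓ+1)} W^{(1)}).
All parameters (W^{(0)}, W^{(1)}, and β^{(t)}) are trained with the cross-entropy loss over the labeled nodes.
Complexity: O(ℓ d_h |E| + d_x d_h n), where n is the number of nodes, |E| the number of edges, d_x the input feature dimension, d_h the hidden dimension, and ℓ the number of propagation layers.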
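The AGNN forward pass summarized above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. The function names are hypothetical. The neighborhood N(i) is assumed to include node i itself, which appears as a self-loop in the mask. Dense n × n matrices are used for readability. Training is omitted; it would use cross-entropy over the labeled nodes, with gradients for W^{(0)}, W^{(1)}, and each β^{(t)}.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax_rows(z, mask=None):
    """Row-wise softmax; entries where mask is False receive probability 0."""
    if mask is not None:
        z = np.where(mask, z, -np.inf)
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cosine_matrix(h, eps=1e-12):
    """Dense n x n matrix of cos(H_i, H_j); all-zero rows get similarity 0."""
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    h_unit = h / np.maximum(norms, eps)
    return h_unit @ h_unit.T


def attention_propagation(h, mask, beta):
    """One propagation layer: P_ij ∝ exp(beta * cos(H_i, H_j)) over allowed j, then H' = P H."""
    p = softmax_rows(beta * cosine_matrix(h), mask)
    return p @ h, p


def agnn_forward(x, adj, w0, w1, betas):
    """Forward pass; len(betas) is the number of propagation layers (ℓ)."""
    # Assumption: every node also attends to itself (self-loop added to the mask).
    mask = (adj > 0) | np.eye(adj.shape[0], dtype=bool)
    h = relu(x @ w0)  # H^{(1)} = ReLU(X W^{(0)})
    attentions = []
    for beta in betas:  # H^{(t+1)} = P^{(t)} H^{(t)}
        h, p = attention_propagation(h, mask, beta)
        attentions.append(p)
    return softmax_rows(h @ w1), attentions  # Z = softmax(H^{(ℓ+1)} W^{(1)})


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 0, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    x = rng.normal(size=(4, 5))
    w0 = rng.normal(size=(5, 3))
    w1 = rng.normal(size=(3, 2))
    z, attentions = agnn_forward(x, adj, w0, w1, betas=[1.0, 1.0])
    print(z.shape, len(attentions))  # (4, 2) 2
```

Because the sketch builds a dense cosine matrix, each layer costs O(n² d_h) rather than the O(d_h |E|) term in the stated complexity. Reaching that bound requires computing cos(H_i, H_j) only for the |E| edges, for example from a sparse edge list. Setting β^{(t)} = 0 reduces a layer to uniform averaging over the neighborhood, so β^{(t)} controls how sharply attention concentrates on similar neighbors.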

Experimental Results

Research Questions

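RQ1: Can a simplified linear propagation model match current GNNs on graph-based semi-supervised learning tasks?
RQ2: Does attention-based propagation improve accuracy by identifying the more relevant neighbors in graph-structured data and weighting them more heavily?
RQ3: Does AGNN provide interpretability through its learned attention weights, revealing patterns of neighbor influence?
RQ4: How does AGNN perform on standard citation network benchmarks (CiteSeer, Cora, PubMed) compared with GCN and other baselines?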

Key Findings

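GLN (linear propagation) achieves accuracy on par with or better than the best GCN on the benchmark citation networks.
AGNN achieves the best accuracy in the fixed-split experiments on CiteSeer, Cora, and PubMed, with improvements larger than the standard error.
Under random splits and with larger numbers of labeled examples, AGNN consistently outperforms the state-of-the-art baselines.
The attention weights show that neighbors from the same class tend to receive higher attention, providing a degree of interpretability.
Because of its lower model complexity and the absence of deep nonlinear layers, AGNN can use deeper propagation (ℓ up to 4), and doing so is beneficial.
The attention mechanism focuses on the neighbors most relevant to the target node and improves classification, including on nodes that GCN misclassifies.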
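One simple way to probe the finding that same-class neighbors receive more attention is to measure, for each node, what fraction of its attention (excluding itself) goes to neighbors with the same label. That fraction can then be compared with the fraction under uniform weights over the same neighbors. The functions below are a hypothetical diagnostic written for illustration, not necessarily the analysis the authors performed. They assume a dense, row-stochastic attention matrix, such as one layer's P^{(t)}.

```python
import numpy as np


def same_class_attention_share(p, labels):
    """Mean fraction of each node's off-diagonal attention that lands on same-class nodes."""
    labels = np.asarray(labels)
    off_diag = p * (1.0 - np.eye(p.shape[0]))  # drop self-attention, which is trivially same-class
    row_mass = off_diag.sum(axis=1)
    same = labels[:, None] == labels[None, :]
    same_mass = (off_diag * same).sum(axis=1)
    has_neighbors = row_mass > 0  # skip isolated nodes to avoid dividing by zero
    return float(np.mean(same_mass[has_neighbors] / row_mass[has_neighbors]))


def uniform_attention(adj):
    """Baseline: equal weight on every neighbor (plus self), i.e. no attention."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    adj = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]], dtype=float)
    labels = [0, 0, 1]
    # Hypothetical attention matrix: node 0 favors its same-class neighbor (node 1).
    p = np.array([[0.2, 0.6, 0.2],
                  [0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5]])
    print(same_class_attention_share(p, labels))
    print(same_class_attention_share(uniform_attention(adj), labels))
```

On this toy graph the hypothetical attention matrix gives a share of about 0.58, compared with 0.5 for uniform weights. A share above the uniform baseline is the kind of signal the finding describes. The toy numbers are illustrative only and are not results from the paper.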

This review was generated by AI and reviewed by human editors.