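[Paper Summary] Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking
Graph2Gauss learns node embeddings as Gaussian distributions in an unsupervised, inductive framework, using a personalized ranking objective over multi-hop neighborhoods to capture both uncertainty and network structure.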
Methods that learn representations of nodes in a graph play a critical role in network analysis since they enable many downstream learning tasks. We propose Graph2Gauss - an approach that can efficiently learn versatile node embeddings on large scale (attributed) graphs that show strong performance on tasks such as link prediction and node classification. Unlike most approaches that represent nodes as point vectors in a low-dimensional continuous space, we embed each node as a Gaussian distribution, allowing us to capture uncertainty about the representation. Furthermore, we propose an unsupervised method that handles inductive learning scenarios and is applicable to different types of graphs: plain/attributed, directed/undirected. By leveraging both the network structure and the associated node attributes, we are able to generalize to unseen nodes without additional training. To learn the embeddings we adopt a personalized ranking formulation w.r.t. the node distances that exploits the natural ordering of the nodes imposed by the network structure. Experiments on real world networks demonstrate the high performance of our approach, outperforming state-of-the-art network embedding methods on several different tasks. Additionally, we demonstrate the benefits of modeling uncertainty - by analyzing it we can estimate neighborhood diversity and detect the intrinsic latent dimensionality of a graph.
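Motivation and Objectives
- Motivate node representation learning on graphs that accounts for uncertainty and generalizes inductively.
- Propose a Gaussian embedding that captures the uncertainty of each node's representation.
- Develop an unsupervised personalized ranking objective based on multi-hop neighborhoods.
- Enable inductive learning by using a deep encoder to map node attributes to embeddings.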
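Proposed Method
- Represent each node i as a Gaussian N(mu_i, Sigma_i) in a low-dimensional space.
- Use a deep encoder to map node attributes x_i to mu_i and a diagonal Sigma_i.
- Define the asymmetric KL divergence delta(h_i, h_j) = DKL(N_j || N_i) as the dissimilarity measure between embeddings.
- Impose a personalized ranking: nodes at 1-hop should be closer than nodes at 2-hop, and so on, up to K hops.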
- Optimize a square-exponential loss that compares the energies E_ijk = DKL(N_jk || N_i) and E_ijl = DKL(N_jl || N_i) over valid triplets (i, j_k, j_l), where j_k is k hops and j_l is l hops from node i, with k < l.
- Adopt node-anchored stochastic sampling to ensure unbiased gradient estimates and scalable training.
- Leverage attribute information to enable inductive generalization to unseen nodes.
- Support plain graphs by using one-hot encodings when attributes are absent.
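The energy and ranking steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kl_energy`, `square_exp_loss`, `hop_neighborhoods`), the toy graph, and the embedding values are all hypothetical. The KL term is the standard closed form for Gaussians with diagonal covariances, and the per-triplet loss is assumed to be the squared energy of the nearer node plus the exponential of the negative energy of the farther node. The embeddings are fixed by hand here rather than produced by an encoder.

```python
import numpy as np
from collections import deque


def kl_energy(mu_i, var_i, mu_j, var_j):
    """E_ij = D_KL(N_j || N_i) for Gaussians with diagonal covariances."""
    mu_i, var_i = np.asarray(mu_i, float), np.asarray(var_i, float)
    mu_j, var_j = np.asarray(mu_j, float), np.asarray(var_j, float)
    trace_term = np.sum(var_j / var_i)
    mahalanobis = np.sum((mu_i - mu_j) ** 2 / var_i)
    log_det_ratio = np.sum(np.log(var_j)) - np.sum(np.log(var_i))
    return 0.5 * (trace_term + mahalanobis - mu_i.size - log_det_ratio)


def square_exp_loss(e_near, e_far):
    """Assumed per-triplet loss: small energy for the nearer node, large for the farther."""
    return e_near ** 2 + np.exp(-e_far)


def hop_neighborhoods(adj, source, max_hops):
    """Breadth-first search returning {k: nodes exactly k hops from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond K hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    hops = {k: set() for k in range(1, max_hops + 1)}
    for node, d in dist.items():
        if d >= 1:
            hops[d].add(node)
    return hops


# Toy example: path graph 0 - 1 - 2 - 3 with made-up 2-D embeddings.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mu = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [4.0, 0.0]])
var = np.ones_like(mu)  # unit variance on every dimension

hops = hop_neighborhoods(adj, source=0, max_hops=2)
j_near = next(iter(hops[1]))  # a 1-hop neighbor of node 0
j_far = next(iter(hops[2]))   # a 2-hop neighbor of node 0
loss = square_exp_loss(
    kl_energy(mu[0], var[0], mu[j_near], var[j_near]),
    kl_energy(mu[0], var[0], mu[j_far], var[j_far]),
)
print(f"triplet loss for (0, {j_near}, {j_far}): {loss:.4f}")
```

Because the KL divergence is asymmetric, `kl_energy(mu_a, var_a, mu_b, var_b)` generally differs from the call with the arguments swapped, which is what makes the energy usable on directed graphs. In a full training loop, the triplets would be sampled per anchor node and the loss summed over them.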
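Experimental Results
Research Questions
- RQ1: Can nodes be effectively embedded as distributions to capture uncertainty in attributed graphs?
- RQ2: Does personalized ranking over multi-hop neighborhoods improve unsupervised graph embeddings?
- RQ3: Can the encoder achieve inductive learning for unseen nodes using node attributes alone?
- RQ4: How does the proposed method compare with state-of-the-art unsupervised graph embedding methods on link prediction and node classification?
- RQ5: What insights does embedding uncertainty provide about neighborhood diversity and intrinsic dimensionality?
Key Findings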
| Method | Cora-ML AUC | Cora-ML AP | Cora AUC | Cora AP | Citeseer AUC | Citeseer AP | DBLP AUC | DBLP AP | PubMed AUC | PubMed AP |
|---|---|---|---|---|---|---|---|---|---|---|
| Logistic Regression | 90.01 | 89.75 | 86.58 | 86.51 | 81.70 | 79.10 | 82.04 | 81.91 | 90.50 | 90.99 |
| node2vec (Grover & Leskovec, 2016) | 76.80 | 75.26 | 79.95 | 78.98 | 83.04 | 83.74 | 95.42 | 95.33 | 95.42 | 95.33 |
| TADW (Yang et al., 2015) | 81.26 | 81.34 | 76.56 | 78.06 | 70.14 | 72.93 | 65.67 | 59.85 | 62.72 | 68.02 |
| TRIDNR (Pan et al., 2016) | 84.51 | 85.69 | 81.61 | 81.08 | 87.23 | 88.87 | 92.01 | 91.62 | N/A | N/A |
| GAE (Kipf & Welling, 2016b) | 96.65 | 96.67 | 97.91 | 98.07 | 92.31 | 93.88 | 95.78 | 96.67 | 96.07 | 96.12 |
| G2G_oh | 96.95 | 97.54 | 98.41 | 98.63 | 95.89 | 95.78 | 98.29 | 98.46 | 96.75 | 96.47 |
| G2G | 98.01 | 98.03 | 98.81 | 98.78 | 96.09 | 96.16 | 98.65 | 98.78 | 97.42 | 97.85 |
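- Graph2Gauss achieves state-of-the-art or competitive link prediction performance on multiple real-world datasets (Cora-ML, Cora, Citeseer, DBLP, PubMed).
- With embedding dimension L=128, the method outperforms the competing methods on link prediction in both AUC and average precision (AP).
- Even the attribute-free variant G2G_oh, which uses one-hot inputs, surpasses several baselines on some datasets.
- Graph2Gauss embeddings learned without supervision yield strong node classification performance on Cora-ML, Citeseer, and DBLP.
- The model provides meaningful uncertainty estimates that correlate with neighborhood diversity and help reveal the intrinsic latent dimensionality.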
This summary was generated by AI and reviewed by human editors.