QUICK REVIEW

[Paper Review] Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks

Difan Zou, Ziniu Hu|arXiv (Cornell University)|Nov 17, 2019

Advanced Graph Neural Networks | Cited by 87

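One-Sentence Summary

LADIES introduces layer-dependent importance sampling to train deep and large GCNs at lower memory and time cost, while achieving better generalization than previous sampling methods.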

ABSTRACT

Graph convolutional networks (GCNs) have recently received wide attention, due to their successful applications in different graph tasks and different domains. Training GCNs for a large graph, however, is still a challenge. Original full-batch GCN training requires calculating the representation of all the nodes in the graph per GCN layer, which brings in high computation and memory costs. To alleviate this issue, several sampling-based methods have been proposed to train GCNs on a subset of nodes. Among them, the node-wise neighbor-sampling method recursively samples a fixed number of neighbor nodes, and thus its computation cost suffers from exponentially growing neighbor size; while the layer-wise importance-sampling method discards the neighbor-dependent constraints, and thus the nodes sampled across layers suffer from the sparse-connection problem. To deal with the above two problems, we propose a new effective sampling algorithm called LAyer-Dependent ImportancE Sampling (LADIES). Based on the sampled nodes in the upper layer, LADIES selects their neighborhood nodes, constructs a bipartite subgraph and computes the importance probability accordingly. Then, it samples a fixed number of nodes by the calculated probability, and recursively conducts such procedure per layer to construct the whole computation graph. We prove theoretically and experimentally that our proposed sampling algorithm outperforms the previous sampling methods in terms of both time and memory costs. Furthermore, LADIES is shown to have better generalization accuracy than original full-batch GCN, due to its stochastic nature.
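To make the abstract's cost argument concrete, the back-of-envelope sketch below counts how many nodes each layer of a sampled computation graph can touch. It assumes a batch of output nodes, a fixed fan-out per node for node-wise sampling, and a fixed per-layer budget for layer-wise or layer-dependent sampling. The function names are hypothetical, and the node-wise count is an upper bound that ignores overlap between neighborhoods.

```python
from typing import List, Optional


def nodewise_layer_sizes(batch_size: int, fanout: int, num_layers: int,
                         num_nodes: Optional[int] = None) -> List[int]:
    """Upper bound on nodes per layer for node-wise neighbor sampling.

    Every node samples `fanout` neighbors, so depth d can hold up to
    batch_size * fanout**d nodes (overlap ignored, optionally capped at |V|).
    """
    sizes = [batch_size * fanout ** depth for depth in range(num_layers + 1)]
    if num_nodes is not None:
        sizes = [min(size, num_nodes) for size in sizes]
    return sizes


def layerwise_layer_sizes(batch_size: int, samples_per_layer: int,
                          num_layers: int) -> List[int]:
    """Nodes per layer when a fixed number of nodes is sampled for each layer."""
    return [batch_size] + [samples_per_layer] * num_layers


if __name__ == "__main__":
    # Illustrative setting (assumed): 5 layers, 512 output nodes.
    print(nodewise_layer_sizes(512, 5, 5))    # [512, 2560, 12800, 64000, 320000, 1600000]
    print(layerwise_layer_sizes(512, 512, 5))  # [512, 512, 512, 512, 512, 512]
```

With a fan-out of 5, the node-wise bound already exceeds Reddit's 232,965 nodes at depth 4, which is the exponential growth the abstract refers to. The fixed per-layer budget stays flat, at the price of the sparse-connection problem that LADIES is designed to address.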

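Research Motivation and Goals

Train deep GCNs on large graphs while avoiding the cost of full-batch training and the redundancy of node-wise sampling.
Develop a layer-dependent sampling scheme that preserves connectivity and reduces variance.
Prove theoretically that the scheme improves efficiency and variance over existing methods.
Demonstrate empirical gains in runtime, memory, and accuracy on benchmark datasets.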

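Proposed Method

Propose LADIES: for each layer, build a bipartite subgraph from the nodes sampled in the upper layer and their neighbors.
Compute layer-wise importance probabilities p_i^{(l-1)} = ||Q^{(l)} P_{*,i}||_2^2 / ||Q^{(l)} P||_F^2 to guide sampling, where P is the normalized Laplacian and Q^{(l)} is the row-selector matrix of the nodes sampled in layer l.
Sample a fixed number of nodes per layer according to these probabilities, and build a dense, normalized sampled adjacency matrix \tilde{P}^{(l-1)} to propagate embeddings.
Sample top-down, one layer at a time, so that the sampled nodes stay connected and the receptive field does not grow exponentially.
Row-normalize \tilde{P}^{(l)} to stabilize training.
Provide a theoretical analysis of memory/time complexity and variance, validated empirically on several datasets.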

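Experimental Results

Research Questions

RQ1: How does layer-dependent sampling improve the connectivity and efficiency of the computation graph for deep GCNs?
RQ2: Compared with prior node-wise and layer-wise methods, does LADIES offer lower memory/time complexity and smaller variance?
RQ3: Does LADIES improve or maintain predictive accuracy and generalization on standard graph benchmarks?
RQ4: On very large graphs, which sample sizes suffice for strong performance?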

Key Findings

Dataset (#nodes)	Sample Method (sample size)	F1-Score (%)	Total Time (s)	Memory (MB)	Batch Time (ms)	#Batches
Cora (2708)	Full-Batch	76.5±1.4	1.19±0.82	30.72	15.75±0.52	80.8±51.7
Cora (2708)	GraphSage (5)	75.2±1.5	6.77±4.94	471.39	78.42±0.87	65.2±52.1
Cora (2708)	FastGCN (64)	25.1±8.4	0.55±0.65	3.13	9.22±0.20	63.2±71.2
Cora (2708)	FastGCN (512)	78.0±2.1	4.70±1.35	7.33	10.08±0.29	487±147
Cora (2708)	LADIES (64)	77.6±1.4	4.19±1.16	3.13	9.68±0.48	436±118.4
Cora (2708)	LADIES (512)	78.3±1.6	0.72±0.39	7.35	9.77±0.28	75.6±37.0
Citeseer (3327)	Full-Batch	62.3±3.1	0.61±0.70	68.13	15.77±0.58	40.6±22.8
Citeseer (3327)	GraphSage (5)	59.4±0.9	4.51±3.68	595.71	53.14±1.90	57.2±42.1
Citeseer (3327)	FastGCN (64)	19.2±2.7	0.53±0.48	5.89	8.88±0.40	64.0±57.0
Citeseer (3327)	FastGCN (512)	44.6±10.8	4.34±1.73	13.97	10.41±0.51	386±167
Citeseer (3327)	FastGCN (1024)	63.5±1.8	2.24±1.01	23.24	10.54±0.27	223±98.6
Citeseer (3327)	LADIES (64)	65.0±1.4	2.17±0.65	5.89	9.60±0.39	232±66.8
Citeseer (3327)	LADIES (512)	64.3±2.4	0.41±0.22	13.92	10.32±0.23	37.6±11.9
Pubmed (19717)	Full-Batch	71.9±1.9	4.80±1.53	137.93	44.69±0.57	102±33.4
Pubmed (19717)	GraphSage (5)	70.1±1.4	5.53±2.57	453.58	44.73±0.30	74.8±31.7
Pubmed (19717)	FastGCN (64)	38.5±6.9	0.40±0.69	1.92	7.42±0.16	58.8±94.8
Pubmed (19717)	FastGCN (512)	39.3±9.2	0.44±0.61	4.53	10.06±0.41	44.8±55.0
Pubmed (19717)	FastGCN (8192)	74.4±0.8	3.47±1.16	49.41	17.84±0.33	195±56.9
Pubmed (19717)	LADIES (64)	76.8±0.8	2.57±0.72	1.92	9.43±0.47	277±82.2
Pubmed (19717)	LADIES (512)	75.9±1.1	2.27±1.17	4.39	10.43±0.36	245±84.5
Reddit (232965)	Full-Batch	91.6±1.6	474.3±84.4	2370.48	1564±3.41	179±75.5
Reddit (232965)	GraphSage (5)	92.1±1.1	13.12±2.84	1234.63	121.47±0.72	81.5±42.3
Reddit (232965)	FastGCN (64)	27.8±12.6	2.06±1.29	3.75	7.85±0.72	57.4±43.7
Reddit (232965)	FastGCN (512)	17.5±16.7	0.31±0.41	6.91	10.01±0.31	32.1±72.3
Reddit (232965)	FastGCN (8192)	89.5±1.2	5.63±2.12	74.28	16.57±0.58	278±51.2
Reddit (232965)	LADIES (64)	83.5±0.9	5.62±1.58	3.75	9.42±0.48	453±88.2
Reddit (232965)	LADIES (512)	92.8±1.6	6.87±1.17	7.26	10.87±0.63	393±74.4

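LADIES uses less memory and time than node-wise sampling (GraphSage) on every dataset in the table, while matching or exceeding its F1 score (on Reddit this requires 512 samples per layer).
Compared with FastGCN, LADIES has lower variance because it samples only from nodes connected to the upper layer, which lets it work with small sample sizes on large graphs: at 64 samples per layer FastGCN drops to between 19.2% and 38.5% F1, while LADIES stays between 65.0% and 83.5%.
On the benchmarks (Cora, Citeseer, Pubmed, Reddit), LADIES attains the best test F1 score on every dataset, often with small samples and deep architectures: 64 nodes per layer already gives the best result on Citeseer (65.0%) and Pubmed (76.8%), while Reddit needs 512 (92.8%, versus 83.5% with 64).
LADIES generalizes well: thanks to its stochastic sampling, its best configuration outperforms full-batch GCN on all four datasets (e.g., 78.3% vs. 76.5% on Cora, 76.8% vs. 71.9% on Pubmed).
LADIES scales to very large graphs and deep GCNs without exponential growth in computation.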
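The memory and variance findings can be checked directly against the table. The short script below recomputes two quantities from the reported numbers only: how many times less memory LADIES (512) uses than GraphSage (5), and the F1 gap between LADIES and FastGCN at 64 samples per layer. The helper names are hypothetical, and no new measurements are involved.

```python
# Values copied from the results table: memory in MB as (GraphSage (5), LADIES (512)).
MEMORY_MB = {
    "Cora": (471.39, 7.35),
    "Citeseer": (595.71, 13.92),
    "Pubmed": (453.58, 4.39),
    "Reddit": (1234.63, 7.26),
}

# F1 (%) with 64 samples per layer as (FastGCN (64), LADIES (64)).
F1_AT_64 = {
    "Cora": (25.1, 77.6),
    "Citeseer": (19.2, 65.0),
    "Pubmed": (38.5, 76.8),
    "Reddit": (27.8, 83.5),
}


def memory_reduction(table):
    """Factor by which LADIES (512) reduces memory relative to GraphSage (5)."""
    return {name: round(sage / ladies, 1) for name, (sage, ladies) in table.items()}


def f1_gap(table):
    """F1 points by which LADIES (64) leads FastGCN (64)."""
    return {name: round(ladies - fast, 1) for name, (fast, ladies) in table.items()}


if __name__ == "__main__":
    print(memory_reduction(MEMORY_MB))  # {'Cora': 64.1, 'Citeseer': 42.8, 'Pubmed': 103.3, 'Reddit': 170.1}
    print(f1_gap(F1_AT_64))             # {'Cora': 52.5, 'Citeseer': 45.8, 'Pubmed': 38.3, 'Reddit': 55.7}
```

In these numbers LADIES (512) needs roughly 43 to 170 times less memory than GraphSage (5), and at 64 samples per layer it leads FastGCN by about 38 to 56 F1 points.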

This review was generated by AI and reviewed by human editors.