QUICK REVIEW

[Paper Review] Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks

Difan Zou, Ziniu Hu|arXiv (Cornell University)|Nov 17, 2019

Advanced Graph Neural Networks | Citations: 87

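One-Line Summary

LADIES introduces layer-dependent importance sampling to train deep, large GCNs at lower memory and time cost than previous sampling methods while improving generalization.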

ABSTRACT

Graph convolutional networks (GCNs) have recently received wide attentions, due to their successful applications in different graph tasks and different domains. Training GCNs for a large graph, however, is still a challenge. Original full-batch GCN training requires calculating the representation of all the nodes in the graph per GCN layer, which brings in high computation and memory costs. To alleviate this issue, several sampling-based methods have been proposed to train GCNs on a subset of nodes. Among them, the node-wise neighbor-sampling method recursively samples a fixed number of neighbor nodes, and thus its computation cost suffers from exponential growing neighbor size; while the layer-wise importance-sampling method discards the neighbor-dependent constraints, and thus the nodes sampled across layer suffer from sparse connection problem. To deal with the above two problems, we propose a new effective sampling algorithm called LAyer-Dependent ImportancE Sampling (LADIES). Based on the sampled nodes in the upper layer, LADIES selects their neighborhood nodes, constructs a bipartite subgraph and computes the importance probability accordingly. Then, it samples a fixed number of nodes by the calculated probability, and recursively conducts such procedure per layer to construct the whole computation graph. We prove theoretically and experimentally, that our proposed sampling algorithm outperforms the previous sampling methods in terms of both time and memory costs. Furthermore, LADIES is shown to have better generalization accuracy than original full-batch GCN, due to its stochastic nature.

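Motivation and Objectives

Motivate the training of deep GCNs in light of the cost of full-batch training on large graphs and the redundancy of node-wise sampling.
Develop a layer-dependent sampling scheme that preserves connectivity between layers and reduces variance.
Prove theoretical efficiency and variance-reduction gains over existing methods.
Demonstrate empirical improvements in runtime, memory, and accuracy on benchmark datasets.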

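Proposed Method

Propose LADIES, which for each layer builds a bipartite graph from the nodes sampled in the upper layer and their neighbors.
Compute a per-layer importance probability to guide sampling: p_i^{(l-1)} = ||Q^{(l)} P_{*,i}||_2^2 / ||Q^{(l)} P||_F^2.
Sample a fixed number of nodes in each layer according to the computed probabilities, and construct a dense, normalized sampled adjacency matrix \tilde{P}^{(l-1)} through which embeddings are propagated.
Use top-down layer-dependent sampling to keep consecutive layers connected and to avoid exponential growth of the receptive field.
Normalize \tilde{P}^{(l)} by its row sums to stabilize training.
Provide a theoretical analysis of memory and time complexity and of variance, along with empirical validation on multiple datasets.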
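The per-layer sampling step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `normalized_adjacency` and `ladies_sample_layer`, the dense matrix representation, sampling without replacement, and the toy path graph are all choices made here for brevity.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops (assumed form of P)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def ladies_sample_layer(P, upper_nodes, num_samples, rng):
    """Sample lower-layer nodes conditioned on the upper-layer nodes.

    P           : dense (N, N) normalized adjacency (dense only for brevity)
    upper_nodes : indices of the nodes already sampled in the upper layer
    num_samples : number of nodes to draw for the lower layer
    """
    # Selecting rows of P plays the role of Q^{(l)} P in the formula.
    QP = P[upper_nodes, :]
    # p_i is the squared L2 norm of column i over the selected rows,
    # divided by the squared Frobenius norm of Q^{(l)} P.
    col_norms = np.sum(QP ** 2, axis=0)
    probs = col_norms / col_norms.sum()
    # Only neighbors of upper_nodes have nonzero probability, so cap the draw.
    k = min(num_samples, int(np.count_nonzero(probs)))
    lower_nodes = np.sort(rng.choice(P.shape[0], size=k, replace=False, p=probs))
    # Sampled bipartite adjacency with importance-sampling rescaling of columns.
    P_tilde = QP[:, lower_nodes] / (k * probs[lower_nodes])
    # Row-normalize, leaving rows that lost all neighbors as zeros.
    row_sums = P_tilde.sum(axis=1, keepdims=True)
    P_tilde = np.divide(P_tilde, row_sums,
                        out=np.zeros_like(P_tilde), where=row_sums > 0)
    return lower_nodes, P_tilde, probs

# Toy example: a path graph 0-1-2-3-4-5 (hypothetical data).
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
P = normalized_adjacency(A)

rng = np.random.default_rng(0)
batch = np.array([0, 1])  # nodes sampled in the upper layer
lower, P_tilde, probs = ladies_sample_layer(P, batch, num_samples=2, rng=rng)
print(lower, P_tilde.shape)
```

Because the probabilities are computed only from the rows belonging to the upper-layer nodes, every sampled node is a neighbor of at least one of them (or one of those nodes itself, through the self-loop). Applying the function repeatedly, feeding `lower` back in as the next `upper_nodes`, would build the computation graph layer by layer.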

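Experimental Results

Research Questions

RQ1: How can layer-dependent sampling improve the connectivity and efficiency of the computation graph in deep GCNs?
RQ2: Can LADIES deliver lower memory and time complexity and reduced variance compared with existing node-wise and layer-wise methods?
RQ3: Does it improve, or at least maintain, predictive accuracy and generalization on standard graph benchmarks?
RQ4: What sample size is sufficient to obtain strong performance on very large graphs?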

Key Findings

Dataset	Sample Method	F1-Score(%)	Total Time(s)	Mem(MB)	Batch Time(ms)	Batch Num
Cora (2708)	Full-Batch	76.5±1.4	1.19±0.82	30.72	15.75±0.52	80.8±51.7
Cora (2708)	GraphSage (5)	75.2±1.5	6.77±4.94	471.39	78.42±0.87	65.2±52.1
Cora (2708)	FastGCN (64)	25.1±8.4	0.55±0.65	3.13	9.22±0.20	63.2±71.2
Cora (2708)	FastGCN (512)	78.0±2.1	4.70±1.35	7.33	10.08±0.29	487±147
Cora (2708)	LADIES (64)	77.6±1.4	4.19±1.16	3.13	9.68±0.48	436±118.4
Cora (2708)	LADIES (512)	78.3±1.6	0.72±0.39	7.35	9.77±0.28	75.6±37.0
Citeseer (3327)	Full-Batch	62.3±3.1	0.61±0.70	68.13	15.77±0.58	40.6±22.8
Citeseer (3327)	GraphSage (5)	59.4±0.9	4.51±3.68	595.71	53.14±1.90	57.2±42.1
Citeseer (3327)	FastGCN (64)	19.2±2.7	0.53±0.48	5.89	8.88±0.40	64.0±57.0
Citeseer (3327)	FastGCN (512)	44.6±10.8	4.34±1.73	13.97	10.41±0.51	386±167
Citeseer (3327)	FastGCN (1024)	63.5±1.8	2.24±1.01	23.24	10.54±0.27	223±98.6
Citeseer (3327)	LADIES (64)	65.0±1.4	2.17±0.65	5.89	9.60±0.39	232±66.8
Citeseer (3327)	LADIES (512)	64.3±2.4	0.41±0.22	13.92	10.32±0.23	37.6±11.9
Pubmed (19717)	Full-Batch	71.9±1.9	4.80±1.53	137.93	44.69±0.57	102±33.4
Pubmed (19717)	GraphSage (5)	70.1±1.4	5.53±2.57	453.58	44.73±0.30	74.8±31.7
Pubmed (19717)	FastGCN (64)	38.5±6.9	0.40±0.69	1.92	7.42±0.16	58.8±94.8
Pubmed (19717)	FastGCN (512)	39.3±9.2	0.44±0.61	4.53	10.06±0.41	44.8±55.0
Pubmed (19717)	FastGCN (8192)	74.4±0.8	3.47±1.16	49.41	17.84±0.33	195±56.9
Pubmed (19717)	LADIES (64)	76.8±0.8	2.57±0.72	1.92	9.43±0.47	277±82.2
Pubmed (19717)	LADIES (512)	75.9±1.1	2.27±1.17	4.39	10.43±0.36	245±84.5
Reddit (232965)	Full-Batch	91.6±1.6	474.3±84.4	2370.48	1564±3.41	179±75.5
Reddit (232965)	GraphSage (5)	92.1±1.1	13.12±2.84	1234.63	121.47±0.72	81.5±42.3
Reddit (232965)	FastGCN (64)	27.8±12.6	2.06±1.29	3.75	7.85±0.72	57.4±43.7
Reddit (232965)	FastGCN (512)	17.5±16.7	0.31±0.41	6.91	10.01±0.31	32.1±72.3
Reddit (232965)	FastGCN (8192)	89.5±1.2	5.63±2.12	74.28	16.57±0.58	278±51.2
Reddit (232965)	LADIES (64)	83.5±0.9	5.62±1.58	3.75	9.42±0.48	453±88.2
Reddit (232965)	LADIES (512)	92.8±1.6	6.87±1.17	7.26	10.87±0.63	393±74.4

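LADIES achieves lower memory and time costs than node-wise sampling methods, with comparable or better accuracy.
Compared with FastGCN, LADIES strictly improves variance because it samples from a smaller set of effectively connected nodes, and it benefits from small sample sizes on large graphs.
On the benchmarks (Cora, Citeseer, Pubmed, Reddit), LADIES attains the highest F1-score in the table with small samples on a deep architecture: a sample size of 64 on Citeseer and Pubmed, and 512 on Cora and Reddit.
LADIES often generalizes better than full-batch GCN on validation and test data, and it remains robust despite relying on stochastic sampling.
LADIES scales to very large graphs and deep GCNs without an exponential increase in computation.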
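The scaling claim can be made concrete with a back-of-the-envelope count of how many nodes each scheme may touch at each depth. The sketch below is only an illustration: the batch size of 512 and the depth of 5 are assumptions chosen here, the fan-out of 5 is assumed to correspond to the "GraphSage (5)" row label, and the node-wise figure is a worst-case bound that ignores overlapping neighbors and the finite size of the graph.

```python
def nodewise_worst_case(batch_size, fanout, num_layers):
    """Upper bound on nodes per depth when every node samples `fanout` neighbors."""
    return [batch_size * fanout ** depth for depth in range(num_layers + 1)]

def layerwise_fixed(batch_size, sample_size, num_layers):
    """Nodes per depth when every layer samples a fixed number of nodes."""
    return [batch_size] + [sample_size] * num_layers

# Illustrative values only (not taken from the experiments).
nodewise = nodewise_worst_case(batch_size=512, fanout=5, num_layers=5)
layerwise = layerwise_fixed(batch_size=512, sample_size=512, num_layers=5)

print("node-wise, deepest layer :", nodewise[-1])
print("layer-wise, deepest layer:", layerwise[-1])
print("layer-wise, all layers   :", sum(layerwise))
```

Under these assumptions the node-wise bound grows by a factor of 5 per layer, while the layer-wise count stays at the chosen sample size regardless of depth.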

This review was generated by AI and checked by a human editor.