QUICK REVIEW

[Paper Review] Link Prediction Based on Graph Neural Networks

Muhan Zhang|arXiv (Cornell University)|Feb 27, 2018

Complex Network Analysis Techniques|References: 46|Citations: 282

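One-Sentence Summary

This paper proposes SEAL, which uses a GNN to learn link prediction heuristics from local enclosing subgraphs, and gives a unified explanation of high-order heuristics through the γ-decaying theory.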

ABSTRACT

Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a `heuristic' that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $γ$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the $γ$-decaying theory, we propose a new algorithm to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.
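To make the notion of a predefined score function concrete, here is a minimal sketch of five classical heuristics that also appear as columns in the results table (CN, Jaccard, PA, AA, RA). The toy graph, the function names, and the adjacency-dictionary representation are illustrative assumptions and do not come from the paper.

```python
import math

# Toy undirected graph as an adjacency dictionary (hypothetical example data).
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def common_neighbors(g, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(g[x] & g[y])

def jaccard(g, x, y):
    """Jaccard: shared neighbors divided by the size of the neighbor union."""
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0

def preferential_attachment(g, x, y):
    """PA: product of the two node degrees."""
    return len(g[x]) * len(g[y])

def adamic_adar(g, x, y):
    """AA: shared neighbors weighted by 1 / log(degree)."""
    # A common neighbor is adjacent to both x and y, so its degree is >= 2
    # and log(degree) is strictly positive.
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])

def resource_allocation(g, x, y):
    """RA: shared neighbors weighted by 1 / degree."""
    return sum(1.0 / len(g[z]) for z in g[x] & g[y])

if __name__ == "__main__":
    for score in (common_neighbors, jaccard, preferential_attachment,
                  adamic_adar, resource_allocation):
        print(score.__name__, score(GRAPH, "b", "d"))
```

Each function hard-codes one assumption about when two nodes are likely to link, which is the limitation the abstract describes.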

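Motivation and Objectives

Motivate learning general graph-structure heuristics for link prediction from the network itself, rather than relying on predefined scores.
Show that local enclosing subgraphs contain enough information to approximate high-order heuristics.
Propose SEAL, a GNN-based framework that combines subgraphs, node embeddings, and node attributes to improve prediction accuracy.
Provide a theoretical justification through γ-decaying heuristics and demonstrate empirical gains over baselines.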
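For reference, the γ-decaying form can be sketched as follows. The notation (η, f, β, A) is the commonly used one and is reproduced here from memory of the paper, so it should be checked against the original before being relied on.

```latex
% A gamma-decaying heuristic for a node pair (x, y):
\mathcal{H}(x, y) = \eta \sum_{l=1}^{\infty} \gamma^{l} f(x, y, l),
\qquad 0 < \gamma < 1,
% where \eta is a positive constant (or a bounded function of \gamma)
% and f is a nonnegative function of x, y, and l on the given network.

% Example: the Katz index has this form with \eta = 1, \gamma = \beta,
% and f(x, y, l) equal to the number of length-l walks between x and y,
% where A is the adjacency matrix:
\mathrm{Katz}_{x,y} = \sum_{l=1}^{\infty} \beta^{l} \, [A^{l}]_{x,y}.
```

Because the weight γ^l shrinks geometrically, terms with large l contribute little as long as f does not grow too fast, which is the intuition behind approximating such heuristics from a small enclosing subgraph.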

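Proposed Method

Define an enclosing subgraph around each target link to capture local structure.
Prove the γ-decaying heuristic theory, showing that the error of approximating such heuristics from an h-hop enclosing subgraph decreases exponentially with h.
Replace the fully connected network of WLNM with a graph neural network (GNN) that learns from subgraphs.
Combine three node-feature components: structural labels, node embeddings, and explicit attributes, using DRNL labeling and an embedding trick.
Use negative injection when generating node embeddings to prevent link-existence information from leaking out of the training links.
Train SEAL on positive and negative examples, using a graph-level representation of each subgraph to predict whether the link exists.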
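The first and fourth steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy graph and function names are hypothetical, and the DRNL formula and the masking of the other target node during distance computation are written from memory of the paper (DRNL stands for Double-Radius Node Labeling).

```python
from collections import deque

# Toy undirected graph (hypothetical); "x" and "y" are the target nodes.
ADJ = {
    "x": {"a", "d"},
    "a": {"x", "y", "b"},
    "y": {"a"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"x"},
}

def bfs_distances(adj, source, max_depth=None, blocked=None):
    """Hop distances from `source`, optionally ignoring one `blocked` node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if max_depth is not None and dist[u] >= max_depth:
            continue  # do not expand beyond max_depth hops
        for v in adj[u]:
            if v != blocked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def enclosing_subgraph(adj, x, y, h):
    """Induced subgraph on all nodes within h hops of x or of y."""
    nodes = set(bfs_distances(adj, x, h)) | set(bfs_distances(adj, y, h))
    return {u: adj[u] & nodes for u in nodes}

def drnl_labels(sub, x, y):
    """Assign an integer structural label to every node of the subgraph."""
    dx = bfs_distances(sub, x, blocked=y)  # distance to x with y masked
    dy = bfs_distances(sub, y, blocked=x)  # distance to y with x masked
    labels = {}
    for i in sub:
        if i == x or i == y:
            labels[i] = 1  # both target nodes share the label 1
        elif i not in dx or i not in dy:
            labels[i] = 0  # unreachable from one of the target nodes
        else:
            d = dx[i] + dy[i]
            labels[i] = 1 + min(dx[i], dy[i]) + (d // 2) * (d // 2 + d % 2 - 1)
    return labels

if __name__ == "__main__":
    sub = enclosing_subgraph(ADJ, "x", "y", h=1)
    print(sorted(drnl_labels(sub, "x", "y").items()))
```

In a full pipeline these labels would typically be one-hot encoded and concatenated with node embeddings and attributes before being passed to the GNN; that part is omitted here.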

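Experimental Results

Research Questions

RQ1: Do local h-hop enclosing subgraphs capture enough information to predict link existence, including high-order effects?
RQ2: Do γ-decaying heuristics provide a unified, provable basis for approximating general high-order link predictors from local subgraphs?
RQ3: Does SEAL outperform traditional heuristics, latent feature methods, and earlier supervised subgraph-based methods across diverse networks?
RQ4: How does incorporating latent and explicit features and node labeling affect SEAL's performance?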

Key Findings

Dataset	CN	Jaccard	PA	AA	RA	Katz	PR	SR	ENS	WLK	WLNM	SEAL
USAir	93.80 ± 1.22	89.79 ± 1.61	88.84 ± 1.45	95.06 ± 1.03	95.77 ± 0.92	92.88 ± 1.42	94.67 ± 1.08	78.89 ± 2.31	88.96 ± 1.44	96.63 ± 0.73	95.95 ± 1.10	96.62 ± 0.72
NS	94.42 ± 0.95	94.43 ± 0.93	68.65 ± 2.03	94.45 ± 0.93	94.45 ± 0.93	94.85 ± 1.10	94.89 ± 1.08	94.79 ± 1.08	97.64 ± 0.25	98.57 ± 0.51	98.61 ± 0.49	98.85 ± 0.47
PB	92.04 ± 0.35	87.41 ± 0.39	90.14 ± 0.45	92.36 ± 0.34	92.46 ± 0.37	92.92 ± 0.35	93.54 ± 0.41	77.08 ± 0.80	90.15 ± 0.45	93.83 ± 0.59	93.49 ± 0.47	94.72 ± 0.46
Yeast	89.37 ± 0.61	89.32 ± 0.60	82.20 ± 1.02	89.43 ± 0.62	89.45 ± 0.62	92.24 ± 0.61	92.76 ± 0.55	91.49 ± 0.57	82.36 ± 1.02	95.86 ± 0.54	95.62 ± 0.52	97.91 ± 0.52
C.ele	85.13 ± 1.61	80.19 ± 1.64	74.79 ± 2.04	86.95 ± 1.40	87.49 ± 1.41	86.34 ± 1.89	90.32 ± 1.49	77.07 ± 2.00	74.94 ± 2.04	89.72 ± 1.67	86.18 ± 1.72	90.30 ± 1.35
Power	58.80 ± 0.88	58.79 ± 0.88	44.33 ± 1.02	58.79 ± 0.88	58.79 ± 0.88	65.39 ± 1.59	66.00 ± 1.59	76.15 ± 1.06	79.52 ± 1.78	82.41 ± 3.43	84.76 ± 0.98	87.61 ± 1.57
Router	56.43 ± 0.52	56.40 ± 0.52	47.58 ± 1.47	56.43 ± 0.51	56.43 ± 0.51	38.62 ± 1.35	38.76 ± 1.39	37.40 ± 1.27	47.58 ± 1.48	87.42 ± 2.08	94.41 ± 0.88	96.38 ± 1.45
E.coli	93.71 ± 0.39	81.31 ± 0.61	91.82 ± 0.58	95.36 ± 0.34	95.95 ± 0.35	93.50 ± 0.44	95.57 ± 0.44	62.49 ± 1.43	91.89 ± 0.58	96.94 ± 0.29	97.21 ± 0.27	97.64 ± 0.22

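SEAL generally outperforms the predefined heuristics, WLK, and WLNM: it has the best score on six of the eight datasets in the table and is within 0.02 of the best on the other two (WLK on USAir, PR on C.ele).
With a GNN as its learner, SEAL achieves state-of-the-art results among learning-based methods and outperforms latent feature methods such as MF, SBM, N2V, LINE, SPC, and VGAE.
Using node embeddings together with structural features yields a significant improvement over structure-only baselines.
The γ-decaying theory shows that many high-order heuristics can be approximated from small enclosing subgraphs with exponentially decreasing error.
DRNL labeling and negative injection improve SEAL's training and generalization.
SEAL's performance remains high across different GNN architectures and embeddings, which underlines its robustness.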
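The exponentially decreasing error noted above can be illustrated numerically with the Katz index. The sketch below truncates the Katz sum at a maximum walk length and compares it against a long reference sum; it is a simplified stand-in for the subgraph-based approximation (an h-hop enclosing subgraph only contains walks up to a bounded length), not the paper's proof. The ring graph, β = 0.1, and the function names are assumptions chosen for illustration.

```python
# Adjacency matrix of a 5-node ring (hypothetical toy graph).
RING = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def truncated_katz(adj, x, y, beta, max_len):
    """Sum over l = 1..max_len of beta^l times the number of length-l walks."""
    power = adj  # holds A^l at the start of iteration l
    total = 0.0
    for l in range(1, max_len + 1):
        total += (beta ** l) * power[x][y]
        power = mat_mul(power, adj)
    return total

def truncation_errors(adj, x, y, beta, lengths, ref_len=60):
    """Absolute gap between each truncated sum and a long reference sum."""
    reference = truncated_katz(adj, x, y, beta, ref_len)
    return [abs(reference - truncated_katz(adj, x, y, beta, n)) for n in lengths]

if __name__ == "__main__":
    for n, err in zip((2, 4, 6, 8),
                      truncation_errors(RING, 0, 2, 0.1, (2, 4, 6, 8))):
        print(f"max walk length {n}: error {err:.3e}")
```

On this graph every node has degree 2, so the number of length-l walks is at most 2^l and the error after truncating at length n is bounded by the geometric tail (2β)^(n+1) / (1 − 2β).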

This review was generated by AI and checked by a human editor.