QUICK REVIEW

[Paper Review] Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation

Shichang Zhang, Yozen Liu | arXiv (Cornell University) | Oct 17, 2021
Advanced Graph Neural Networks | References: 32 | Citations: 43
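One-line summary

GLNN distills knowledge from a GNN into an MLP (enlarged where needed), which makes inference graph-free and much faster while reaching accuracy close to that of GNNs on several datasets.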

ABSTRACT

Graph Neural Networks (GNNs) are popular for graph machine learning and have shown great results on wide node classification tasks. Yet, they are less popular for practical deployments in the industry owing to their scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching them burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs by reducing Multiplication-and-ACcumulation (MAC) operations, but the improvements are limited given the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no graph dependency and infer much faster than GNNs, even though they are less accurate than GNNs for node classification in general. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNNs with competitive accuracy infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over stand-alone MLPs by 12.36% on average and match GNNs on 6/7 datasets. Comprehensive analysis shows when and why GLNNs can achieve competitive accuracies to GNNs and suggests GLNN as a handy choice for latency-constrained applications.
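The data dependency described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy graph (`toy_adj`) and a hypothetical helper (`k_hop_neighborhood`), neither of which comes from the paper; it only counts how many nodes must be fetched to predict one target node with a 2-layer message-passing model, compared with an MLP that reads the target's own features.

```python
from collections import deque


def k_hop_neighborhood(adj, target, num_hops):
    """Return the set of nodes within num_hops of target (target included)."""
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            # Nodes at the maximum depth are kept but not expanded further.
            continue
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


# Hypothetical undirected toy graph stored as adjacency lists.
toy_adj = {
    0: [1, 2],
    1: [0, 3, 4],
    2: [0, 5],
    3: [1],
    4: [1, 6],
    5: [2],
    6: [4],
}

# A 2-layer GNN needs features from every node within 2 hops of the target.
gnn_inputs = k_hop_neighborhood(toy_adj, target=0, num_hops=2)
# A graph-less MLP needs only the target node's own features.
mlp_inputs = {0}

print(len(gnn_inputs), len(mlp_inputs))  # prints: 6 1
```

On real graphs with high-degree nodes the fetched set typically grows much faster with depth than in this toy case, which is the latency cost the abstract refers to.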

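Motivation and objectives

  • Bridge the gap between GNNs, whose inference relies on graph structure and neighborhood context, and MLPs, whose inference is fast and graph-free.
  • Show that distilling knowledge from a GNN teacher into an MLP student can yield a graph-less model with strong performance.
  • Evaluate GLNN across diverse datasets under transductive, inductive, and production-like settings.
  • Quantify the speedup over GNNs and other inference acceleration methods, and analyze what makes GLNN succeed.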

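Proposed method

  • Train a GNN teacher (GraphSAGE) on the graph and generate a soft target z_v for each node.
  • Train the student MLP with a loss that combines cross-entropy against the true labels with KL divergence against the teacher's soft targets (knowledge distillation).
  • Deploy the resulting GLNN, an MLP with no graph dependency at inference time.
  • Evaluate GLNN on multiple datasets in transductive, inductive, and production settings.
  • Investigate how model size, the mutual information between features and labels, and the regularization effect of KD affect performance.
  • Compare against MLPs, GNNs, and other inference acceleration methods (pruning, quantization, neighbor sampling).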
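The distillation objective in the second step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the weighting factor `lam`, the function names, and the choice to apply cross-entropy only to labeled nodes while applying the KL term to every node are assumptions made here, and the teacher probabilities are made-up toy values rather than GraphSAGE outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_probs, label):
    """Cross-entropy of the student's distribution against a hard label."""
    return -math.log(student_probs[label])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); terms with zero teacher mass contribute 0."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )


def glnn_style_loss(student_logits, teacher_probs, labels, lam=0.5):
    """Weighted sum of label cross-entropy and teacher KL divergence.

    student_logits: list of per-node logit lists from the student MLP.
    teacher_probs:  list of per-node soft targets z_v from the teacher.
    labels:         dict {node_index: class} for labeled nodes only (assumed).
    lam:            hypothetical trade-off weight between the two terms.
    """
    ce_total = 0.0
    kd_total = 0.0
    for v, logits in enumerate(student_logits):
        probs = softmax(logits)
        kd_total += kl_divergence(teacher_probs[v], probs)
        if v in labels:
            ce_total += cross_entropy(probs, labels[v])
    return lam * ce_total + (1.0 - lam) * kd_total


# Toy example: two nodes, two classes, only node 0 has a true label.
toy_student_logits = [[2.0, 0.0], [0.0, 0.0]]
toy_teacher_probs = [[0.9, 0.1], [0.5, 0.5]]
toy_labels = {0: 0}

toy_loss = glnn_style_loss(toy_student_logits, toy_teacher_probs, toy_labels)
print(toy_loss)
```

Because the student's logits are computed from node features alone, the trained student needs no neighbor lookups at inference time; the graph enters only through the teacher's soft targets during training.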

Experimental results

Research questions

  • RQ1: Can KD from GNNs to MLPs produce graph-less models with competitive accuracy?
  • RQ2: How do GLNNs perform in both transductive and inductive settings and in production-like scenarios?
  • RQ3: What factors (model size, mutual information, inductive bias) drive GLNN performance gains?
  • RQ4: How do GLNNs compare to traditional inference-acceleration methods in latency and accuracy?
  • RQ5: What are the limitations and failure cases of GLNNs in graph-based tasks?

Key findings

| Dataset | SAGE | MLP | GLNN | ΔMLP | ΔGNN |
|---|---|---|---|---|---|
| Cora | 80.52 ± 1.77 | 59.22 ± 1.31 | 80.54 ± 1.35 | 21.32 (36.00%) | 0.02 (0.02%) |
| Citeseer | 70.33 ± 1.97 | 59.61 ± 2.88 | 71.77 ± 2.01 | 12.16 (20.40%) | 1.44 (2.05%) |
| Pubmed | 75.39 ± 2.09 | 67.55 ± 2.31 | 75.42 ± 2.31 | 7.87 (11.65%) | 0.03 (0.04%) |
| A-computer | 82.97 ± 2.16 | 67.80 ± 1.06 | 83.03 ± 1.87 | 15.23 (22.46%) | 0.06 (0.07%) |
| A-photo | 90.90 ± 0.84 | 78.77 ± 1.74 | 92.11 ± 1.08 | 13.34 (16.94%) | 1.21 (1.33%) |
| Arxiv | 70.92 ± 0.17 | 56.05 ± 0.46 | 63.46 ± 0.45 | 7.41 (13.24%) | -7.46 (-10.52%) |
| Products | 78.61 ± 0.49 | 62.47 ± 0.10 | 68.86 ± 0.46 | 6.39 (10.23%) | -9.75 (-12.4%) |

  • GLNNs substantially outperform similarly sized MLPs and match or approach GNN performance on several datasets.
  • GLNNs achieve 146×–273× faster inference than vanilla GNNs and 14×–27× faster than other acceleration methods.
  • In production-style settings with both inductive and transductive predictions, GLNNs improve over MLPs by 12.36% on average and match GNNs on 6/7 datasets.
  • Enlarging the MLP (width) helps GLNNs close the gap to GNNs on larger datasets, preserving large gains over standard MLPs.
  • KD acts as a regularizer and injects graph-aware inductive bias into MLPs, aiding their performance when node features are informative.
  • GLNNs maintain competitive performance across different teacher architectures and are robust to many settings, though some challenging splits (e.g., certain Arxiv distributions) limit gains.
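As a reading aid for the results table, the short sketch below recomputes the absolute differences from the reported mean accuracies, taking ΔMLP as GLNN minus MLP and ΔGNN as GLNN minus SAGE. The variable and function names are hypothetical, and the values are copied from the table rather than measured here.

```python
# Mean accuracies (percent) copied from the results table: (SAGE, MLP, GLNN).
accuracies = {
    "Cora": (80.52, 59.22, 80.54),
    "Citeseer": (70.33, 59.61, 71.77),
    "Pubmed": (75.39, 67.55, 75.42),
    "A-computer": (82.97, 67.80, 83.03),
    "A-photo": (90.90, 78.77, 92.11),
    "Arxiv": (70.92, 56.05, 63.46),
    "Products": (78.61, 62.47, 68.86),
}


def absolute_deltas(sage, mlp, glnn):
    """Return (GLNN - MLP, GLNN - SAGE), rounded to two decimals."""
    return round(glnn - mlp, 2), round(glnn - sage, 2)


deltas = {name: absolute_deltas(*row) for name, row in accuracies.items()}

# Datasets where the distilled student stays below its GNN teacher.
below_teacher = [name for name, (_, d_gnn) in deltas.items() if d_gnn < 0]

for name, (d_mlp, d_gnn) in deltas.items():
    print(f"{name}: dMLP={d_mlp:+.2f}, dGNN={d_gnn:+.2f}")
print("GLNN below SAGE on:", below_teacher)
```

In this table the student trails the teacher only on the two largest datasets, Arxiv and Products, which lines up with the finding above that widening the MLP helps on larger datasets.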


This review was generated by AI and checked by a human editor.