QUICK REVIEW

[Paper Review] Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation

Shichang Zhang, Yozen Liu|arXiv (Cornell University)|Oct 17, 2021

Advanced Graph Neural Networks|References: 32|Cited by: 43

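One-sentence summary

GLNN distills a GNN's knowledge into an MLP (a wider one on the larger datasets), so inference needs no graph and runs faster, while reaching accuracy close to the GNN on several datasets.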

ABSTRACT

Graph Neural Networks (GNNs) are popular for graph machine learning and have shown great results on wide node classification tasks. Yet, they are less popular for practical deployments in the industry owing to their scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching them burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs by reducing Multiplication-and-ACcumulation (MAC) operations, but the improvements are limited given the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no graph dependency and infer much faster than GNNs, even though they are less accurate than GNNs for node classification in general. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNNs with competitive accuracy infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over stand-alone MLPs by 12.36% on average and match GNNs on 6/7 datasets. Comprehensive analysis shows when and why GLNNs can achieve competitive accuracies to GNNs and suggests GLNN as a handy choice for latency-constrained applications.

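Motivation and objectives

Bridge the gap between the context that GNNs gain from graph structure and the fast, graph-free inference of MLPs.
Show that knowledge distillation from a GNN teacher to an MLP student yields a graph-free model with strong accuracy.
Evaluate GLNN in transductive, inductive, and production-style settings across diverse datasets.
Quantify the speed-up over GNNs and other inference acceleration methods, and analyze the factors behind GLNN's success.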

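Proposed method

Train a GNN teacher (GraphSAGE) on the graph to produce a soft target z_v for each node.
Train the student MLP with a combined loss: a cross-entropy loss on the ground-truth labels plus a KL-divergence loss on the teacher's soft targets (knowledge distillation).
Deploy the resulting GLNN, an MLP that does not depend on the graph at inference time.
Evaluate GLNN on multiple datasets in transductive, inductive, and production scenarios.
Study how model size, the mutual information between features and labels, and regularization through KD affect performance.
Compare against MLPs, GNNs, and other inference acceleration methods (pruning, quantization, neighbor sampling).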
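
The combined training objective described in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the weight `lam`, the KL direction (teacher relative to student), summing rather than averaging over nodes, and all function names are choices made here for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_probs, label):
    """Cross-entropy against a hard label given as a class index."""
    return -math.log(student_probs[label])

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); terms with zero teacher mass contribute 0."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

def glnn_loss(student_logits, teacher_probs, labels, lam=0.5):
    """lam * CE on labeled nodes + (1 - lam) * KL on all nodes.

    labels[i] is a class index, or None when node i is unlabeled.
    lam is a hypothetical weight; the summary does not specify one.
    """
    ce, kl = 0.0, 0.0
    for logits, z_v, y_v in zip(student_logits, teacher_probs, labels):
        y_hat = softmax(logits)
        if y_v is not None:
            ce += cross_entropy(y_hat, y_v)  # supervised term
        kl += kl_divergence(z_v, y_hat)      # distillation term
    return lam * ce + (1.0 - lam) * kl

# Toy example: two nodes, three classes, only the first node is labeled.
student_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
teacher_probs = [[0.8, 0.15, 0.05], [0.1, 0.7, 0.2]]
labels = [0, None]
print(glnn_loss(student_logits, teacher_probs, labels))
```

Unlabeled nodes still contribute through the distillation term, which is how the teacher's soft targets z_v supply a training signal beyond the labeled set.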

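Experimental results

Research questions

RQ1: Can KD from a GNN to an MLP produce a graph-free model with competitive accuracy?
RQ2: How does GLNN perform in transductive and inductive settings, and in a near-production scenario?
RQ3: Which factors (model size, mutual information, inductive bias) drive GLNN's gains?
RQ4: How do GLNNs compare with conventional inference acceleration methods in latency and accuracy?
RQ5: What are the limitations and failure cases of GLNNs on graph-based tasks?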

Key findings

Dataset	SAGE	MLP	GLNN	ΔMLP	ΔGNN
Cora	80.52 ± 1.77	59.22 ± 1.31	80.54 ± 1.35	21.32 (36.00%)	0.02 (0.02%)
Citeseer	70.33 ± 1.97	59.61 ± 2.88	71.77 ± 2.01	12.16 (20.40%)	1.44 (2.05%)
Pubmed	75.39 ± 2.09	67.55 ± 2.31	75.42 ± 2.31	7.87 (11.65%)	0.03 (0.04%)
A-computer	82.97 ± 2.16	67.80 ± 1.06	83.03 ± 1.87	15.23 (22.46%)	0.06 (0.07%)
A-photo	90.90 ± 0.84	78.77 ± 1.74	92.11 ± 1.08	13.34 (16.94%)	1.21 (1.33%)
Arxiv	70.92 ± 0.17	56.05 ± 0.46	63.46 ± 0.45	7.41 (13.24%)	-7.46 (-10.52%)
Products	78.61 ± 0.49	62.47 ± 0.10	68.86 ± 0.46	6.39 (10.23%)	-9.75 (-12.40%)
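
The two Δ columns appear to be the absolute accuracy difference followed by the difference relative to the baseline (MLP for ΔMLP, SAGE for ΔGNN). That reading is inferred from the numbers in the rows rather than stated in this summary. A small sketch that recomputes a few rows from the rounded means, with the hypothetical helper `deltas`:

```python
def deltas(baseline, glnn):
    """Return (absolute gain, relative gain in %) of GLNN over a baseline accuracy."""
    diff = glnn - baseline
    return round(diff, 2), round(100.0 * diff / baseline, 2)

# Mean accuracies (%) copied from the table: (SAGE, MLP, GLNN).
rows = {
    "Cora": (80.52, 59.22, 80.54),
    "Citeseer": (70.33, 59.61, 71.77),
    "Products": (78.61, 62.47, 68.86),
}

for name, (sage, mlp, glnn) in rows.items():
    d_mlp = deltas(mlp, glnn)   # gain over the stand-alone MLP
    d_gnn = deltas(sage, glnn)  # gap to the GraphSAGE teacher (negative = behind)
    print(f"{name}: dMLP={d_mlp[0]} ({d_mlp[1]}%), dGNN={d_gnn[0]} ({d_gnn[1]}%)")
```

Recomputing from rounded means can differ from the listed values in the last digit. For example, the Arxiv ΔMLP percentage comes out as 13.22% here against 13.24% in the table, presumably because the listed figure was derived from unrounded accuracies.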

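Across multiple datasets, GLNN substantially outperforms an MLP of similar size and matches or approaches GNN accuracy.
GLNN inference is 146×–273× faster than a vanilla GNN and 14×–27× faster than other acceleration methods.
In a production-style setting with both inductive and transductive predictions, GLNN improves over the MLP by 12.36% on average and matches the GNN on 6/7 datasets.
Widening the MLP helps GLNN narrow the gap to the GNN on the larger datasets while keeping a significant gain over the standard MLP.
KD acts as a regularizer that injects a graph-aware inductive bias into the MLP, which improves accuracy when the node features are sufficiently informative.
GLNN remains competitive across teacher architectures and is robust across settings, although some challenging data splits (such as certain Arxiv data distributions) limit the gains.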
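
The speed-up is attributed to removing the data dependency: a message-passing GNN must fetch features for every node within L hops of the target, whereas an MLP reads only the target's own features. The sketch below counts those nodes on a hypothetical toy graph. It assumes full-neighborhood aggregation without sampling and does not measure latency, so it illustrates the dependency only and does not reproduce the reported 146×–273× figures.

```python
from collections import deque

def nodes_needed_for_gnn(adj, target, num_layers):
    """Nodes whose features an L-layer message-passing GNN reads for `target`:
    every node within num_layers hops, found by breadth-first search."""
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_layers:
            continue  # do not expand beyond the receptive field
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

def nodes_needed_for_mlp(target):
    """An MLP (and hence a GLNN) reads only the target node's own features."""
    return {target}

# Hypothetical undirected toy graph as an adjacency list.
adj = {
    0: [1, 2],
    1: [0, 3, 4],
    2: [0, 5],
    3: [1],
    4: [1, 6],
    5: [2],
    6: [4],
}

print(len(nodes_needed_for_gnn(adj, 0, 2)))  # size of the 2-hop receptive field
print(len(nodes_needed_for_mlp(0)))          # always a single node
```

On real graphs the receptive field tends to grow quickly with the number of layers, which is why reducing arithmetic alone (pruning, quantization) leaves the fetch cost in place.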

This review was generated by AI and reviewed by human editors.