QUICK REVIEW

[Paper Review] Towards Deeper Graph Neural Networks with Differentiable Group Normalization

Kaixiong Zhou, Xiao Huang|arXiv (Cornell University)|Jun 12, 2020
Advanced Graph Neural Networks|References: 25|Cited by: 81
One-Sentence Summary

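This paper proposes differentiable group normalization (DGN), which alleviates over-smoothing in graph neural networks (GNNs) by softly clustering nodes into groups and normalizing each group independently, allowing deeper architectures and better node classification performance. It also introduces two over-smoothing metrics: group distance ratio and instance information gain.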

ABSTRACT

Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases. It is because the stacked aggregators would make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close and unlinked pairs distinct. However, they often ignore the intrinsic community structures and would result in sub-optimal performance. The representations of nodes within the same community/class need be similar to facilitate the classification, while different classes are expected to be separated in embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.

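Motivation and Objectives

  • Quantify over-smoothing in GNNs with new metrics defined at the group level and at the instance level.
  • Propose a differentiable group normalization technique that reduces over-smoothing.
  • Show that DGN lets GNNs go deeper and perform better on benchmark datasets.
  • Demonstrate the robustness of DGN in scenarios where node features are missing.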

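Proposed Method

  • Define two metrics for measuring over-smoothing: group distance ratio and instance information gain.
  • Introduce differentiable group normalization (DGN), which is applied between GNN layers, softly clusters nodes into groups, and normalizes each group independently.
  • Compute the group assignments with a differentiable softmax-based clustering: S^(k) = softmax(H^(k) U^(k)).
  • Normalize each group with its own running mean/variance and affine parameters, then combine the result with the original embedding H^(k): H^(k) + λ Σ_i H̃_i^(k), where H̃_i^(k) is the normalized embedding of group i.
  • Train end to end, so that the supervised loss is optimized together with the regularization effect implicit in group normalization.
  • Show that DGN decouples the group distributions to alleviate over-smoothing while preserving the input features.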
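
The steps listed above can be sketched as a single forward pass. This is a minimal illustration in dependency-free Python, not the authors' implementation. It assumes that the input of group i is the embedding matrix scaled row-wise by column i of S, and it normalizes with batch statistics (training-mode behavior) rather than the running mean/variance mentioned above. The names `dgn_forward`, `batch_norm`, `matmul`, and `softmax`, and the default `lam=0.01`, are chosen for this sketch only.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x d) matrix by a (d x g) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize every feature column over all nodes, then apply gamma and beta."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [
        [g * (v - m) / math.sqrt(s + eps) + b
         for v, m, s, g, b in zip(row, means, variances, gamma, beta)]
        for row in x
    ]


def dgn_forward(h, u, gammas, betas, lam=0.01):
    """Return (H + lam * sum_i BN_i(S[:, i] * H), S) for one layer.

    h: n x d node embeddings, u: d x G assignment weights,
    gammas/betas: one length-d affine vector per group.
    """
    s = [softmax(row) for row in matmul(h, u)]  # n x G soft group assignment
    out = [row[:] for row in h]                 # self-preserved component H
    for i in range(len(u[0])):
        # Scale each node embedding by its membership in group i (assumption).
        h_i = [[s[v][i] * x for x in h[v]] for v in range(len(h))]
        h_i_norm = batch_norm(h_i, gammas[i], betas[i])
        for v in range(len(h)):
            for j in range(len(h[0])):
                out[v][j] += lam * h_i_norm[v][j]
    return out, s


# Usage with random data: 6 nodes, 4 features, 3 groups.
random.seed(0)
n, d, g = 6, 4, 3
H = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
U = [[random.gauss(0, 1) for _ in range(g)] for _ in range(d)]
gammas = [[1.0] * d for _ in range(g)]
betas = [[0.0] * d for _ in range(g)]
H_out, S = dgn_forward(H, U, gammas, betas, lam=0.01)
```

In a trainable version, U, the per-group gamma and beta vectors would be learned parameters, and λ balances the normalized group embeddings against the original H^(k). Setting `lam=0.0` returns the input unchanged, which makes the role of the self-preserved term easy to see.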

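Experimental Results

Research Questions

  • RQ1: How can over-smoothing in GNNs be measured precisely, going beyond pairwise node distances?
  • RQ2: Can a group-based normalization strategy alleviate over-smoothing without sacrificing useful self features?
  • RQ3: With DGN, do deeper GNNs perform better on standard benchmarks and in missing-feature scenarios?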

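Key Findings

  • DGN significantly alleviates over-smoothing and outperforms no normalization, batch normalization, and pair normalization across multiple datasets and depth settings.
  • With DGN, deeper GNNs achieve higher accuracy than shallower ones; for example, SGC on Cora reaches its highest accuracy of 79.7% at K = 20.
  • In the missing-feature scenario, DGN yields large gains over the baselines: average improvements of 37.8% over no normalization (NN), 7.1% over batch normalization (BN), and 12.8% over pair normalization (PN).
  • DGN lets deeper architectures make effective use of multi-hop neighborhood information, and the optimal number of layers is usually larger (up to 30 layers in some settings).
  • The method keeps the self-preserved component H^(k) to avoid over-normalization, while the group-wise normalization decouples the distributions across groups.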
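
Claims about alleviating over-smoothing depend on a way to measure it. The summary names the group distance ratio but does not give its formula, so the following is only a sketch under an assumed definition: the mean inter-group pairwise Euclidean distance divided by the mean intra-group pairwise distance, with groups taken from class labels. The exact normalization constants in the paper may differ, and the function names are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Average Euclidean distance over all pairs (x, y) with x in xs, y in ys."""
    total = sum(math.dist(x, y) for x, y in product(xs, ys))
    return total / (len(xs) * len(ys))


def group_distance_ratio(embeddings, labels):
    """Assumed definition: mean inter-group distance / mean intra-group distance.

    Needs at least two groups and a non-zero intra-group distance.
    Intra-group averages include self pairs, which contribute distance 0.
    """
    groups = {}
    for h, c in zip(embeddings, labels):
        groups.setdefault(c, []).append(h)
    classes = sorted(groups)
    intra = [mean_pairwise_distance(groups[c], groups[c]) for c in classes]
    inter = [
        mean_pairwise_distance(groups[a], groups[b])
        for a in classes for b in classes if a != b
    ]
    return (sum(inter) / len(inter)) / (sum(intra) / len(intra))


labels = [0, 0, 1, 1]
# Classes far apart in embedding space: large ratio.
separated = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
# Classes interleaved, as after heavy mixing across groups: small ratio.
mixed = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

ratio_separated = group_distance_ratio(separated, labels)
ratio_mixed = group_distance_ratio(mixed, labels)
```

Under this definition a higher ratio means that classes are better separated relative to their internal spread. Note that uniformly shrinking all embeddings leaves the ratio unchanged, so the metric reacts to mixing between groups rather than to overall scale.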


This review was generated by AI and checked by human editors.