Skip to main content
QUICK REVIEW

[论文解读] MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

Sami Abu-El-Haija, Bryan Perozzi|arXiv (Cornell University)|Apr 30, 2019
Advanced Graph Neural Networks参考文献 23被引用 269
一句话总结

MixHop 通过在每一层学习多个人邻接幂实现更高阶的邻域混合,产生类似 delta 运算符的表示,并通过稀疏正则化揭示数据集特定的架构;在基准图上实现了最先进的节点分类效果,无需额外的内存/复杂度。

ABSTRACT

Existing popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) provably cannot learn a general class of neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distances. Mixhop requires no additional memory or computational complexity, and outperforms on challenging baselines. In addition, we propose sparsity regularization that allows us to visualize how the network prioritizes neighborhood information across different graph datasets. Our analysis of the learned architectures reveals that neighborhood mixing varies per datasets.

研究动机与目标

  • Motivate the limitation of existing GCNs in capturing general neighborhood mixing relationships.
  • Introduce MixHop to learn higher-order (multi-distance) feature mixing without extra memory or computational burden.
  • Show that MixHop can represent delta operators and general neighborhood mixing.
  • Demonstrate improved node classification performance on benchmark graphs and visualize learned architectures via sparsity regularization.

提出的方法

  • Define MixHop layer: H^{(i+1)} = ||_{j in P} sigma( A_hat^{j} H^{(i)} W^{(i)}_{j}) where A_hat is the normalized adjacency with self-loops.
  • Show MixHop subsumes vanilla GCN when P = {1}.
  • Prove MixHop can represent two-hop Delta Operators, unlike vanilla GCN.
  • Generalize to layer-wise neighborhood mixing: f(sum_j alpha_j sigma(A_hat^{j} X)).
  • Employ L2 group Lasso to automatically learn compact architectures by pruning entire columns of W^{(i)}_{j}.
  • Optionally design an outcome-choosing output layer to emphasize subsets of features.

实验结果

研究问题

  • RQ1RQ1: Can vanilla GCNs represent delta-operator style higher-order neighborhood differences (e.g., two-hop Delta Operators)?
  • RQ2RQ2: Does MixHop, via learning multiple adjacency powers per layer, learn richer neighborhood mixing including delta operators?
  • RQ3RQ3: Do higher-order neighborhood mixing architectures improve semi-supervised node classification on real graphs?
  • RQ4RQ4: Do optimal MixHop architectures differ across datasets, and can sparsity regularization reveal dataset-specific structures?

主要发现

  • MixHop outperforms baselines on Citeseer, Cora, and Pubmed in semi-supervised node classification.
  • The learned architectures differ by dataset, with selective capacity allocated to different adjacency powers.
  • Synthetic experiments show MixHop tends to learn delta operators more at low graph homophily.
  • The approach achieves state-of-the-art or competitive results vs vanilla GCN, Chebyshev, and GAT baselines.
  • Sparsity regularization yields interpretable, dataset-specific network architectures without increasing memory footprint.

更好的研究,从现在开始

从论文设计到论文写作,大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成,并经人工编辑审核。