QUICK REVIEW

[Paper Review] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Derek Lim, Felix Höhne|arXiv (Cornell University)|Oct 27, 2021

Advanced Graph Neural Networks|59 citations

TL;DR

The paper introduces large-scale non-homophilous graph benchmarks and a simple scalable model, LINKX, which outperforms baselines and remains effective with simple minibatching.

ABSTRACT

Many widely used datasets for graph machine learning tasks have generally been homophilous, where nodes with similar labels connect to each other. Recently, new Graph Neural Networks (GNNs) have been developed that move beyond the homophily regime; however, their evaluation has often been conducted on small graphs with limited application domains. We collect and introduce diverse non-homophilous datasets from a variety of application areas that have up to 384x more nodes and 1398x more edges than prior datasets. We further show that existing scalable graph learning and graph minibatching techniques lead to performance degradation on these non-homophilous datasets, thus highlighting the need for further work on scalable non-homophilous methods. To address these concerns, we introduce LINKX -- a strong simple method that admits straightforward minibatch training and inference. Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.

Motivation & Objective

Address the lack of large, diverse non-homophilous graph datasets for evaluating scalable graph learning methods.
Show that existing minibatching and scalable methods underperform in non-homophilous settings on large graphs.
Propose a simple, scalable model LINKX that combines adjacency and feature information to achieve strong performance.
Demonstrate through extensive experiments that LINKX outperforms a wide range of baselines and GNNs on the proposed datasets.

Proposed method

Introduce a diverse set of large non-homophilous datasets spanning multiple application domains and up to 384x more nodes and 1398x more edges than prior work.
Define node features for several datasets and propose a revised non-homophily metric \hat{h} to assess deviation from a random graph null model.
Propose LINKX, which separately embeds adjacency A and node features X with MLPs, concatenates the embeddings, applies a linear transform with skip connections, and passes through an MLP to predict labels.
Provide a minibatching-friendly training and inference scheme for LINKX that avoids the graph-specific minibatching complexities of GNNs.
Compare LINKX to a broad spectrum of baselines including MLP, LINK, SGC, C&S, and modern non-homophily-focused GNNs across the new datasets.
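
The LINKX forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the helper names (`init_mlp`, `mlp`, `init_linkx`, `linkx_forward`), the layer sizes, the ReLU nonlinearity, and the random toy graph are all assumptions made for the example, and no training loop is shown.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small MLP; sizes = [in, hidden..., out] (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """ReLU MLP with a linear last layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def init_linkx(rng, n_nodes, n_feats, hidden, n_classes):
    return {
        # The adjacency branch takes a full row of A, so its input width is n_nodes.
        "mlp_a": init_mlp(rng, [n_nodes, hidden]),
        "mlp_x": init_mlp(rng, [n_feats, hidden]),
        # Linear transform applied to the concatenated embeddings.
        "W": rng.normal(0.0, 0.1, (2 * hidden, hidden)),
        "mlp_f": init_mlp(rng, [hidden, hidden, n_classes]),
    }

def linkx_forward(p, A_rows, X_rows):
    """Embed A and X separately, fuse with skip connections, then classify."""
    h_a = mlp(p["mlp_a"], A_rows)
    h_x = mlp(p["mlp_x"], X_rows)
    fused = np.concatenate([h_a, h_x], axis=1) @ p["W"] + h_a + h_x
    return mlp(p["mlp_f"], np.maximum(fused, 0.0))

# Toy graph (assumed sizes): 6 nodes, 4 features, 3 classes.
rng = np.random.default_rng(0)
n, d, hidden, n_classes = 6, 4, 8, 3
upper = np.triu(rng.random((n, n)) < 0.4, k=1)
A = (upper | upper.T).astype(float)   # symmetric adjacency, no self-loops
X = rng.normal(size=(n, d))

params = init_linkx(rng, n, d, hidden, n_classes)
logits = linkx_forward(params, A, X)  # one row of class scores per node
```

Each node's prediction depends only on its own row of A and X, so no message passing over neighbors is needed at inference time.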

Experimental results

Research questions

RQ1: How do large-scale non-homophilous graphs differ from traditional homophilous benchmarks in terms of dataset size and performance of existing methods?
RQ2: How do current graph minibatching and scalable methods perform when applied to non-homophilous graphs?
RQ3: Can a simple model that separates and then fuses adjacency and feature information achieve state-of-the-art performance in non-homophilous settings?
RQ4: Do simple i.i.d. node minibatching strategies suffice for scalable learning on large non-homophilous graphs?
RQ5: What is the empirical performance of LINKX compared to a wide range of baselines on the proposed benchmarks?

Key findings

The authors assemble large, diverse non-homophilous graphs with up to 384x more nodes and 1398x more edges than prior datasets, enabling evaluation at scale.
Graph minibatching techniques (e.g., GraphSAINT) substantially degrade performance in non-homophilous settings, especially on large graphs.
Scalable methods built around homophily assumptions (e.g., SGC, C&S) underperform on non-homophilous data, highlighting the need for methods tailored to non-homophily.
LINKX, a simple model that embeds adjacency and node features separately and then combines them, achieves state-of-the-art performance on the proposed non-homophilous benchmarks.
LINKX supports straightforward i.i.d. node minibatching and scales to large graphs, outperforming many baselines and other non-homophilous methods.
In minibatch experiments on large graphs, LINKX matches or exceeds alternatives, including GNNs and GraphSAINT-based approaches, while remaining computationally efficient.
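
To illustrate why i.i.d. node minibatching is straightforward for a model that reads only each node's own adjacency row and feature row, the sketch below batches nodes by random permutation and checks that the batched outputs match a full-graph pass. The `rowwise_model` function is a simplified stand-in for a LINKX-style model, not the paper's architecture; `iid_node_batches`, the sizes, and the random data are likewise assumptions for the example.

```python
import numpy as np

def iid_node_batches(n_nodes, batch_size, rng):
    """Yield index arrays that partition the nodes in random order."""
    perm = rng.permutation(n_nodes)
    for start in range(0, n_nodes, batch_size):
        yield perm[start:start + batch_size]

def rowwise_model(A_rows, X_rows, W_a, W_x):
    """Stand-in model: each output row depends only on that node's rows of A and X."""
    return np.maximum(A_rows @ W_a, 0.0) + np.maximum(X_rows @ W_x, 0.0)

rng = np.random.default_rng(0)
n, d, h = 8, 3, 5
A = (rng.random((n, n)) < 0.3).astype(float)
X = rng.normal(size=(n, d))
W_a = rng.normal(size=(n, h))
W_x = rng.normal(size=(d, h))

# Full-graph pass versus a pass assembled from node minibatches.
full = rowwise_model(A, X, W_a, W_x)
batched = np.empty_like(full)
for idx in iid_node_batches(n, 3, rng):
    # Only the selected rows are needed; no subgraph sampling or neighbor lookup.
    batched[idx] = rowwise_model(A[idx], X[idx], W_a, W_x)
```

Because a batch is just a row slice, the same loop works for training and inference, in contrast to graph-specific samplers that must construct a subgraph per batch.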

This review was created by AI and reviewed by human editors.