QUICK REVIEW

[Paper Review] DeeperGCN: All You Need to Train Deeper GCNs

Guohao Li, Chenxin Xiong | arXiv (Cornell University) | Jun 13, 2020

49 references | 257 citations

TL;DR

This paper introduces DeeperGCN, a framework of generalized aggregation, pre-activation residuals, and a MsgNorm layer to reliably train very deep GCNs and achieve state-of-the-art results on large-scale OGB benchmarks.

ABSTRACT

Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs. Unlike Convolutional Neural Networks (CNNs), which are able to take advantage of stacking very deep layers, GCNs suffer from vanishing gradient, over-smoothing and over-fitting issues when going deeper. These challenges limit the representation power of GCNs on large-scale graphs. This paper proposes DeeperGCN that is capable of successfully and reliably training very deep GCNs. We define differentiable generalized aggregation functions to unify different message aggregation operations (e.g. mean, max). We also propose a novel normalization layer namely MsgNorm and a pre-activation version of residual connections for GCNs. Extensive experiments on Open Graph Benchmark (OGB) show DeeperGCN significantly boosts performance over the state-of-the-art on the large scale graph learning tasks of node property prediction and graph property prediction. Please visit https://www.deepgcns.org for more information.

Motivation & Objective

Address the challenges of training very deep GCNs on large-scale graphs: vanishing gradients, over-smoothing, and over-fitting.
Introduce a differentiable generalized aggregation framework to unify and learn aggregation functions for GCNs.
Develop architectural and normalization innovations (pre-activation residuals, MsgNorm) to stabilize deep GCN training.
Evaluate on Open Graph Benchmark (OGB) datasets to demonstrate empirical gains over prior SOTA methods.

Proposed method

Define a generalized, differentiable aggregation function that covers mean, max, and learnable interpolations (SoftMax_Agg and PowerMean_Agg).
Propose GEN (GENeralized Aggregation Networks): a GCN variant that uses generalized aggregators and constructs strictly positive messages, which the generalized mean–max family (PowerMean_Agg in particular) requires.
Introduce pre-activation residual connections for GCNs (Normalization -> ReLU -> GraphConv -> Addition) to improve training stability.
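Develop MsgNorm: a message normalization layer that divides the aggregated message by its L2 norm and rescales it by the L2 norm of the vertex feature times a learnable scalar, before the vertex update.
Provide DyResGEN variants in which the aggregation parameters (beta for SoftMax_Agg, p for PowerMean_Agg) are learned per layer, jointly with the other network weights during training.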
Extensively evaluate on OGB node and graph prediction tasks, comparing with plain GCNs, ResGCN, and SOTA baselines.
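The generalized aggregators in the list above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the message form ReLU(h_u + h_e) + EPS and the value of EPS are assumptions based on the GEN description, beta and p are passed as plain floats instead of being learned, and the helper names (gen_message, softmax_agg, power_mean_agg) are hypothetical. Aggregation is applied channel-wise over one node's incoming messages.

```python
import math
from typing import List

Vector = List[float]

EPS = 1e-7  # assumed small constant that keeps every message strictly positive


def gen_message(h_u: Vector, h_e: Vector) -> Vector:
    """Assumed GEN-style message from neighbour u: ReLU(h_u + h_e) + EPS, per channel."""
    return [max(0.0, a + b) + EPS for a, b in zip(h_u, h_e)]


def softmax_agg(messages: List[Vector], beta: float) -> Vector:
    """SoftMax_Agg sketch: per channel, a softmax(beta * m)-weighted sum of the messages."""
    out = []
    for channel in zip(*messages):
        top = max(beta * x for x in channel)  # shift exponents for numerical stability
        weights = [math.exp(beta * x - top) for x in channel]
        out.append(sum(w * x for w, x in zip(weights, channel)) / sum(weights))
    return out


def power_mean_agg(messages: List[Vector], p: float) -> Vector:
    """PowerMean_Agg sketch: per channel, (mean of m ** p) ** (1 / p); needs m > 0, p != 0."""
    out = []
    for channel in zip(*messages):
        mean_pow = sum(x ** p for x in channel) / len(channel)
        out.append(mean_pow ** (1.0 / p))
    return out


if __name__ == "__main__":
    # Three incoming messages with two channels each.
    msgs = [[1.0, 4.0], [2.0, 0.5], [3.0, 2.0]]
    print(softmax_agg(msgs, beta=0.0))     # beta = 0: channel-wise mean
    print(softmax_agg(msgs, beta=50.0))    # large beta: close to channel-wise max
    print(power_mean_agg(msgs, p=1.0))     # p = 1: channel-wise mean
    print(power_mean_agg(msgs, p=100.0))   # large p: moves toward channel-wise max
    print(gen_message([1.0, -2.0], [0.5, 0.5]))  # negative sum clipped to 0, then + EPS
```

Setting beta to 0 gives uniform weights and hence the mean, while a large beta concentrates the weight on the largest message. PowerMean behaves the same way as p grows from 1, but raising messages to a real power only makes sense for positive values, which is why the messages are kept strictly positive.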

Experimental results

Research questions

RQ1: Can a generalized, differentiable aggregation function improve the learnability and performance of deep GCNs on large-scale graphs?
RQ2: What is the impact of updated skip connections and a novel normalization layer (MsgNorm) on deep GCN training stability and accuracy?
RQ3: Do dynamically learned aggregation parameters (beta, p) further improve performance across node and graph prediction tasks?
RQ4: How closely can DeeperGCN approach, or how far can it surpass, state-of-the-art results on Open Graph Benchmark datasets across different tasks?

Key findings

Generalized mean–max aggregations (SoftMax_Agg and PowerMean_Agg) unify and extend common aggregators and can be learned end-to-end.
Pre-activation residuals improve performance and enable deeper models (up to 112 layers) across aggregators.
MsgNorm significantly boosts deep GCN performance, especially with weaker aggregators.
Dynamic learning of aggregation parameters (DyResGEN) yields further gains over fixed aggregators, achieving new SOTA on several tasks.
On ogbn-proteins, ogbn-arxiv, ogbg-ppa, and ogbg-molhiv, DeeperGCN variants outperform the previous SOTA by notable margins.
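The two training stabilizers credited in these findings, MsgNorm and the pre-activation residual ordering, can be illustrated with a small stdlib-only sketch. It rests on stated assumptions and is not the authors' code: MsgNorm is written as h_v + s * ||h_v|| * m_v / ||m_v|| with s a fixed float rather than a learned scalar and the downstream vertex-update MLP omitted; the normalization layer is a per-node layer norm chosen for simplicity; and all helper names, including the stand-in graph convolution make_mean_conv, are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Features = Dict[int, Vector]  # node id -> feature vector


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def msg_norm(h_v: Vector, m_v: Vector, s: float = 1.0) -> Vector:
    """MsgNorm sketch: h_v + s * ||h_v|| * m_v / ||m_v||.

    The vertex-update MLP that would consume this vector is omitted, and s is a
    plain float here instead of a learned scalar.
    """
    scale = s * l2_norm(h_v) / max(l2_norm(m_v), 1e-12)  # guard against a zero message
    return [h + scale * m for h, m in zip(h_v, m_v)]


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    """Per-node normalization (an illustrative choice of normalization layer)."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def pre_activation_residual(
    x: Features, graph_conv: Callable[[Features], Features]
) -> Features:
    """One block in the order Normalization -> ReLU -> GraphConv -> Addition."""
    activated = {v: relu(layer_norm(h)) for v, h in x.items()}
    conv_out = graph_conv(activated)
    # Skip connection: the raw, un-normalized input is added to the conv output.
    return {v: [a + b for a, b in zip(x[v], conv_out[v])] for v in x}


def make_mean_conv(neighbors: Dict[int, List[int]]) -> Callable[[Features], Features]:
    """Hypothetical stand-in GraphConv: each node averages its neighbours' features."""

    def conv(h: Features) -> Features:
        return {
            v: [sum(col) / len(nbrs) for col in zip(*(h[u] for u in nbrs))]
            for v, nbrs in neighbors.items()
        }

    return conv


if __name__ == "__main__":
    print(msg_norm([3.0, 4.0], [0.0, 2.0]))  # message rescaled to norm 5: [3.0, 9.0]
    x = {0: [1.0, 2.0], 1: [3.0, 5.0]}
    conv = make_mean_conv({0: [1], 1: [0]})
    print(pre_activation_residual(x, conv))
```

Because normalization and ReLU act only on the branch that feeds the convolution, the skip path carries the input through unchanged; this is the difference from the conventional GraphConv -> Normalization -> ReLU -> Addition ordering. MsgNorm, in turn, keeps the aggregated message on the same scale as the vertex feature it is added to, whatever aggregator produced it.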


This review was created by AI and reviewed by human editors.