QUICK REVIEW

[Paper Review] Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination

Yizhen Zheng, Shirui Pan|arXiv (Cornell University)|Jun 3, 2022

Advanced Graph Neural Networks | Cited by 51

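One-sentence summary

The paper introduces Group Discrimination (GD) and GGD, a Siamese model built on it, achieving state-of-the-art self-supervised graph representation learning on large-scale graphs while greatly improving training speed and memory efficiency.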

ABSTRACT

Graph contrastive learning (GCL) alleviates the heavy reliance on label information for graph representation learning (GRL) via self-supervised learning schemes. The core idea is to learn by maximising mutual information for similar instances, which requires similarity computation between two node instances. However, GCL is inefficient in both time and memory consumption. In addition, GCL normally requires a large number of training epochs to be well-trained on large-scale datasets. Inspired by an observation of a technical defect (i.e., inappropriate usage of Sigmoid function) commonly used in two representative GCL works, DGI and MVGRL, we revisit GCL and introduce a new learning paradigm for self-supervised graph representation learning, namely, Group Discrimination (GD), and propose a novel GD-based method called Graph Group Discrimination (GGD). Instead of similarity computation, GGD directly discriminates two groups of node samples with a very simple binary cross-entropy loss. In addition, GGD requires much fewer training epochs to obtain competitive performance compared with GCL methods on large-scale datasets. These two advantages endow GGD with very efficient property. Extensive experiments show that GGD outperforms state-of-the-art self-supervised methods on eight datasets. In particular, GGD can be trained in 0.18 seconds (6.44 seconds including data preprocessing) on ogbn-arxiv, which is orders of magnitude (10,000+) faster than GCL baselines while consuming much less memory. Trained with 9 hours on ogbn-papers100M with billion edges, GGD outperforms its GCL counterparts in both accuracy and efficiency.

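Motivation and objectives

Re-examine existing graph contrastive learning (GCL) approaches and identify their inefficiencies.
Propose Group Discrimination (GD) as an alternative learning paradigm to MI-based contrastive losses.
Develop GGD, a fast and scalable GD-based model with a Siamese network structure.
Demonstrate state-of-the-art performance and superior efficiency on eight datasets, including ogbn-papers100M.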

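Proposed method

Defines Group Discrimination (GD) as discriminating between two groups of node samples, a positive group drawn from the original graph and a negative group drawn from a corrupted graph, using a binary cross-entropy loss.
Introduces Graph Group Discrimination (GGD), which pairs a Siamese GNN encoder with a projector.
Generates the positive and negative groups using optional augmentation and corruption.
Estimates the final embeddings by combining the local embeddings with a global information component through a simple aggregation step (H = H_theta + H_theta^{global}).
Shows substantial training-time and memory savings from removing explicit node-pair similarity computation.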
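
The steps listed above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the authors' implementation: the one-layer mean-aggregation encoder, the dot-product stand-in for the projector, feature-row shuffling as the corruption, and all names (`aggregate`, `encode`, `score`, `gd_loss`, `final_embedding`, `hops`) are hypothetical. In particular, the global component is assumed here to be the local embeddings propagated for a few hops; the review itself only states that H = H_theta + H_theta^{global}.

```python
import math
import random


def aggregate(adj, rows):
    """Average each node's row with its neighbours' rows (self-loop included)."""
    out = []
    for v, nbrs in enumerate(adj):
        group = [v] + list(nbrs)
        out.append([sum(rows[u][d] for u in group) / len(group)
                    for d in range(len(rows[v]))])
    return out


def encode(adj, feats, enc_w):
    """Toy one-layer encoder: aggregate, linear map, ReLU.

    enc_w holds one weight list per output unit (hypothetical layout).
    """
    return [[max(0.0, sum(x * w for x, w in zip(row, unit))) for unit in enc_w]
            for row in aggregate(adj, feats)]


def score(embeddings, proj_w):
    """Stand-in projector: reduce each node embedding to one scalar logit."""
    return [sum(h * p for h, p in zip(row, proj_w)) for row in embeddings]


def bce_with_logits(logits, labels):
    """Numerically stable binary cross-entropy, averaged over all samples."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def gd_loss(adj, feats, enc_w, proj_w, rng):
    """Group discrimination: original-graph nodes get label 1, corrupted-graph nodes 0."""
    corrupted = list(feats)
    rng.shuffle(corrupted)  # assumed corruption: permute feature rows, keep edges
    # Siamese set-up: the same encoder and projector weights score both groups.
    pos = score(encode(adj, feats, enc_w), proj_w)
    neg = score(encode(adj, corrupted, enc_w), proj_w)
    labels = [1.0] * len(pos) + [0.0] * len(neg)
    return bce_with_logits(pos + neg, labels)  # no node-pair similarity matrix


def final_embedding(adj, local, hops):
    """H = H_local + H_global, with H_global assumed to be `hops` rounds of aggregation."""
    glob = local
    for _ in range(hops):
        glob = aggregate(adj, glob)
    return [[a + b for a, b in zip(l, g)] for l, g in zip(local, glob)]


# Tiny 4-node example graph (adjacency lists) with 2-dimensional features.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
enc_w = [[0.5, -0.2], [0.1, 0.3]]
proj_w = [1.0, -1.0]

loss = gd_loss(adj, feats, enc_w, proj_w, random.Random(0))
H = final_embedding(adj, encode(adj, feats, enc_w), hops=2)
print(f"GD loss: {loss:.4f}; embedding shape: {len(H)} x {len(H[0])}")
```

The loss touches each node once per group, so its cost grows linearly with the number of nodes, which is where the savings over pairwise similarity computation would come from.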

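Experimental results

Research questions

RQ1: Can Group Discrimination (GD) serve as an effective alternative to mutual-information-based objectives in graph contrastive learning?
RQ2: Does GD deliver faster training, better scalability, and lower memory usage on large-scale graphs while maintaining or improving accuracy?
RQ3: How do the augmentation and corruption strategies affect the GD-based learning process?
RQ4: How does GGD compare with state-of-the-art GCL methods on small, medium, and very large graph datasets?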

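Key findings

Training cost on ogbn-arxiv (times in seconds; (E) = end to end including preprocessing, (T) = training only; Imp = speed-up over GBT):
Method	Pre (s)	Tr (s/epoch)	Epochs	Total(E) (s)	Imp(E)	Total(T) (s)	Imp(T)	Accuracy (%)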
GBT(256)	5.52	6.47	300	1,946.52	-	1,941.00	-	70.1
GGD (256)	6.26	0.18	1	6.44	302.25×	0.18	10,783.33×	70.3
GGD (1,500)	6.26	0.95	1	7.21	269.96×	0.95	2,043.16×	71.6
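
The derived columns in the table follow from the first three numeric columns by simple arithmetic, which the short check below reproduces for the GBT and GGD (256) rows. The function names `total_times` and `improvement` are hypothetical helpers introduced only for this illustration; the input numbers are copied from the table.

```python
def total_times(pre, per_epoch, epochs):
    """Return (end-to-end time, training-only time) in seconds."""
    training = per_epoch * epochs
    return pre + training, training


def improvement(baseline, method):
    """Speed-up factor of `method` relative to `baseline`."""
    return baseline / method


# Inputs taken from the GBT (256) and GGD (256) rows of the table.
gbt_e2e, gbt_train = total_times(pre=5.52, per_epoch=6.47, epochs=300)
ggd_e2e, ggd_train = total_times(pre=6.26, per_epoch=0.18, epochs=1)

print(f"GBT: end-to-end {gbt_e2e:.2f} s, training {gbt_train:.2f} s")
print(f"GGD: end-to-end {ggd_e2e:.2f} s, training {ggd_train:.2f} s")
print(f"Imp(E) = {improvement(gbt_e2e, ggd_e2e):.2f}x")
print(f"Imp(T) = {improvement(gbt_train, ggd_train):.2f}x")
```

Because the published inputs are themselves rounded, recomputing other rows this way may differ from the table in the last digit.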

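GGD achieves state-of-the-art or competitive accuracy on eight datasets, including ogbn-arxiv, ogbn-products, and ogbn-papers100M.
GGD is far faster and more memory-efficient than the baselines. For example, compared with the strongest GCL baseline in Table 1, training on ogbn-arxiv is up to 10,783× faster (about 302× end to end, including preprocessing).
On ogbn-arxiv, GGD trained for a single epoch reaches 71.6–71.7% validation/test accuracy while using far less memory and time than full-batch baselines.
GGD scales to very large graphs (ogbn-papers100M) using neighbourhood sampling, substantially cutting per-epoch training time while remaining competitive in performance.
GGD's per-epoch training time is up to roughly an order of magnitude smaller than the baselines' on several datasets (e.g. 0.010–0.021 s versus 0.059–0.158 s).
In an ablation-style comparison that replaces the MI-based objective with the BCE-based Group Discrimination loss, the performance difference is small while memory and time consumption drop substantially.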
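
Neighbourhood sampling, mentioned in the findings above, bounds the cost of each mini-batch by keeping only a fixed number of neighbours per node at each hop. The sketch below is a generic, GraphSAGE-style illustration in plain Python, not the sampler used in the paper; the function name `sample_neighbourhood`, the `fanouts` parameter, and the toy graph are assumptions made for the example.

```python
import random


def sample_neighbourhood(adj, seeds, fanouts, rng):
    """Sample a multi-hop neighbourhood around `seeds`.

    adj:     dict mapping node -> list of neighbours
    fanouts: maximum number of neighbours kept per node, one entry per hop
    Returns (blocks, nodes): one list of (source, target) edges per hop,
    and the set of all nodes needed to compute the seeds' embeddings.
    """
    frontier = sorted(set(seeds))
    blocks = []
    for fanout in fanouts:
        edges = []
        reached = set(frontier)
        for v in frontier:
            nbrs = adj.get(v, [])
            # Keep every neighbour if there are few, otherwise a random subset.
            chosen = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for u in chosen:
                edges.append((u, v))  # message flows from u to v
                reached.add(u)
        blocks.append(edges)
        frontier = sorted(reached)  # next hop expands from all nodes reached so far
    return blocks, set(frontier)


# Toy graph: node 0 is a hub with four neighbours.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
blocks, nodes = sample_neighbourhood(adj, seeds=[0], fanouts=[2, 2],
                                     rng=random.Random(0))
for hop, edges in enumerate(blocks, start=1):
    print(f"hop {hop}: {len(edges)} sampled edges")
print(f"nodes in mini-batch: {sorted(nodes)}")
```

With fixed fan-outs, the work per seed node is bounded by the product of the fan-outs rather than by the full neighbourhood size, which is what makes mini-batch training on billion-edge graphs feasible.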

This review was generated by AI and checked by a human editor.