QUICK REVIEW

[Paper Review] SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

Qitian Wu, Wentao Zhao | arXiv (Cornell University) | June 19, 2023

Advanced Graph Neural Networks · Citations: 11

One-Line Summary

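SGFormer applies a simple single-layer, linear-time global attention to large graphs. It produces competitive node representations, scales to web-scale graphs (up to 0.1B nodes), and delivers strong efficiency gains.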

ABSTRACT

Learning representations on large-sized graphs is a long-standing challenge due to the inter-dependence nature involved in massive data points. Transformers, as an emerging class of foundation encoders for graph-structured data, have shown promising performance on small graphs due to their global attention, which can capture all-pair influence beyond neighboring nodes. Even so, existing approaches tend to inherit the spirit of Transformers in language and vision tasks, and embrace complicated models by stacking deep multi-head attentions. In this paper, we critically demonstrate that even a one-layer attention can bring surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level. This encourages us to rethink the design philosophy for Transformers on large graphs, where the global attention is a computation overhead hindering the scalability. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model that can efficiently propagate information among arbitrary nodes in one layer. SGFormer requires no positional encodings, feature/graph pre-processing, or augmented loss. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Beyond current results, we believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.

Research Motivation and Goals

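Re-examine whether deep attention is actually needed on large graphs, and explore scalable Transformer designs.
Develop a simple yet expressive model that efficiently captures all-pair node interactions.
Achieve scalable training on graphs ranging from thousands to billions of nodes without heavy pre-processing.
Demonstrate empirical performance and efficiency advantages over state-of-the-art graph Transformers.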

Proposed Method

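Proposes the Simplified Graph Transformer (SGFormer), whose single-layer global attention runs in O(N) time for N nodes.
Uses a shallow input mapping f_I to obtain node embeddings Z^(0), which serve as the input to attention propagation.
Defines a linear attention mechanism with Q, K, V projections and a residual propagation step that produces Z. This step combines global attention with self-loop information (Eqs. 2–3).
Avoids positional encodings, edge embeddings, pre-processing, and auxiliary losses, and uses no stochastic approximation either.
Optionally incorporates the graph structure by computing Z_O = (1 − α)Z + α·GN(Z^(0), A), where GN is a GNN applied to Z^(0) and the adjacency A. It then predicts through a linear head (Eq. 4).
Supports large-scale training through mini-batch partitioning, and is compatible with neighbor sampling, clustering, and historical embeddings.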
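The method described above can be sketched end to end in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation:

- f_I is taken to be one linear layer with ReLU.
- Q and K are divided by their Frobenius norms.
- β weights the residual link back to Z^(0).
- GN is a single symmetric-normalized GCN layer on a dense adjacency.
- All function and parameter names (linear_global_attention, sgformer_forward, w_in, w_gn, w_out, ...) are hypothetical.

```python
import numpy as np


def linear_global_attention(z0, wq, wk, wv, beta=0.5):
    """One global attention layer over all N nodes in O(N * d^2) time.

    Assumed form: Z = beta * D^-1 [V + Q (K^T V) / N] + (1 - beta) * Z0,
    with D = diag(1 + Q (K^T 1) / N). No N x N matrix is ever built.
    """
    n = z0.shape[0]
    q, k, v = z0 @ wq, z0 @ wk, z0 @ wv
    q = q / np.linalg.norm(q)  # Frobenius normalization (assumed)
    k = k / np.linalg.norm(k)
    kv = k.T @ v               # (m, d) summary of all nodes, computed once
    numer = v + q @ kv / n     # self-loop term V plus the all-pair term
    denom = 1.0 + q @ k.sum(axis=0) / n  # row normalizer, one value per node
    attn = numer / denom[:, None]
    return beta * attn + (1.0 - beta) * z0  # residual link to Z^(0)


def gcn_layer(z0, adj, w):
    """Hypothetical GN: one GCN layer with self-loops and symmetric normalization."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return a_norm @ z0 @ w


def sgformer_forward(x, adj, p, alpha=0.5, beta=0.5):
    z0 = np.maximum(x @ p["w_in"], 0.0)  # f_I: linear + ReLU (assumed)
    z = linear_global_attention(z0, p["wq"], p["wk"], p["wv"], beta)
    z_out = (1.0 - alpha) * z + alpha * gcn_layer(z0, adj, p["w_gn"])  # Eq. 4 mixing
    return z_out @ p["w_out"]  # linear prediction head


# Toy usage: 6 nodes on a path graph, 4 input features, hidden size 8, 3 classes.
rng = np.random.default_rng(0)
n, f, d, c = 6, 4, 8, 3
x = rng.normal(size=(n, f))
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
shapes = {"w_in": (f, d), "wq": (d, d), "wk": (d, d), "wv": (d, d),
          "w_gn": (d, d), "w_out": (d, c)}
params = {name: rng.normal(scale=0.3, size=s) for name, s in shapes.items()}
logits = sgformer_forward(x, adj, params)
print(logits.shape)  # (6, 3)
```

Dense matrices keep the sketch short. A large-graph version would use sparse adjacency and run on mini-batches.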

Figure 1: Illustration of the proposed model SGFormer and its data flow. The input graph data entails node features $\mathbf{X}$ and graph adjacency $\mathbf{A}$. For large graphs, we need to use mini-batch sampling that randomly partitions the input graph into mini-batches with smaller sizes. Each …
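The caption's random partitioning step can be sketched as follows. This is one simple reading of the caption, with these assumptions:

- Nodes are shuffled and chunked into fixed-size batches.
- Each batch keeps only its induced subgraph for the GN branch.
- The function name and the (2, E) edge-list layout are hypothetical, not taken from the authors' code.

```python
import numpy as np


def random_partition_batches(num_nodes, edge_index, batch_size, rng):
    """Yield (node_ids, local_edge_index) for each random mini-batch.

    edge_index is a (2, E) integer array of global node ids. Each batch keeps
    only edges whose two endpoints both fall inside it (the induced subgraph);
    edges that cross batches are dropped for that step.
    """
    perm = rng.permutation(num_nodes)
    local = np.full(num_nodes, -1, dtype=np.int64)  # global id -> local id, -1 = absent
    for start in range(0, num_nodes, batch_size):
        nodes = perm[start:start + batch_size]
        local[nodes] = np.arange(len(nodes))
        src, dst = local[edge_index[0]], local[edge_index[1]]
        keep = (src >= 0) & (dst >= 0)
        yield nodes, np.stack([src[keep], dst[keep]])
        local[nodes] = -1  # reset so the next batch starts clean


# Toy usage: a 10-node ring graph split into batches of at most 4 nodes.
rng = np.random.default_rng(0)
ring = np.array([np.arange(10), (np.arange(10) + 1) % 10])
batches = list(random_partition_batches(10, ring, 4, rng))
for nodes, local_edges in batches:
    print(len(nodes), local_edges.shape[1])
```

Dropping cross-batch edges is the price of this scheme. Re-shuffling every epoch means different edges are seen across epochs.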

Experimental Results

Research Questions

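RQ1: Can a single-layer Transformer with linear-time global attention match or outperform multi-layer Transformers on large graphs?
RQ2: How does SGFormer perform against GNNs and graph Transformers on medium-sized and web-scale graphs?
RQ3: How does a compact architecture affect generalization under limited supervision?
RQ4: Is linear-time attention sufficient to capture all-pair interactions without approximation?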
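RQ4 asks whether a linear-time computation can still cover every node pair exactly. Assume the attention has the form D^{-1}[V + (1/N) Q̃(K̃^⊤V)], with Frobenius-normalized Q̃ and K̃. This form is our reading of Eqs. 2–3, which this review does not reproduce. Under that assumption, the reordering is exact algebra rather than an approximation:

```latex
\begin{aligned}
\mathbf{S} &= \mathbf{I} + \tfrac{1}{N}\,\tilde{\mathbf{Q}}\tilde{\mathbf{K}}^{\top} \in \mathbb{R}^{N \times N},
\qquad \mathbf{D} = \operatorname{diag}(\mathbf{S}\mathbf{1}) \\
\mathbf{D}^{-1}\mathbf{S}\mathbf{V}
  &= \mathbf{D}^{-1}\Big[\mathbf{V} + \tfrac{1}{N}\,\tilde{\mathbf{Q}}\big(\tilde{\mathbf{K}}^{\top}\mathbf{V}\big)\Big] \\
\mathbf{S}\mathbf{1}
  &= \mathbf{1} + \tfrac{1}{N}\,\tilde{\mathbf{Q}}\big(\tilde{\mathbf{K}}^{\top}\mathbf{1}\big)
\end{aligned}
```

The left-hand side materializes the N × N matrix S, costing O(N²d). The right-hand side first forms the d × d matrix K̃^⊤V, costing O(Nd²). Every pair (i, j) still contributes through S_ij; only the order of the matrix products changes.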

Key Findings

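With single-layer attention, SGFormer is competitive or superior on 12 node-property benchmarks ranging from thousands to billions of nodes.
On medium-sized graphs, SGFormer outperformed standard GNNs by up to 25.9% (e.g., on the actor dataset) and remained highly competitive with Graphormer and GraphTrans.
On large graphs, SGFormer outperformed NodeFormer on five datasets. On ogbn-papers100M it reached 66.0% accuracy, with about 3.5 hours of training and 23.0 GB of memory on a single GPU.
SGFormer scales with linear complexity to web-scale graphs (0.1B nodes in ogbn-papers100M) and substantially reduces training and inference time (e.g., 141x faster inference than SOTA Transformers on medium-sized graphs).
Compared with quadratic-attention baselines, SGFormer achieves substantial efficiency gains (e.g., 38x faster training and 141x faster inference than Graphormer on Cora).
Deeper multi-layer attention does not consistently improve performance and can cost more. This underscores the effectiveness of the single-layer design on large graphs.
A theoretical analysis links single-layer attention to a denoising-optimization view and shows that, under suitable settings, one layer can match the effect of multiple layers.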
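The denoising link in the last finding can be illustrated with a generic derivation. This is a standard textbook-style argument, not the paper's own theorem, whose energy and conditions may differ. It rests on simplifying assumptions:

- The weights c_ij are symmetric and each row sums to 1.
- λ is a smoothness weight and η a step size.

Under these assumptions, one gradient-descent step on a denoising energy, started at Z^(0), yields a residual propagation with the same shape as the attention update with residual weight β:

```latex
\begin{aligned}
E(\mathbf{Z}) &= \lVert \mathbf{Z} - \mathbf{Z}^{(0)} \rVert_F^2
  + \tfrac{\lambda}{2} \sum_{i,j} c_{ij}\, \lVert \mathbf{z}_i - \mathbf{z}_j \rVert_2^2,
  \qquad c_{ij} = c_{ji},\ \textstyle\sum_j c_{ij} = 1 \\
\nabla_{\mathbf{z}_i} E \,\big|_{\mathbf{Z} = \mathbf{Z}^{(0)}}
  &= 2\lambda \Big( \mathbf{z}^{(0)}_i - \textstyle\sum_j c_{ij}\, \mathbf{z}^{(0)}_j \Big) \\
\mathbf{z}_i &= \mathbf{z}^{(0)}_i - \eta\, \nabla_{\mathbf{z}_i} E
  = (1 - \beta)\, \mathbf{z}^{(0)}_i + \beta \textstyle\sum_j c_{ij}\, \mathbf{z}^{(0)}_j,
  \qquad \beta = 2\eta\lambda
\end{aligned}
```

The first term of E contributes no gradient at Z = Z^(0). Reading c_ij as attention weights, a single step mixes each node's own embedding with a weighted average over all nodes.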

Figure 2: Scalability test of training time per epoch and GPU memory usage w.r.t. graph size (i.e., number of nodes). NodeFormer runs out of memory once the number of nodes exceeds 30K.
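Figure 2 tracks memory as the node count grows. The sketch below is a back-of-the-envelope illustration of why dense quadratic attention stops fitting on a GPU. It does not explain NodeFormer's specific out-of-memory point, which the caption only reports. It compares one N × N score matrix with the O(N·d) buffers of a linear attention. Assumptions:

- float32 values and a hidden size of d = 64.
- A single head and a single layer.
- The helper names are hypothetical.

```python
GB = 1e9  # decimal gigabytes


def dense_scores_gb(n, bytes_per_value=4):
    """Memory for one N x N attention score matrix (one head, one layer)."""
    return n * n * bytes_per_value / GB


def linear_buffers_gb(n, d, bytes_per_value=4):
    """Memory for Q, K, V (each N x d) plus the d x d summary K^T V."""
    return (3 * n * d + d * d) * bytes_per_value / GB


for n in (10_000, 30_000, 100_000, 100_000_000):
    print(f"N={n:>11,}  dense={dense_scores_gb(n):>14,.1f} GB  "
          f"linear(d=64)={linear_buffers_gb(n, 64):>8,.2f} GB")
```

A single dense float32 score matrix is already 3.6 GB at N = 30,000 and 40,000,000 GB at N = 10^8. The linear buffers grow only as N·d. At 10^8 nodes even they reach about 77 GB, which is consistent with the review's note that web-scale graphs are trained in mini-batches.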


This review was generated by AI and reviewed by a human editor.