QUICK REVIEW

[Paper Review] A Generalization of Transformer Networks to Graphs

Vijay Prakash Dwivedi, Xavier Bresson | arXiv (Cornell University) | December 17, 2020

Advanced Graph Neural Networks | References: 37 | Citations: 321

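One-Line Summary

This paper generalizes the Transformer architecture to arbitrary graphs by introducing graph sparsity, Laplacian eigenvector positional encodings, batch normalization, and optional edge-feature handling, and it shows competitive performance on graph benchmarks.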

ABSTRACT

We propose a generalization of transformer neural network architecture for arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such architecture does not leverage the graph connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, the layer normalization is replaced by a batch normalization layer, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representation, which can be critical to tasks s.a. chemistry (bond type) or link prediction (entity relationship in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, that can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to consider transformer and graphs.

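Research Motivation and Goals

Motivate adapting the Transformer to arbitrary graphs so that it can exploit graph structure and its inductive bias.
Introduce a Graph Transformer layer that attends to each node's local graph neighborhood instead of assuming full connectivity.
Introduce positional encodings based on Laplacian eigenvectors to capture node positions in the graph.
Provide an architecture variant that supports edge features in order to use pairwise edge information.
Demonstrate competitive performance against GNN baselines on standard graph benchmarks.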

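Proposed Method

Embed node and edge features into a common hidden dimension through linear projections.
Add Laplacian eigenvector positional encodings to the input node features.
Compute multi-head attention in which each head applies the softmax over a node's neighbors, so that attention stays within the local neighborhood.
Apply residual connections and normalization (BatchNorm or LayerNorm) around the FFN.
Provide a Graph Transformer variant that updates node and edge representations together, each with a dedicated FFN.
Evaluate on the ZINC, PATTERN, and CLUSTER datasets in both sparse-graph and full-graph settings.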
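The two core steps above, Laplacian positional encodings and neighborhood-restricted attention, can be sketched in a few lines of NumPy. This is only an illustrative single-head sketch, not the authors' implementation: the function names `laplacian_pe` and `neighborhood_attention`, the 4-node path graph, and the random matrices standing in for learned projections are all assumptions made for the example, and it assumes every node has at least one neighbor.

```python
import numpy as np

def laplacian_pe(adj, k):
    """Hypothetical helper: k smallest non-trivial eigenvectors of the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes no isolated nodes (deg > 0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    # Skip the first (trivial) eigenvector. Eigenvector signs are arbitrary.
    return eigvecs[:, 1:k + 1]

def neighborhood_attention(h, adj, w_q, w_k, w_v):
    """Hypothetical helper: single-head scaled dot-product attention where
    the softmax for node i runs only over its neighbors in adj."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[1])
    scores = np.where(adj > 0, scores, -np.inf)          # mask non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Toy input: path graph 0 - 1 - 2 - 3 with random node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))

pe = laplacian_pe(adj, k=2)
# A random matrix stands in for the learned linear projection of the PE.
h = x + pe @ rng.normal(size=(2, d))

w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = neighborhood_attention(h, adj, w_q, w_k, w_v)
print(attn.round(3))  # zero wherever two nodes are not connected
```

Setting `adj` to an all-ones matrix would recover ordinary full attention, which is roughly the "full-graph" setting that the sparse configuration is compared against.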

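Experimental Results

Research Questions

RQ1: Can Transformer-style attention be effectively localized to graph neighborhoods so as to exploit sparsity?
RQ2: Do Laplacian eigenvector positional encodings improve performance on graph tasks compared with other positional encodings?
RQ3: Does replacing LayerNorm with batch normalization (BatchNorm) in the graph transformer improve training and generalization?
RQ4: Does introducing edge features into the Graph Transformer improve performance on datasets with rich edge information?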

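Key Results

The Graph Transformer with Laplacian PE and BatchNorm outperforms the baseline isotropic and anisotropic GNNs on all three datasets.
Sparse-graph configurations outperform full-graph ones, which validates the sparsity inductive bias.
With edge features, the Graph Transformer comes close to the best-performing GNN (GatedGCN) on ZINC.
Laplacian PE outperforms Graph-BERT's WL-PE and intimacy-based encodings on these tasks.
Using BatchNorm instead of LayerNorm generally improves training efficiency and generalization performance.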
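To make the LayerNorm versus BatchNorm comparison concrete, the sketch below shows which axis each one normalizes over for a matrix of node features. It is a simplified illustration under stated assumptions: the function names are hypothetical, the learnable scale and shift parameters are omitted, and `batch_norm` uses training-mode batch statistics only (no running averages).

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Normalize each node's feature vector independently (over the feature axis).
    mean = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def batch_norm(h, eps=1e-5):
    # Normalize each feature channel across all nodes in the batch (over the node axis).
    mean = h.mean(axis=0, keepdims=True)
    var = h.var(axis=0, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

# Toy batch: 6 nodes with 4 features each.
feats = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(6, 4))
ln_out = layer_norm(feats)
bn_out = batch_norm(feats)

print(ln_out.mean(axis=1).round(6))  # per-node means are ~0
print(bn_out.mean(axis=0).round(6))  # per-feature means are ~0
```

The only difference between the two functions is the reduction axis: LayerNorm statistics depend on a single node, while BatchNorm statistics are shared across every node in the batch.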

This review was generated by AI and reviewed by a human editor.