QUICK REVIEW

[Paper Review] Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis

Rajan, Ishaan Gupta|arXiv (Cornell University)|Feb 24, 2026

Computational Drug Discovery Methods | Citations: 0

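One-Line Summary

This paper benchmarks four GNN architectures (GCN, GAT, GIN, GraphSAGE) on molecular regression tasks. It also introduces a hierarchical fusion (GNN+FP) with ECFP4 fingerprints and uses CKA to analyze how similar the representations learned by different models are.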

ABSTRACT

Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors for training ML/DL models for molecular property prediction in computational chemistry, drug discovery, biochemistry, and materials science. Recent research has demonstrated that SMILES can be used to construct molecular graphs in which atoms are nodes ($V$) and bonds are edges ($E$). These graphs can then be used to train geometric DL models such as GNNs, which learn the inherent structural relationships within a molecule rather than depending on fixed-size fingerprints. Although GNNs are powerful aggregators, their efficacy on smaller datasets and the inductive biases of different architectures are less studied. In the present study, we performed a systematic benchmarking of four GNN architectures across datasets from diverse domains (physical chemistry, biological, and analytical). Additionally, we implemented a hierarchical fusion (GNN+FP) framework for target prediction. We observed that the fusion framework consistently outperforms or matches the performance of standalone GNNs (RMSE improvement > $7\%$) and baseline models. Further, we used centered kernel alignment (CKA) to investigate the representational similarity between GNN and fingerprint embeddings, and found that they occupy highly independent latent spaces (CKA $\le 0.46$). Cross-architectural CKA scores suggest high convergence among isotropic models such as GCN, GraphSAGE, and GIN (CKA $\geq 0.88$), with GAT learning moderately independent representations (CKA $0.55$–$0.80$).

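Research Motivation and Objectives

Benchmark four GNN architectures (GCN, GAT, GIN, GraphSAGE) on molecular regression datasets spanning physical chemistry, biology, and analytical chemistry.
Evaluate whether a hybrid GNN+FP model, which fuses graph embeddings with fixed fingerprints, improves predictive accuracy.
Use Centered Kernel Alignment (CKA) to investigate the representational similarity between GNN embeddings and conventional fingerprints.
Analyze whether GNN variants converge to similar representations or diverge on small datasets.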

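Proposed Method

Build molecular graphs from SMILES, with node features such as atomic number, degree, and number of valence electrons.
Evaluate four GNN variants (GCN, GAT, GIN, GraphSAGE) alongside hybrid GNN+FP models that use 1024-bit ECFP4 fingerprints.
For regression, train a single-layer GNN with global mean pooling followed by a two-layer MLP. In GNN+FP, concatenate the graph embedding with the projected fingerprint before the regression head.
Preprocess the data with tautomer standardization and salt/ion removal. Then downsample each dataset to 1000 molecules, use an 80/20 train/test split, and tune hyperparameters per dataset.
Use RMSE as the primary metric, with bootstrap confidence intervals. Use CKA (with an RBF kernel) to compare GNN and FP embeddings and to assess cross-architecture similarity.
Open-source code and processed data are available on GitHub for reproducibility.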
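
The evaluation protocol above can be sketched in NumPy: downsampling to 1000 molecules, an 80/20 split, and RMSE with a bootstrap confidence interval. This is a minimal illustration under assumptions, not the authors' code. The following are all hypothetical choices, and the paper's own resampling scheme may differ:

- the helper names `downsample_and_split` and `bootstrap_rmse_ci`
- the random seeds
- the 1000 bootstrap resamples
- the percentile 95% interval
- the synthetic targets

```python
import numpy as np


def downsample_and_split(n_total, n_keep=1000, train_frac=0.8, seed=0):
    """Hypothetical helper: random downsampling, then an 80/20 train/test split."""
    rng = np.random.default_rng(seed)
    # Keep at most n_keep molecule indices, sampled without replacement.
    keep = rng.choice(n_total, size=min(n_keep, n_total), replace=False)
    n_train = int(round(train_frac * len(keep)))
    return keep[:n_train], keep[n_train:]


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point RMSE plus a percentile bootstrap CI from resampling test molecules (assumed scheme)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test molecules with replacement
        scores[b] = rmse(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return rmse(y_true, y_pred), float(lo), float(hi)


if __name__ == "__main__":
    train_idx, test_idx = downsample_and_split(n_total=4200)  # placeholder dataset size
    print(len(train_idx), len(test_idx))  # 800 200
    rng = np.random.default_rng(1)
    y = rng.normal(size=len(test_idx))                     # synthetic targets
    y_hat = y + rng.normal(scale=0.5, size=len(test_idx))  # synthetic predictions
    point, lo, hi = bootstrap_rmse_ci(y, y_hat)
    print(f"RMSE {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Resampling the test predictions, rather than retraining on each resample, keeps the sketch cheap. It captures test-set sampling noise but not training variance.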

Figure 1: Violin plots showing the distributions and sample counts of each dataset.

Experimental Results

Research Questions

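RQ1: How well do four common GNN architectures (GCN, GAT, GIN, GraphSAGE) perform on molecular regression tasks when data is limited?
RQ2: Does GNN+FP, which fuses graph embeddings with conventional fingerprints, consistently improve predictive accuracy across datasets?
RQ3: How closely aligned are GNN embeddings and ECFP4 fingerprint representations as measured by CKA, and does fusion exploit complementary information?
RQ4: Do isotropic GNNs converge to similar latent representations across datasets, and in what way does GAT learn different representations?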

Key Findings

RMSE by model and dataset (lower is better):

Model	ESOL	Lipophilicity	RT	B3DB
Linear Regression	4.40±0.56	2.16±0.21	330.65±42.80	1.05±0.16
SVM	1.01±0.12	0.96±0.09	141.03±14.57	0.51±0.06
Random Forest	1.50±0.17	1.11±0.11	114.76±10.81	0.61±0.06
XGBoost	1.14±0.16	1.00±0.11	103.92±14.47	0.54±0.06
GCN	1.39±0.15	1.23±0.10	144.54±11.42	0.65±0.05
GAT	1.48±0.14	1.19±0.11	130.47±11.43	0.65±0.06
GIN	1.28±0.14	1.19±0.10	137.92±12.34	0.62±0.06
GraphSAGE	1.39±0.16	1.19±0.11	144.98±12.19	0.60±0.05
GNN+FP (GCN+FP)	1.07±0.12	1.02±0.10	102.74±14.59	0.59±0.08
GNN+FP (GAT+FP)	1.04±0.13	0.99±0.10	101.40±14.87	0.59±0.08
GNN+FP (GIN+FP)	1.05±0.12	1.04±0.11	103.75±14.90	0.58±0.07
GNN+FP (GraphSAGE+FP)	1.11±0.12	1.02±0.11	103.59±14.68	0.58±0.07
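
The average GNN+FP improvements quoted in the findings can be recomputed from this table. Read "average improvement" as the mean, over the four architectures, of each pair's relative RMSE reduction (GCN vs. GCN+FP, GAT vs. GAT+FP, and so on). This reading is an assumption, but it reproduces the reported percentages to two decimals. Averaging the RMSEs first instead gives about 22.9% on ESOL. The check below uses only the table's mean values and ignores the ± intervals.

```python
# Mean RMSE values from the results table (± intervals omitted).
DATASETS = ("ESOL", "Lipophilicity", "RT", "B3DB")
GNN = {
    "GCN":       (1.39, 1.23, 144.54, 0.65),
    "GAT":       (1.48, 1.19, 130.47, 0.65),
    "GIN":       (1.28, 1.19, 137.92, 0.62),
    "GraphSAGE": (1.39, 1.19, 144.98, 0.60),
}
GNN_FP = {
    "GCN":       (1.07, 1.02, 102.74, 0.59),
    "GAT":       (1.04, 0.99, 101.40, 0.59),
    "GIN":       (1.05, 1.04, 103.75, 0.58),
    "GraphSAGE": (1.11, 1.02, 103.59, 0.58),
}


def mean_paired_improvement(dataset):
    """Mean over architectures of (RMSE_GNN - RMSE_GNN+FP) / RMSE_GNN, in percent."""
    j = DATASETS.index(dataset)
    gains = [(GNN[a][j] - GNN_FP[a][j]) / GNN[a][j] for a in GNN]
    return 100.0 * sum(gains) / len(gains)


for ds in DATASETS:
    print(f"{ds}: {mean_paired_improvement(ds):.2f}%")
# Prints 22.72%, 15.19%, 26.13% and 7.06% for ESOL, Lipophilicity, RT and B3DB.
```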

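Hierarchical fusion (GNN+FP) lowers RMSE relative to the corresponding standalone GNN on every dataset and is competitive with the fingerprint baselines. It gives the lowest RMSE on RT, while SVM remains best on ESOL, Lipophilicity, and B3DB.
On average, GNN+FP reduces RMSE relative to the standalone GNNs by 22.72% on ESOL, 15.19% on Lipophilicity, 26.13% on RT, and 7.06% on B3DB.
CKA analysis shows moderate-to-low alignment between GNN and FP embeddings (ESOL: 0.40–0.46; other datasets: 0.29–0.32). This suggests that fusion can draw on complementary information.
Cross-architecture CKA shows close convergence among the isotropic GNNs (GCN, GraphSAGE, and often GIN; CKA ≥ 0.88). GAT learns more distinct representations (CKA 0.55–0.80 relative to the others).
Fingerprint-based ML baselines sometimes outperform GNNs on small datasets. However, GNNs scale better to larger datasets, and FP-based methods have limited interpretability.
The data constraint (1000 molecules per dataset) affects performance; in this small-data regime, fingerprints act as a strong regularizer.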
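
CKA, the similarity score behind these findings, compares two representations of the same molecules through their centered Gram matrices. Below is a minimal NumPy sketch of RBF-kernel CKA in the standard formulation (Kornblith et al., 2019). It makes the following assumptions, since the review does not say how the paper configured its RBF kernel:

- the bandwidth rule (scaled median pairwise squared distance)
- the `threshold` parameter
- the function names

The two inputs may have different widths, but their rows must list the same molecules in the same order.

```python
import numpy as np


def rbf_gram(x, threshold=1.0):
    """RBF Gram matrix over the rows of x.

    Bandwidth: sigma^2 = threshold^2 * median squared pairwise distance
    (an assumed heuristic, not a setting confirmed by the paper).
    """
    x = np.asarray(x, dtype=float)
    dots = x @ x.T
    sq_norms = np.diag(dots)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * dots, 0.0)
    return np.exp(-sq_dists / (2.0 * threshold ** 2 * np.median(sq_dists)))


def center_gram(k):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = k.shape[0]
    h = np.eye(n) - np.full((n, n), 1.0 / n)
    return h @ k @ h


def cka(x, y, threshold=1.0):
    """CKA(K, L) = <Kc, Lc>_F / (||Kc||_F * ||Lc||_F) using RBF kernels on x and y."""
    kc = center_gram(rbf_gram(x, threshold))
    lc = center_gram(rbf_gram(y, threshold))
    return float(np.sum(kc * lc) / (np.linalg.norm(kc) * np.linalg.norm(lc)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(200, 64))                # stand-in embedding matrix
    q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix
    print(cka(emb, emb))            # ~1.0: identical representations
    print(cka(emb, 3.0 * emb @ q))  # ~1.0: invariant to rotation and isotropic scaling
```

Because both centered Gram matrices are positive semidefinite, the score lies in [0, 1]. The median-based bandwidth also makes it invariant to uniformly rescaling either embedding.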

Figure 2: GNN and Hybrid (GNN+FP) Model Architecture.
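
To make the hybrid architecture concrete, here is a NumPy forward pass for a single molecule. It chains one GCN-style layer with symmetric normalization, global mean pooling, a linear projection of the 1024-bit fingerprint, concatenation, and a two-layer MLP head. This is a sketch under assumptions, not the authors' implementation. The following are all illustrative:

- the hidden sizes
- the ReLU activations
- the untrained random weights
- the hand-written three-atom ethanol graph and its node features
- the random stand-in for ECFP4 bits

A real pipeline would build the graph and fingerprint from SMILES with a cheminformatics toolkit and train the weights.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def gcn_layer(adj, x, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees including self-loops
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return relu(a_norm @ x @ w)


def init_params(n_feat=3, gnn_hidden=16, fp_len=1024, fp_hidden=16,
                mlp_hidden=32, seed=0):
    """Random, untrained weights; every size here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    return {
        "w_gnn": rng.normal(scale=0.1, size=(n_feat, gnn_hidden)),
        "w_fp": rng.normal(scale=0.1, size=(fp_len, fp_hidden)),
        "w1": rng.normal(scale=0.1, size=(gnn_hidden + fp_hidden, mlp_hidden)),
        "b1": np.zeros(mlp_hidden),
        "w2": rng.normal(scale=0.1, size=mlp_hidden),
        "b2": 0.0,
    }


def hybrid_forward(adj, node_feats, fp_bits, params):
    """GNN+FP sketch: mean-pooled GCN embedding concatenated with a projected fingerprint."""
    h = gcn_layer(adj, node_feats, params["w_gnn"])     # (n_atoms, gnn_hidden)
    graph_emb = h.mean(axis=0)                          # global mean pooling
    fp_emb = relu(fp_bits @ params["w_fp"])             # fingerprint projection
    z = np.concatenate([graph_emb, fp_emb])             # fusion by concatenation
    hidden = relu(z @ params["w1"] + params["b1"])      # MLP layer 1
    return float(hidden @ params["w2"] + params["b2"])  # MLP layer 2 -> scalar


# Toy heavy-atom graph for ethanol (C-C-O); features = [atomic number, degree,
# valence electrons], written by hand for illustration only.
ADJ = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
NODE_FEATS = np.array([[6, 1, 4], [6, 2, 4], [8, 1, 6]], dtype=float)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fp = rng.integers(0, 2, size=1024).astype(float)  # random stand-in, NOT a real ECFP4
    print(hybrid_forward(ADJ, NODE_FEATS, fp, init_params()))
```

Mean pooling makes the prediction independent of atom ordering. Projecting the sparse 1024-bit fingerprint before concatenation keeps the two branches at comparable widths.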

This review was generated by AI and checked by a human editor.