QUICK REVIEW

[Paper Review] Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Lisa Schneckenreiter, Sohvi Luukkonen|arXiv (Cornell University)|Jan 14, 2026

Computational Drug Discovery Methods | Citations: 0

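One-Sentence Summary

ConGLUDe is a contrastive geometric model that unifies structure-based and ligand-based drug design: it learns from both data sources, is competitive in zero-shot virtual screening, and leads on target fishing and ligand-conditioned pocket selection.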

ABSTRACT

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for pre-defined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

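Motivation and Objectives

Integrate structure-based and ligand-based data to enable drug design at scale.
Develop a single end-to-end model that handles binding pocket prediction without requiring pre-defined pockets.
Support ligand-conditioned pocket prediction alongside virtual screening and target fishing.
Train one model jointly on protein-ligand complex structures and large-scale bioactivity data.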

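Proposed Method

Uses a geometric protein encoder (based on VN-EGNN) that produces whole-protein representations and implicit pocket representations.
Adds a fast ligand encoder that maps ligands into the joint protein-pocket embedding space through an MLP projection of 2D fingerprints and descriptors.
Extends a CLIP-like contrastive loss to a three-way objective that aligns protein, pocket, and ligand representations across structure-based and ligand-based batches.
Alternates training between structure-based and ligand-based data to learn from both bound structures and bioactivity measurements.
Predicts candidate pockets and ranks them by ligand-conditioned similarity, with no dependence on pre-defined pockets.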
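
The contrastive alignment and the ligand-conditioned pocket ranking described above can be sketched with NumPy. This is a minimal illustration, not the paper's implementation: the function names (`clip_loss`, `three_way_loss`, `rank_pockets`), the temperature of 0.07, and the choice to sum two pairwise CLIP-style terms are all assumptions made here, and the actual objective may weight or combine the terms differently.

```python
import numpy as np

def l2_normalize(x):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` is the positive match for row i of `b`."""
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def three_way_loss(protein, pocket, ligand):
    # Assumed combination: align ligands with whole-protein embeddings
    # and with pocket embeddings, then sum the two terms.
    return clip_loss(ligand, protein) + clip_loss(ligand, pocket)

def rank_pockets(ligand, pockets):
    """Order candidate pocket embeddings by cosine similarity to one ligand."""
    scores = l2_normalize(pockets) @ l2_normalize(ligand)
    return np.argsort(-scores), scores
```

With embeddings in a shared space, choosing a pocket for a given ligand reduces to taking the first index returned by `rank_pockets`, so no pocket has to be defined in advance.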

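Experimental Results

Research Questions

RQ1: Can a single model learn from both structure-based binding conformations and ligand-based bioactivity data?
RQ2: Does integrating pocket prediction into the protein encoder enable ligand-conditioned pocket selection and scalable virtual screening?
RQ3: How does joint training compare with specialized baselines on virtual screening, target fishing, and pocket prediction?
RQ4: What is the speed trade-off relative to docking while maintaining competitive accuracy?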

Key Findings

Method	AUROC	BEDROC	EF 0.5%	EF 1%	EF 5%
DrugCLIP	57.17	6.23	8.56	5.51	2.27
DrugHash	54.58	7.14	9.65	6.14	2.42
S2 Drug	58.23	8.69	11.44	7.38	2.97
LigUnity	59.85	11.33	–	6.47	–
HypSeek	62.10	11.96	–	6.81	–
DrugCLIP P2Rank a	49.72	2.96	2.41	2.44	1.36
DrugCLIP VN-EGNN a	52.52	3.56	1.82	2.58	1.59
SPRINT	73.40	12.30	15.90	10.78	5.29
ConGLUDe	64.06 ± 3.25	12.24 ± 2.06	15.87 ± 2.06	11.03 ± 1.81	4.68 ± 0.30
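
The EF columns in the table above are enrichment factors: the hit rate among the top-ranked fraction of a screening library divided by the hit rate of the whole library. An EF 1% of 11.03 therefore means the top 1% of the ranking is about 11 times richer in actives than a random selection. The sketch below follows that standard definition; the function name and the rounding of the cutoff (ceiling) are assumptions, and a given benchmark may round differently.

```python
import numpy as np

def enrichment_factor(scores, labels, fraction):
    """EF at `fraction`: hit rate in the top-ranked slice / overall hit rate."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Number of compounds in the top slice (at least one).
    n_top = max(1, int(np.ceil(fraction * len(scores))))
    # Indices of the highest-scoring compounds.
    top = np.argsort(-scores, kind="stable")[:n_top]
    return labels[top].mean() / labels.mean()

# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
scores = -np.arange(1000.0)            # compound 0 has the best score
labels = np.zeros(1000, dtype=bool)
labels[[0, 1, 2, 3, 4, 500, 501, 502, 503, 504]] = True
ef_1pct = enrichment_factor(scores, labels, 0.01)
```

In this toy case the top 1% holds 5 actives in 10 compounds (hit rate 0.5) against a library hit rate of 0.01, so the enrichment factor is 50.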

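ConGLUDe achieves competitive zero-shot virtual screening performance on LIT-PCBA.
In the zero-shot setting, it substantially outperforms target fishing baselines (Table 2).
It achieves state-of-the-art ligand-conditioned pocket prediction across multiple datasets (Table 4).
Binding site prediction performance with VN-EGNN is preserved despite the architectural adaptations (Table 3).
Inference speed is on par with fast contrastive methods and far faster than docking-based approaches (Figure 3).
In Table 1, ConGLUDe reaches 64.06 ± 3.25 AUROC, 12.24 ± 2.06 BEDROC, 15.87 ± 2.06 EF 0.5%, 11.03 ± 1.81 EF 1%, and 4.68 ± 0.30 EF 5%. It outperforms DrugCLIP on every metric and is on par with SPRINT on BEDROC, EF 0.5%, and EF 1%, while SPRINT remains ahead on AUROC (73.40) and EF 5% (5.29).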
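
Virtual screening and target fishing, as reported above, can both be read off one protein-by-ligand similarity matrix: screening ranks ligands for a fixed protein, and target fishing ranks proteins for a fixed ligand. The sketch below assumes precomputed embeddings in a shared space and uses hypothetical function names; it illustrates the general embedding-comparison pattern of contrastive methods, not ConGLUDe's actual inference code. Because scoring is a single matrix product rather than a per-pair pose search, this pattern is one plausible reason such methods are much faster than docking.

```python
import numpy as np

def cosine_scores(proteins, ligands):
    """Similarity matrix with one row per protein and one column per ligand."""
    p = proteins / np.linalg.norm(proteins, axis=1, keepdims=True)
    l = ligands / np.linalg.norm(ligands, axis=1, keepdims=True)
    return p @ l.T

def virtual_screen(scores, protein_idx, k):
    # Top-k ligand indices for one protein (rank along a row).
    return np.argsort(-scores[protein_idx])[:k]

def target_fish(scores, ligand_idx, k):
    # Top-k protein indices for one ligand (rank along a column).
    return np.argsort(-scores[:, ligand_idx])[:k]

# Toy embeddings: ligand 7 is placed next to protein 2 in the shared space.
rng = np.random.default_rng(0)
proteins = rng.normal(size=(4, 32))
ligands = rng.normal(size=(20, 32))
ligands[7] = proteins[2] + 0.01 * rng.normal(size=32)

scores = cosine_scores(proteins, ligands)
top_ligands = virtual_screen(scores, protein_idx=2, k=3)
top_targets = target_fish(scores, ligand_idx=7, k=2)
```

Here `top_ligands[0]` is ligand 7 and `top_targets[0]` is protein 2, since that pair was constructed to be nearly identical in the embedding space.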


This review was generated by AI and checked by a human editor.