QUICK REVIEW

[Paper Review] VN-EGNN: E(3)-Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification

Florian Sestak, Lisa Schneckenreiter|arXiv (Cornell University)|Apr 10, 2024

Machine Learning in Bioinformatics | Cited by 5

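One-Sentence Summary

VN-EGNN adds virtual nodes to EGNN so that the network learns and predicts binding site centers, achieving state-of-the-art DCC/DCA on the COACH420, HOLO4K, and PDBbind2020 datasets.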

ABSTRACT

Being able to identify regions within or around proteins, to which ligands can potentially bind, is an essential step to develop new drugs. Binding site identification methods can now profit from the availability of large amounts of 3D structures in protein structure databases or from AlphaFold predictions. Current binding site identification methods heavily rely on graph neural networks (GNNs), usually designed to output E(3)-equivariant predictions. Such methods turned out to be very beneficial for physics-related tasks like binding energy or motion trajectory prediction. However, the performance of GNNs at binding site identification is still limited potentially due to the lack of dedicated nodes that model hidden geometric entities, such as binding pockets. In this work, we extend E(n)-Equivariant Graph Neural Networks (EGNNs) by adding virtual nodes and applying an extended message passing scheme. The virtual nodes in these graphs are dedicated quantities to learn representations of binding sites, which leads to improved predictive performance. In our experiments, we show that our proposed method VN-EGNN sets a new state-of-the-art at locating binding site centers on COACH420, HOLO4K and PDBbind2020.

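Motivation and Objectives

Improve binding site identification by using 3D-equivariant graphs with virtual nodes to model hidden geometric entities such as binding pockets.
Develop VN-EGNN by extending EGNNs with multiple virtual nodes that learn center representations and alleviate oversquashing.
Evaluate VN-EGNN against baseline methods on diverse binding site benchmarks (COACH420, HOLO4K, PDBbind2020).
Analyze equivariance, expressiveness, and the effect of virtual nodes on learning binding site centers.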

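Proposed Method

Extend E(3)-equivariant GNNs (EGNNs) with K virtual nodes, each connected to all physical nodes.
Use a three-phase message passing scheme in which, per layer, the features and coordinates of the physical nodes are updated twice and the virtual nodes are updated once.
Read out the final virtual node coordinates as predicted binding site centers, and perform node-level binding pocket segmentation.
Train with a multi-task objective that combines a binding site center localization loss with a segmentation loss (Dice or cross-entropy).
Add a confidence module that learns and assigns a confidence score to each predicted center.
Initialize the virtual nodes on a spherical Fibonacci grid; randomize the initial alignment for each sample to encourage approximate E(3) invariance.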
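The initialization and the three-phase update described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fibonacci_sphere`, `random_rotation`, `egnn_update`, `init_params`, `vn_egnn_layer`) are hypothetical, the weights are random rather than trained, single `tanh` layers stand in for the MLPs, the physical graph is assumed fully connected, and the sphere radius (distance from the protein center to its farthest atom) is an assumption.

```python
import numpy as np

def fibonacci_sphere(k, radius=1.0):
    """Return k roughly evenly spaced points on a sphere (Fibonacci lattice)."""
    i = np.arange(k) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / k)        # polar angle in [0, pi]
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i    # golden-ratio increments
    return radius * np.stack([np.cos(azimuth) * np.sin(polar),
                              np.sin(azimuth) * np.sin(polar),
                              np.cos(polar)], axis=1)

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                 # normalize column signs
    if np.linalg.det(q) < 0:                    # force a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def egnn_update(h_dst, x_dst, h_src, x_src, w_msg, w_coord, w_node):
    """EGNN-style update of destination nodes from all source nodes."""
    n_dst, n_src = len(h_dst), len(h_src)
    diff = x_dst[:, None, :] - x_src[None, :, :]          # (n_dst, n_src, 3)
    dist2 = (diff ** 2).sum(axis=-1, keepdims=True)       # rotation-invariant
    pair = np.concatenate([np.repeat(h_dst[:, None, :], n_src, axis=1),
                           np.repeat(h_src[None, :, :], n_dst, axis=0),
                           dist2], axis=-1)
    msg = np.tanh(pair @ w_msg)                           # invariant messages
    # Coordinates move along difference vectors scaled by invariant scalars.
    x_new = x_dst + (diff * (msg @ w_coord)).mean(axis=1)
    agg = np.concatenate([h_dst, msg.mean(axis=1)], axis=-1)
    h_new = h_dst + np.tanh(agg @ w_node)                 # residual update
    return h_new, x_new

def init_params(rng, f):
    """Random (untrained) weights for the three phases; illustration only."""
    def block():
        return (rng.normal(scale=0.1, size=(2 * f + 1, f)),
                rng.normal(scale=0.1, size=(f, 1)),
                rng.normal(scale=0.1, size=(2 * f, f)))
    return {"pp": block(), "pv": block(), "vp": block()}

def vn_egnn_layer(h, x, z, v, params):
    """One layer: physical nodes (h, x) update twice, virtual (z, v) once."""
    h, x = egnn_update(h, x, h, x, *params["pp"])   # phase I: physical -> physical
    z, v = egnn_update(z, v, h, x, *params["pv"])   # phase II: physical -> virtual
    h, x = egnn_update(h, x, z, v, *params["vp"])   # phase III: virtual -> physical
    return h, x, z, v

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3)) * 5.0                  # toy "atom" coordinates
h = rng.normal(size=(20, 8))                        # toy node features
center = x.mean(axis=0)
radius = np.linalg.norm(x - center, axis=1).max()   # assumed sphere radius
v = center + fibonacci_sphere(4, radius) @ random_rotation(rng).T
z = np.zeros((4, 8))                                # assumed virtual features
params = init_params(rng, 8)
h1, x1, z1, v1 = vn_egnn_layer(h, x, z, v, params)
```

In this sketch, the rows of `v1` play the role of the candidate binding site centers after one layer. Because every coordinate update is a sum of difference vectors weighted by distance-based scalars, rotating and translating the inputs rotates and translates `x1` and `v1` in the same way and leaves `h1` and `z1` unchanged.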

Figure 1: Overview of binding site identification methods. Top Left: Traditional methods, based on segmentation of a voxel grid, in which the pocket center is calculated as the geometric center of the positively labeled voxels. Bottom Left: Geometric Deep Learning approaches, such as EGNN, in which

Experimental Results

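Research Questions

RQ1: Can VN-EGNN outperform previous equivariant GNNs at locating binding site centers?
RQ2: Do virtual nodes alleviate oversquashing and improve expressiveness for binding site identification?
RQ3: Do VN-EGNN's predictions reach state-of-the-art DCC and DCA on standard binding site benchmarks?
RQ4: How does the model perform when it predicts multiple binding site centers and ranks them by confidence?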

Key Findings

Method	Params (M)	COACH420 DCC	COACH420 DCA	HOLO4K DCC	HOLO4K DCA	PDBbind2020 DCC	PDBbind2020 DCA
VN-EGNN (ours)	1.20	0.605 (0.009)	0.750 (0.008)	0.532 (0.021)	0.659 (0.026)	0.669 (0.015)	0.820 (0.010)

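VN-EGNN achieves state-of-the-art DCC on the COACH420, HOLO4K, and PDBbind2020 benchmarks (see Table 1).
On COACH420, VN-EGNN obtains the best DCA score among the compared methods; on PDBbind2020, it matches the DCA performance of P2Rank.
Ablation studies show that the full VN-EGNN, with all components (virtual nodes, heterogeneous message passing, and residue embeddings), performs best on all datasets.
The virtual nodes learn to infer binding site centers: during training, their coordinates converge to the actual ligand binding locations.
The multi-task objective (center localization plus segmentation loss), combined with the confidence module, improves both binding site prediction and the ranking of predictions.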
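To make the DCC and DCA columns concrete, here is a minimal sketch of how such metrics are commonly computed. It is an illustration under assumptions, not the paper's evaluation code: the helper names are hypothetical, DCC is taken as the distance from a predicted center to a given true binding site center, DCA as the distance to the closest ligand atom, and a prediction counts as a success when the distance is at most 4 Å (a threshold commonly used in this literature).

```python
import numpy as np

def dcc(pred_center, true_center):
    """Distance from the predicted center to the true binding site center."""
    return float(np.linalg.norm(np.asarray(pred_center) - np.asarray(true_center)))

def dca(pred_center, ligand_coords):
    """Distance from the predicted center to the closest ligand atom."""
    deltas = np.asarray(ligand_coords) - np.asarray(pred_center)
    return float(np.linalg.norm(deltas, axis=1).min())

def success_rate(distances, threshold=4.0):
    """Fraction of distances within the threshold (assumed 4 Angstrom)."""
    return float(np.mean(np.asarray(distances) <= threshold))

def top_n_by_confidence(centers, confidences, n):
    """Keep the n predicted centers with the highest confidence scores."""
    order = np.argsort(confidences)[::-1][:n]
    return np.asarray(centers)[order]

# Toy example: one predicted center and a two-atom "ligand".
pred = np.array([0.0, 0.0, 0.0])
ligand = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
true_center = ligand.mean(axis=0)      # assumed definition of the true center
print(dcc(pred, true_center))          # 2.0
print(dca(pred, ligand))               # 1.0
print(success_rate([2.0, 5.0]))        # 0.5
```

Under this reading, a table entry such as a DCC of 0.605 would be the fraction of binding sites whose predicted center falls within the threshold of the true center.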

Figure 2: Left: Example of a prediction from our model: Initial positions of the virtual nodes are represented by the yellow spheres around the protein, the ground truth binding site is indicated by the light violet ligand, whereas violet regions on the protein represent the annotated binding site.


This review was generated by AI and reviewed by a human editor.