QUICK REVIEW

[Paper Review] Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows

Ajay N. Jain, Ann E. Cleves|arXiv (Cornell University)|Dec 3, 2024

Cloud Computing and Resource Management | Cited by 14

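ONE-SENTENCE SUMMARY

This paper establishes a fair baseline comparison between the diffusion-based docking method DiffDock and conventional docking workflows, showing that established docking methods outperform DiffDock on the same test set and that near-neighbor training examples substantially inflate DiffDock's reported performance.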

ABSTRACT

The diffusion learning method, DiffDock, for docking small-molecule ligands into protein binding sites was recently introduced. Results included comparisons to more conventional docking approaches, with DiffDock showing superior performance. Here, we employ a fully automatic workflow using the Surflex-Dock methods to generate a fair baseline for conventional docking approaches. Results were generated for the common and expected situation where a binding site location is known and also for the condition of an unknown binding site. For the known binding site condition, Surflex-Dock success rates at 2.0 Angstroms RMSD far exceeded those for DiffDock (Top-1/Top-5 success rates, respectively, were 68/81% compared with 45/51%). Glide performed with similar success rates (67/73%) to Surflex-Dock for the known binding site condition, and results for AutoDock Vina and Gnina followed this pattern. For the unknown binding site condition, using an automated method to identify multiple binding pockets, Surflex-Dock success rates again exceeded those of DiffDock, but by a somewhat lesser margin. DiffDock made use of roughly 17,000 co-crystal structures for learning (98% of PDBBind version 2020, pre-2019 structures) for a training set in order to predict on 363 test cases (2% of PDBBind 2020) from 2019 forward. DiffDock's performance was inextricably linked with the presence of near-neighbor cases of close to identical protein-ligand complexes in the training set for over half of the test set cases. DiffDock exhibited a 40 percentage point difference on near-neighbor cases (two-thirds of all test cases) compared with cases with no near-neighbor training case. DiffDock has apparently encoded a type of table-lookup during its learning process, rendering meaningful applications beyond its reach. Further, it does not perform even close to competitively with a competently run modern docking workflow.

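MOTIVATION AND OBJECTIVES

Establish a fair conventional-docking baseline for the DiffDock test set without prior knowledge of the co-crystal structures.
Evaluate docking performance both when the binding-site location is known and in the blind-docking (unknown binding site) scenario.
Investigate whether DiffDock's apparently strong performance is driven by near-neighbor training data rather than genuine docking ability.
Quantify how several popular conventional docking tools (Surflex-Dock, Glide, AutoDock Vina, Gnina) compare with DiffDock.
Assess how training-set composition affects DiffDock's performance, and offer guidance for evaluation practice in computer-aided drug design (CADD).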

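PROPOSED METHOD

Use a fully automatic Surflex-Dock workflow to generate the conventional-docking baseline.
Process the DiffDock test data into a Clean Test Set derived from PDBBind 2020, and compare against several docking tools.
Evaluate both known-binding-site docking and the unknown-binding-site scenario, where pockets are identified automatically.
Compare DiffDock with Surflex-Dock, Glide, AutoDock Vina, and Gnina on cognate-ligand redocking performance.
Analyze the effect of near-neighbor training cases on DiffDock's performance by classifying each test case as near-neighbor or non-near-neighbor.
All analyses are based on PDBBind 2020 data: roughly 17,000 training complexes and 363 test cases (290 in the Clean Test Set).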
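
The Top-1/Top-5 success rates quoted throughout this review can be illustrated with a minimal sketch. It assumes that a test case counts as a success when any of the k best-ranked poses has an RMSD at or below the threshold (2.0 Å by default); the exact inequality and the ranking scheme used in the paper are not stated here, so treat both as assumptions. The function name `top_k_success_rate` and the toy RMSD values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def top_k_success_rate(
    rmsds_per_case: Sequence[Sequence[float]],
    k: int,
    threshold: float = 2.0,
) -> float:
    """Fraction of cases where any of the k top-ranked poses is within threshold.

    Each inner sequence holds pose RMSDs (in Angstroms) for one test case,
    ordered from best-ranked to worst-ranked pose.
    """
    if not rmsds_per_case:
        raise ValueError("need at least one test case")
    hits = sum(
        1
        for ranked in rmsds_per_case
        if any(rmsd <= threshold for rmsd in ranked[:k])
    )
    return hits / len(rmsds_per_case)


# Hypothetical RMSD values, best-ranked pose first (illustration only).
toy_cases = [
    [1.2, 3.4, 5.0],  # top-ranked pose is already within 2.0 A
    [4.1, 1.8, 6.2],  # only the second-ranked pose is within 2.0 A
    [7.5, 6.9, 8.3],  # no pose is within 2.0 A
    [0.9, 1.1, 2.5],  # top-ranked pose is within 2.0 A
]

top1 = top_k_success_rate(toy_cases, k=1)
top5 = top_k_success_rate(toy_cases, k=5)
print(f"Top-1: {top1:.0%}, Top-5: {top5:.0%}")
```

Because Top-5 considers a superset of the poses considered by Top-1, it can never be lower than Top-1 for the same method and threshold, which is why every Top-1/Top-5 pair in this review increases from left to right.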

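EXPERIMENTAL RESULTS

Research Questions

RQ1: With a known binding site, does an established automatic docking workflow (Surflex-Dock) outperform DiffDock on the DiffDock test set?
RQ2: How does DiffDock perform relative to Glide, AutoDock Vina, and Gnina when docking to a known binding site?
RQ3: In blind docking (unknown binding site), how does DiffDock perform relative to Surflex-Dock with automated pocket identification?
RQ4: To what extent do near-neighbor training cases drive DiffDock's reported success, and how does this affect a fair comparison?
RQ5: What lessons can be drawn about evaluation practice in CADD for ML-based docking methods?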

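Key Findings

With a known binding site, Surflex-Dock achieves Top-1/Top-5 success rates at 2.0 Å RMSD of about 68%/81%, versus 45%/51% for DiffDock.
Glide performs similarly to Surflex-Dock with a known binding site (Top-1/Top-5 of about 67%/73%).
AutoDock Vina and Gnina follow the same pattern, outperforming DiffDock in cognate-ligand redocking with a known site.
With an unknown binding site (blind docking), Surflex-Dock's Top-5 success rate exceeds DiffDock's by 15–20 percentage points at 1.0 Å and by about 10 percentage points at 2.0 Å, with notable outlier cases to be taken into account.
About two-thirds of the DiffDock test cases (191/290) have a near-neighbor training example and show markedly higher success rates (Top-1/Top-5 of roughly 57%/65%), compared with roughly 21%/28% for cases without a near neighbor.
The extreme near-neighbor cases (24) reach success rates above 90%, pointing to near-memorization rather than genuine docking generalization. Even on the near-neighbor subset, DiffDock's overall performance falls short of Surflex-Dock and Glide.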
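
As a rough consistency check on the near-neighbor figures above, the two subset success rates can be pooled by subset size and compared with DiffDock's overall rates. This is a sketch under stated assumptions: it takes the 290-case Clean Test Set as the denominator, derives the non-near-neighbor count as 290 − 191 = 99, and uses the rounded subset rates quoted in this review, so the result is only approximate. The helper `pooled_rate` is a hypothetical name introduced for illustration.

```python
def pooled_rate(groups):
    """Size-weighted average of per-group success rates.

    groups: iterable of (case_count, success_rate) pairs.
    """
    groups = list(groups)
    total = sum(count for count, _ in groups)
    return sum(count * rate for count, rate in groups) / total


n_total = 290            # Clean Test Set size quoted in this review
n_near = 191             # cases with a near-neighbor training example
n_far = n_total - n_near # assumption: all remaining cases have no near neighbor

# Rounded subset rates quoted in the findings above (approximate).
top1 = pooled_rate([(n_near, 0.57), (n_far, 0.21)])
top5 = pooled_rate([(n_near, 0.65), (n_far, 0.28)])

print(f"Pooled Top-1: {top1:.1%}")  # Pooled Top-1: 44.7%
print(f"Pooled Top-5: {top5:.1%}")  # Pooled Top-5: 52.4%
```

The pooled values land close to the overall DiffDock rates of 45%/51% reported for the known-binding-site condition, and the gaps between the two subsets (36 and 37 percentage points) are in line with the roughly 40-point difference described in the abstract. Small discrepancies are expected from rounding and from the assumption about which test set the overall figures refer to.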

This review was generated by AI and checked by human editors.