QUICK REVIEW

[Paper Review] Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction

Eric Alcaide, Zhifeng Gao|arXiv (Cornell University)|May 20, 2024

Advanced Surface Polishing Techniques|Cited by 6

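One-Sentence Summary

Uni-Mol Docking V2 achieves state-of-the-art binding pose prediction on PoseBusters, placing 77.6% of ligands within 2.0 Å RMSD, and reaches 95.29% on the Astex set, addressing the chemical inaccuracies of earlier ML docking methods.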

ABSTRACT

In recent years, machine learning (ML) methods have emerged as promising alternatives for molecular docking, offering the potential for high accuracy without incurring prohibitive computational costs. However, recent studies have indicated that these ML models may overfit to quantitative metrics while neglecting the physical constraints inherent in the problem. In this work, we present Uni-Mol Docking V2, which demonstrates a remarkable improvement in performance, accurately predicting the binding poses of 77+% of ligands in the PoseBusters benchmark with an RMSD value of less than 2.0 Å, and 75+% passing all quality checks. This represents a significant increase from the 62% achieved by the previous Uni-Mol Docking model. Notably, our Uni-Mol Docking approach generates chemically accurate predictions, circumventing issues such as chirality inversions and steric clashes that have plagued previous ML models. Furthermore, we observe enhanced performance in terms of high-quality predictions (RMSD values of less than 1.0 Å and 1.5 Å) and physical soundness when Uni-Mol Docking is combined with more physics-based methods like Uni-Dock. Our results represent a significant advancement in the application of artificial intelligence for scientific research, adopting a holistic approach to ligand docking that is well-suited for industrial applications in virtual screening and drug design. The code, data and service for Uni-Mol Docking are publicly available for use and further development in https://github.com/dptech-corp/Uni-Mol.

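Motivation and Objectives

Address the physical-plausibility and chemical-accuracy problems of ML docking methods.
Improve performance on the PoseBusters benchmark for unseen protein-ligand complexes.
Provide a reproducible data processing and docking pipeline, with publicly available code and data.
Demonstrate the potential of integration with physics-based docking for industrial-scale virtual screening.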

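Proposed Method

Pretrain the molecule and pocket encoders in the Uni-Mol framework to initialize the docking model.
Process the protein data by adding hydrogen atoms, assigning protonation information, and handling heavy-atom and residue completeness.
Train the docking model on MOAD-derived data with a 9:1 train/validation split for 100 epochs on 8 V100 GPUs.
Define the pocket as a cubic region sized to match the ligand's extent, with a 10 Å margin at the boundaries.
Take the pocket and a ligand conformation as input and output a 3D ligand pose; pocket features are precomputed for efficiency.
Combine Uni-Mol Docking with the physics-based Uni-Dock to improve accuracy and account for cofactors and water molecules.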
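
The pocket definition above (a cubic region matched to the ligand, with a 10 Å margin) can be illustrated with a small sketch. This is not the authors' implementation: the function names `pocket_box` and `select_pocket_atoms` are hypothetical, and the choice to size the cube by the ligand's longest axis-aligned extent is an assumption, since the summary does not spell out the exact construction.

```python
def pocket_box(ligand_xyz, margin=10.0):
    """Return (lo, hi) corners of a cube centered on the ligand.

    Assumption: the cube edge is the ligand's longest axis-aligned
    extent plus `margin` (in Å) on each side.
    """
    mins = [min(atom[i] for atom in ligand_xyz) for i in range(3)]
    maxs = [max(atom[i] for atom in ligand_xyz) for i in range(3)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    half_edge = max(hi - lo for lo, hi in zip(mins, maxs)) / 2.0 + margin
    lo = tuple(c - half_edge for c in center)
    hi = tuple(c + half_edge for c in center)
    return lo, hi


def select_pocket_atoms(protein_xyz, lo, hi):
    """Return indices of protein atoms that fall inside the cube (inclusive)."""
    return [
        i
        for i, atom in enumerate(protein_xyz)
        if all(l <= x <= h for x, l, h in zip(atom, lo, hi))
    ]


# Toy coordinates in Å, invented for illustration only.
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
protein = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (12.0, 11.0, -11.0)]
lo, hi = pocket_box(ligand)
pocket_indices = select_pocket_atoms(protein, lo, hi)
```

A real pipeline would more likely select whole residues rather than individual atoms, so that no residue is cut at the box boundary; that refinement is omitted here.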

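Experimental Results

Research Questions

RQ1: Can ML-based docking produce chemically plausible poses on PoseBusters without chirality inversions or steric clashes?
RQ2: How do reproducible data processing and input generation affect performance on the PoseBusters and Astex benchmarks?
RQ3: Does combining physics-based docking (Uni-Dock) with Uni-Mol Docking yield higher-quality, more physically sound predictions?
RQ4: How does Uni-Mol Docking V2 compare with earlier ML docking models on unseen data?
RQ5: How much biochemical space do the PoseBusters targets cover, and how does V2 handle diverse protein-ligand complexes?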

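Key Findings

Uni-Mol Docking V2 places 77.6% of PoseBusters ligands within 2.0 Å RMSD and reaches 95.29% on Astex (as reported), surpassing previous open-source models.
More than 75% of PoseBusters complexes pass all quality checks.
Uni-Mol Docking V2 produces chemically accurate predictions, with no chirality inversions or steric clashes observed.
When combined with Uni-Dock, the share of high-quality predictions (RMSD <1.0 Å and <1.5 Å) increases, and physical validity remains high (>95%).
The standardized data processing pipeline and public code make the results reproducible and allow open comparison with other docking methods.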
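
The headline numbers above are success rates: the percentage of ligands whose predicted pose lies within an RMSD threshold (2.0 Å, 1.5 Å, or 1.0 Å) of the reference pose. The sketch below shows one way such a rate could be computed. It is illustrative only: `rmsd` and `success_rate` are hypothetical helper names, the RMSD is computed without alignment and assumes the atoms of both poses are listed in the same order, and it ignores ligand symmetry, which benchmark tooling may handle differently.

```python
import math


def rmsd(pred_xyz, ref_xyz):
    """Root-mean-square deviation (Å) between two poses.

    Assumes both poses list the same atoms in the same order and
    already share the protein's coordinate frame (no alignment).
    """
    if len(pred_xyz) != len(ref_xyz) or not pred_xyz:
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared = sum(
        (p - r) ** 2
        for pred_atom, ref_atom in zip(pred_xyz, ref_xyz)
        for p, r in zip(pred_atom, ref_atom)
    )
    return math.sqrt(squared / len(pred_xyz))


def success_rate(rmsds, threshold=2.0):
    """Percentage of poses with RMSD strictly below `threshold` (Å)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)


# Toy values invented for illustration; not results from the paper.
example_rmsds = [0.5, 1.2, 1.9, 2.0, 3.5]
rate_2A = success_rate(example_rmsds, threshold=2.0)
rate_1A = success_rate(example_rmsds, threshold=1.0)
```

Reporting the rate at several thresholds, as the findings do, separates poses that are merely acceptable (<2.0 Å) from high-quality ones (<1.0 Å).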


This review was generated by AI and reviewed by human editors.