QUICK REVIEW

[Paper Review] EspalomaCharge: Machine learning-enabled ultra-fast partial charge assignment

Yuanqing Wang, Iván Pulido|arXiv (Cornell University)|Feb 14, 2023
Machine Learning in Materials Science | References: 45 | Cited by: 9
One-Sentence Summary

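EspalomaCharge combines a graph neural network with charge equilibration to predict AM1-BCC ELF10-like partial charges. It scales linearly with system size, O(N), which enables fast, conformation-independent charge assignment for small molecules and biopolymers. Its accuracy is comparable to that of the reference implementations, and it can be integrated into common workflows as a drop-in replacement.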

ABSTRACT

Atomic partial charges are crucial parameters in molecular dynamics (MD) simulation, dictating the electrostatic contributions to intermolecular energies, and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of ab initio semiempirical quantum chemical methods such as AM1-BCC, and is expensive for large systems or large numbers of molecules. We propose a hybrid physical / graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserves total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling, for the first time, the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package espaloma_charge, this approach provides drop-in replacements for both AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma_charge.

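Motivation and Objectives

  • Develop a fast, accurate partial charge assignment method that is conformation-independent and scales to large biomolecules.
  • Use a graph neural network to predict atom-level electronegativity and hardness parameters for charge equilibration.
  • Guarantee, through an analytical constrained solution, that the predicted charges sum to the total molecular charge (Q).
  • Integrate easily with existing molecular mechanics workflows (AmberTools, Open Force Field Toolkit).
  • Demonstrate that the method delivers AM1-BCC ELF10-quality charges at a small fraction of the cost of QM methods.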

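Proposed Method

  • Build a graph neural network that produces continuous embeddings of atomic environments (the Espaloma framework).
  • Predict an unconstrained electronegativity e_i and hardness s_i for each atom from the GNN embeddings.
  • Solve analytically for the charges q_i by minimizing ∑_i (e_i q_i + 0.5 s_i q_i^2) subject to the constraint ∑_i q_i = Q (the total molecular charge).
  • Train on an extended SPICE dataset with a squared-error loss to reproduce AM1-BCC ELF10 charges.
  • Demonstrate O(N) runtime complexity and batched processing of large sets of molecules.
  • Provide a Python API and CLI for integration with OpenFF Toolkit and Amber workflows.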
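
The constrained minimization in the third step has a closed-form solution. Introducing a Lagrange multiplier λ for the constraint ∑_i q_i = Q and setting the derivative with respect to each q_i to zero gives e_i + s_i q_i = λ, so q_i = (λ − e_i) / s_i with λ = (Q + ∑_j e_j / s_j) / (∑_j 1 / s_j). The following is a minimal sketch of that formula in plain Python, not the espaloma_charge implementation; the function name `equilibrate_charges` and the numeric inputs are hypothetical, and in the real model e_i and s_i would come from the GNN.

```python
from typing import List, Sequence


def equilibrate_charges(
    e: Sequence[float], s: Sequence[float], total_charge: float
) -> List[float]:
    """Minimize sum_i (e_i*q_i + 0.5*s_i*q_i**2) subject to sum_i q_i = total_charge.

    Hypothetical helper for illustration; e holds electronegativities and
    s holds hardnesses, one value per atom.
    """
    if len(e) != len(s):
        raise ValueError("e and s must have one entry per atom")
    if any(s_i <= 0 for s_i in s):
        raise ValueError("hardness values must be positive for a unique minimum")

    inv_s = [1.0 / s_i for s_i in s]
    e_over_s = [e_i * w for e_i, w in zip(e, inv_s)]

    # Lagrange multiplier that enforces the total-charge constraint.
    lam = (total_charge + sum(e_over_s)) / sum(inv_s)

    # Stationarity condition e_i + s_i*q_i = lam, solved for q_i.
    return [w * (lam - e_i) for e_i, w in zip(e, inv_s)]


# Hypothetical parameters for a three-atom molecule with net charge -1.
e = [0.3, -0.1, 0.5]
s = [1.0, 2.0, 4.0]
q = equilibrate_charges(e, s, total_charge=-1.0)
print(q, sum(q))
```

Because the solution needs only two sums over the atoms and one elementwise pass, this step costs O(N) per molecule, consistent with the linear scaling listed above.
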
Figure 1: Schematic overview of EspalomaCharge: a hybrid physical / GNN model for fast charge assignment. First, the graph node representation $h$ assigned by a GNN is used to compute unconstrained electronegativity $e_{i}$ and hardness $s_{i}$ to each atom. Second, the charge potential energy is mi

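Experimental Results

Research Questions

  • RQ1: How accurately does the ML surrogate reproduce AM1-BCC ELF10 charges across different regions of chemical space?
  • RQ2: How do EspalomaCharge's computational scaling and speed on large systems compare with AmberTools and OpenEye?
  • RQ3: Does EspalomaCharge generalize to biomolecules and drug-like compounds outside the training distribution?
  • RQ4: Can EspalomaCharge deliver charges seamlessly within existing MM/MD workflows?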

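Key Findings

  • The RMSE between EspalomaCharge and AM1-BCC ELF10 charges is comparable to the discrepancy between different AM1-BCC implementations (typically on par with AmberTools vs. OpenEye).
  • On the SPICE test set, EspalomaCharge has an RMSE of about 0.0435 and a wall time of about 93.10 s, which is faster than the baselines; OpenEye and AmberTools show longer wall times across the settings tested.
  • Across several datasets (FDA-approved, ZINC250K, FreeSolv, PDB eXpo), EspalomaCharge RMSE values fall in the range 0.0110–0.0266, indicating robust accuracy across chemical space.
  • EspalomaCharge runs in time linear in the number of atoms (O(N)), far faster than QM-based charging methods, and can parameterize biopolymers (hundreds of residues) within seconds.
  • Batching many molecules into a single charging computation yields significant speedups on CPU/GPU, approaching near-constant time for realistic library sizes.
  • Solvation free energy calculations using EspalomaCharge charges give RMSE and R^2 values against experiment that are statistically indistinguishable from those of the AmberTools and OpenEye implementations.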
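
The charge RMSE values quoted above compare predicted and reference partial charges atom by atom. As a rough illustration of the metric, the sketch below pools all atoms into a single root-mean-square error. Whether the paper pools over atoms or averages per molecule is not stated in this summary, so the pooling choice is an assumption, and the function name `charge_rmse` and the charge values are hypothetical.

```python
import math
from typing import Sequence


def charge_rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two sets of per-atom partial charges.

    Hypothetical helper; assumes both sequences list the same atoms in the
    same order and pools all atoms into one average.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared_error / len(predicted))


# Hypothetical charges for a four-atom neutral molecule.
reference = [-0.40, 0.10, 0.10, 0.20]
predicted = [-0.38, 0.09, 0.12, 0.17]
rmse = charge_rmse(predicted, reference)
print(f"charge RMSE: {rmse:.4f}")
```
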
Figure 2: EspalomaCharge shows smaller average charge RMSE than AmberTools on well-represented regions of chemical space. SPICE dataset test set performance stratified by total charge ( left panel ) and molecule size ( right panel ). To better illustrate the effects of limited training data on strat

This review was generated by AI and reviewed by a human editor.