QUICK REVIEW

[Paper Review] Artificial Intelligence Assists Discovery of Reaction Coordinates and Mechanisms from Molecular Dynamics Simulations

Hendrik Jung, Roberto Covino | arXiv (Cornell University) | Jan 14, 2019

Topic: Gaussian Processes and Bayesian Inference | References: 2 | Cited by: 39

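One-Sentence Summary

This paper proposes an AI-assisted framework that guides sampling and extracts molecular mechanisms from molecular dynamics simulations, using adaptive sampling, neural networks, and symbolic regression to uncover interpretable reaction coordinates, and demonstrates it on model systems such as alanine dipeptide and LiCl.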

ABSTRACT

Exascale computing holds great opportunities for molecular dynamics (MD) simulations. However, to take full advantage of the new possibilities, we must learn how to focus computational power on the discovery of complex molecular mechanisms, and how to extract them from enormous amounts of data. Both aspects still rely heavily on human experts, which becomes a serious bottleneck when a large number of parallel simulations have to be orchestrated to take full advantage of the available computing power. Here, we use artificial intelligence (AI) both to guide the sampling and to extract the relevant mechanistic information. We combine advanced sampling schemes with statistical inference, artificial neural networks, and deep learning to discover molecular mechanisms from MD simulations. Our framework adaptively and autonomously initializes simulations and learns the sampled mechanism, and is thus suitable for massively parallel computing architectures. We propose practical solutions to make the neural networks interpretable, as illustrated in applications to molecular systems.

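Motivation and Objectives

Advance exascale MD by reducing the human bottleneck in mechanism discovery.
Develop an AI framework that autonomously initializes simulations and learns the sampled mechanism, for use in massively parallel computing.
Combine advanced sampling, statistical inference, neural networks, and symbolic regression to identify interpretable reaction coordinates.
Provide practical methods for making neural networks more interpretable in the MD setting.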

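Proposed Method

Use transition path sampling (TPS) with a Metropolis-Hastings acceptance criterion to guide the choice of shooting configurations.
Represent the unknown reaction coordinate q(x) with a deep neural network, as in Eq. (3) of the paper.
Train the artificial neural network (ANN) to identify the relevant input coordinates that define the reaction coordinate.
Apply differentiable Cartesian genetic programming (symbolic regression) to approximate the trained ANN with an explicit expression.
Add a regularization term during symbolic regression to control model complexity and avoid overfitting.
Demonstrate interpretable reaction coordinate expressions for molecular systems.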
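The sampling and training steps above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions that this summary does not confirm: a logistic link p_B = 1 / (1 + exp(-q)) between the reaction coordinate and the probability of reaching state B, a binomial likelihood over shooting outcomes, and a linear model standing in for the deep network. All function names (`committor`, `shooting_nll`, `fit_linear_q`, `metropolis_accept`) are hypothetical, and the data are synthetic.

```python
import numpy as np


def committor(q):
    """Assumed logistic link: map reaction coordinate q to p_B in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-q))


def shooting_nll(q, n_a, n_b):
    """Binomial negative log-likelihood of shooting outcomes.

    n_a[i], n_b[i] count trial trajectories from shooting point i that
    ended in state A or B. Uses -log(p_B) = log(1 + exp(-q)) and
    -log(1 - p_B) = log(1 + exp(q)), evaluated stably with logaddexp.
    """
    return np.sum(n_b * np.logaddexp(0.0, -q) + n_a * np.logaddexp(0.0, q))


def fit_linear_q(features, n_a, n_b, lr=0.1, steps=2000):
    """Gradient descent on the mean NLL for q(x) = features @ w + b.

    The linear model is a stand-in for the deep network (assumption).
    """
    w = np.zeros(features.shape[1])
    b = 0.0
    n_total = n_a + n_b
    for _ in range(steps):
        q = features @ w + b
        # d(NLL)/dq = (n_a + n_b) * p_B - n_b
        grad_q = n_total * committor(q) - n_b
        w -= lr * (features.T @ grad_q) / len(q)
        b -= lr * grad_q.mean()
    return w, b


def metropolis_accept(p_new, p_old, rng):
    """Generic Metropolis-Hastings test: accept with prob. min(1, p_new/p_old)."""
    return rng.random() < min(1.0, p_new / p_old)


# Synthetic toy data: one feature, true q(x) = 3x, two trial
# trajectories per shooting point (all values are illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(500, 1))
n_b = rng.binomial(2, committor(3.0 * x[:, 0]))
n_a = 2 - n_b
w, b = fit_linear_q(x, n_a, n_b)
print("fitted slope:", w[0], "offset:", b)
```

In an actual TPS loop, the ratio passed to `metropolis_accept` would depend on how shooting points are selected and on the path ensemble; the function only shows the generic accept/reject test.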

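Experimental Results

Research Questions

RQ1: How does AI-guided sampling accelerate the generation and convergence of transition paths in MD simulations?
RQ2: Which input coordinates most strongly define the reaction coordinate in the different systems?
RQ3: Can symbolic regression produce compact, interpretable expressions that approximate the neural-network reaction coordinate?
RQ4: How does the AI framework perform relative to standard TPS on model systems such as alanine dipeptide and LiCl?
RQ5: What practical strategies ensure neural-network interpretability in this MD setting?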

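Key Findings

In test runs on alanine dipeptide and LiCl, AI-assisted MD increases both the cumulative number of generated transition paths and the acceptance rate relative to standard TPS.
Relative to a long control TPS run used as the baseline, the framework markedly speeds up the convergence of transition path times.
Input relevance analysis identifies a small set of coordinates that dominate the reaction coordinate (for example, specific dihedral-angle components in alanine dipeptide).
With regularization, symbolic regression yields compact expressions that closely approximate the ANN-based reaction coordinate.
The model systems show that interpretable q(x) expressions can be recovered, for example q_SR expressions containing dihedral-angle terms and logistic/ln components.
The method supports autonomous initialization and learning and is suited to massively parallel computing architectures.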
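Two of the findings above, input relevance and regularized symbolic regression, can be made concrete with a toy sketch. It is illustrative only: the relevance measure is plain permutation importance, which is an assumption about how relevance could be computed, and the expression search is an exhaustive scan over a hand-written candidate list, not differentiable Cartesian genetic programming. The names `permutation_relevance`, `select_expression`, `q_ann`, and the candidate expressions are hypothetical.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two arrays."""
    return np.mean((pred - target) ** 2)


def permutation_relevance(model, X, y, rng, n_repeats=10):
    """Mean loss increase when each input column is shuffled in turn."""
    base = mse(model(X), y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += mse(model(Xp), y) - base
    return scores / n_repeats


def select_expression(candidates, X, target, penalty):
    """Return the name of the candidate minimizing MSE + penalty * complexity.

    candidates: list of (name, function, complexity) tuples.
    """
    best_name, best_score = None, np.inf
    for name, func, complexity in candidates:
        score = mse(func(X), target) + penalty * complexity
        if score < best_score:
            best_name, best_score = name, score
    return best_name


def q_ann(X):
    """Stand-in for a trained network: depends on x0 and x1 only."""
    return 2.0 * X[:, 0] - X[:, 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
target = q_ann(X)

relevance = permutation_relevance(q_ann, X, target, rng)
print("relevance per input:", relevance)

candidates = [
    ("2*x0", lambda X: 2.0 * X[:, 0], 1),
    ("2*x0 - x1", lambda X: 2.0 * X[:, 0] - X[:, 1], 2),
    ("2*x0 - x1 + 0.01*sin(x2)",
     lambda X: 2.0 * X[:, 0] - X[:, 1] + 0.01 * np.sin(X[:, 2]), 4),
]
print("small penalty:", select_expression(candidates, X, target, 0.01))
print("large penalty:", select_expression(candidates, X, target, 10.0))
```

The penalty weight sets the trade-off: a small value keeps the expression that reproduces the network exactly, while a large value pushes the choice toward the shortest candidate at the cost of accuracy.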


This review was generated by AI and reviewed by a human editor.