QUICK REVIEW

[Paper Review] MDCrow: Automating Molecular Dynamics Workflows with Large Language Models

Quintina L. Campbell, Sam Cox|ArXiv.org|Feb 13, 2025

Machine Learning in Materials Science | Cited by 14

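One-Sentence Summary

MDCrow is an LLM-driven agent that autonomously automates molecular dynamics (MD) workflows using a set of more than 40 MD-related tools. It is evaluated on 25 tasks across several base models. It performs well with gpt-4o and llama3-405b, and the larger models are robust to different prompt styles.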

ABSTRACT

Molecular dynamics (MD) simulations are essential for understanding biomolecular systems but remain challenging to automate. Recent advances in large language models (LLM) have demonstrated success in automating complex scientific tasks using LLM-based agents. In this paper, we introduce MDCrow, an agentic LLM assistant capable of automating MD workflows. MDCrow uses chain-of-thought over 40 expert-designed tools for handling and processing files, setting up simulations, analyzing the simulation outputs, and retrieving relevant information from literature and databases. We assess MDCrow's performance across 25 tasks of varying required subtasks and difficulty, and we evaluate the agent's robustness to both difficulty and prompt style. gpt-4o is able to complete complex tasks with low variance, followed closely by llama3-405b, a compelling open-source model. While prompt style does not influence the best models' performance, it has significant effects on smaller models.

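Research Motivation and Objectives

Demonstrate a fully autonomous LLM agent (MDCrow) that can complete an entire MD workflow, from setup through analysis.
Evaluate MDCrow's performance on tasks of varying difficulty and number of subtasks.
Evaluate MDCrow's robustness to different prompt styles and choices of base LLM.
Compare MDCrow with baselines and analyze its ability to retrieve literature and manage long-running simulations.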

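Proposed Method

Build MDCrow as an LLM agent using LangChain, with ReAct-style prompting and more than 40 specialized MD tools.
Group the tools into four categories: information retrieval, PDB and protein handling, simulation, and analysis.
Use OpenMM- and MDTraj-based workflows for simulation setup, execution, and analysis; generate adaptable Python scripts.
Implement a chat resumption feature that saves the run context and files in a checkpoint directory so a session can be continued.
Evaluate performance on 25 prompts with varying numbers of subtasks across multiple base LLMs (gpt-3.5-turbo, gpt-4-turbo, gpt-4o, llama, Claude).
Compare MDCrow with a ReAct baseline (Python REPL) and a single-query LLM.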
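
The agent loop and checkpointing described above can be illustrated with a small, dependency-free sketch. This is not MDCrow's implementation: the tool functions, the `scripted_policy` stand-in for the LLM, the `history.json` file name, and the PDB ID `1ABC` are all hypothetical placeholders chosen for illustration. A real agent would call an LLM through LangChain and tools backed by OpenMM and MDTraj.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins for MD tools; real tools would call OpenMM, MDTraj, etc.
def download_pdb(pdb_id):
    return f"{pdb_id}.pdb"

def run_simulation(pdb_file):
    return pdb_file.replace(".pdb", "_traj.dcd")

def compute_rmsd(traj_file):
    return f"rmsd computed for {traj_file}"

# Tools grouped by the four categories named in the summary.
TOOLS = {
    "information_retrieval": {},  # left empty in this sketch
    "pdb_and_protein": {"download_pdb": download_pdb},
    "simulation": {"run_simulation": run_simulation},
    "analysis": {"compute_rmsd": compute_rmsd},
}

def find_tool(name):
    """Look a tool up by name across all categories."""
    for tools in TOOLS.values():
        if name in tools:
            return tools[name]
    raise KeyError(f"unknown tool: {name}")

def scripted_policy(history):
    """Stand-in for the LLM: choose the next action from the steps taken so far."""
    plan = ["download_pdb", "run_simulation", "compute_rmsd"]
    step = len(history)
    if step >= len(plan):
        return None  # nothing left to do
    # "1ABC" is a placeholder ID; later steps consume the previous observation.
    arg = "1ABC" if step == 0 else history[-1]["observation"]
    return {"tool": plan[step], "input": arg}

def run_agent(checkpoint_dir, policy=scripted_policy, max_steps=10):
    """Alternate action and observation, saving history after each step."""
    ckpt = Path(checkpoint_dir) / "history.json"
    # Resume from a previous session if a checkpoint already exists.
    history = json.loads(ckpt.read_text()) if ckpt.exists() else []
    for _ in range(max_steps):
        action = policy(history)
        if action is None:
            break
        observation = find_tool(action["tool"])(action["input"])
        history.append({**action, "observation": observation})
        ckpt.write_text(json.dumps(history))
    return history

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        run_agent(workdir, max_steps=1)   # session interrupted after one step
        for step in run_agent(workdir):   # second call resumes from the checkpoint
            print(step["tool"], "->", step["observation"])
```

Writing the history to disk after every step is what lets a second call pick up where the first stopped, which is the behavior the checkpoint directory is meant to provide.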

Figure 1: A. MDCrow workflow. Starting with a user prompt and initialized with a set of MD tools, MDCrow follows a chain-of-thought process until it completes all tasks in the prompt. The final output includes a response, along with all resulting analyses and files. B. The tool distribution categor

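Experimental Results

Research Questions

RQ1: Can MDCrow automate an MD workflow end to end, from data retrieval to analysis?
RQ2: How does MDCrow's performance vary with task complexity (number of subtasks) and with different base LLMs?
RQ3: How robust is MDCrow to variations in prompt style and model type?
RQ4: Does MDCrow outperform baseline methods at completing MD tasks, managing files, and handling errors?
RQ5: To what extent can MDCrow, through its chat feature, extrapolate to tasks beyond its explicit toolset?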

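Key Findings

MDCrow completes most tasks and performs best with gpt-4o, with higher accuracy and more completed subtasks than the baselines.
The open-source llama3-405b is a strong alternative, with solid performance and robustness.
For weaker models, performance generally drops as task complexity increases, whereas gpt-4o and llama3-405b remain stable across subtask counts.
Prompt style has a significant effect on smaller models but little effect on the best-performing models.
In the evaluation, MDCrow with optimized settings reaches a task accuracy of about 72% and, with strong models, is robust to changes in instruction style.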
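
The findings above refer to two quantities per run: whether a task was judged accurate, and how many of its required subtasks were completed. A minimal sketch of one way such records could be aggregated follows. The `TaskResult` fields and the `summarize` helper are assumptions for illustration, the demo records are made-up placeholders rather than values from the paper, and the paper's own scoring procedure may differ.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    required_subtasks: int   # subtasks the prompt asks for
    completed_subtasks: int  # subtasks the agent actually finished
    accurate: bool           # whether the final answer was judged correct

def summarize(results):
    """Return the fraction of accurate tasks and the pooled subtask completion rate."""
    accuracy = sum(r.accurate for r in results) / len(results)
    completion = (sum(r.completed_subtasks for r in results)
                  / sum(r.required_subtasks for r in results))
    return {"accuracy": accuracy, "subtask_completion": completion}

# Made-up placeholder records, NOT results reported in the paper.
demo = [
    TaskResult(1, 1, True),
    TaskResult(4, 3, False),
    TaskResult(5, 5, True),
    TaskResult(10, 6, False),
]

if __name__ == "__main__":
    print(summarize(demo))
```

Pooling subtasks across tasks weights long tasks more heavily. Averaging per-task completion fractions instead would be an equally reasonable choice.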

Figure 2: Example Chat. Example of chat with MDCrow. The user first asks to download PDB files for two systems. Then, once MDCrow has completed this task, the user asks for analysis of the files. Next, the user asks for a quick 10 ps simulation of both files, and MDCrow saves all files for later hand

This review was generated by AI and reviewed by human editors.