QUICK REVIEW

[Paper Review] DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

Sizhe Liu, Yizhou Lu | arXiv (Cornell University) | Nov 24, 2024

Scientific Computing and Data Management | Cited by 7

ONE-SENTENCE SUMMARY

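DrugAgent uses a multi-agent LLM framework to automate ML programming for drug discovery, combining domain-specific tools with dynamic idea-space management; in a case study on ADMET absorption prediction with the PAMPA dataset, the selected model reaches an F1 score of 0.92.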

ABSTRACT

Recent advancements in Large Language Models (LLMs) have opened new avenues for accelerating drug discovery processes. Despite their potential, several critical challenges remain unsolved, particularly in translating theoretical ideas into practical applications within the highly specialized field of pharmaceutical research, preventing practitioners from leveraging the latest AI developments in drug discovery. To this end, we introduce DrugAgent, a multi-agent framework aimed at automating machine learning (ML) programming in drug discovery. DrugAgent incorporates domain expertise by identifying specific requirements and building domain-specific tools, while systematically exploring different ideas to find effective solutions. A preliminary case study demonstrates DrugAgent's potential to overcome key limitations LLMs face in drug discovery, moving toward AI-driven innovation. For example, DrugAgent is able to complete the ML programming pipeline end-to-end, from data acquisition to performance evaluation for the ADMET prediction task, and finally select the best model, where the random forest model achieves an F1 score of 0.92 when predicting absorption using the PAMPA dataset.

RESEARCH MOTIVATION AND GOALS

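Bridge the gap between general-purpose LLM reasoning and the domain-specific requirements of drug discovery.
Automate ML programming tasks in drug discovery, from data acquisition to model evaluation, without manual coding.
Introduce domain-specific tools and an idea-space management strategy to make exploration more efficient.
Demonstrate end-to-end automation of ADMET prediction and compare it against a general-purpose framework.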

PROPOSED METHOD

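Introduce an LLM-based multi-agent system that automates ML programming for drug discovery.
Add an LLM Instructor that identifies the domain knowledge a task requires and prepares the corresponding tools.
Use an LLM Planner that manages and refines the idea space by generating and pruning ideas.
Develop domain-specific tools validated by unit tests, plus a reusable toolbox for data acquisition, molecular fingerprinting, and model evaluation.
Demonstrate an end-to-end pipeline from data acquisition to model evaluation that ends by selecting the best model.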
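The Planner's generate-and-prune loop and the unit-tested tools listed above can be illustrated with a small sketch. It is a hypothetical illustration, not DrugAgent's code: the `Idea` class, `prune_and_select`, and the tool names are invented for this example, each tool's unit test is reduced to a callable that returns pass/fail, and in the real system the scores would come from actual training runs rather than a lookup table. The F1 values reuse the two numbers reported in this review, and the failing graph-construction check mirrors the pruned molecular-graph idea mentioned in the findings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Idea:
    """A candidate ML approach in the idea space (hypothetical structure)."""
    name: str
    required_tools: List[str]


def prune_and_select(
    ideas: List[Idea],
    tools: Dict[str, Callable[[], bool]],
    evaluate: Callable[[Idea], float],
) -> Optional[Idea]:
    """Drop ideas whose required tools are missing or fail their unit test,
    then return the highest-scoring survivor (None if nothing survives)."""
    feasible = [
        idea
        for idea in ideas
        if all(name in tools and tools[name]() for name in idea.required_tools)
    ]
    if not feasible:
        return None
    # Only feasible ideas are evaluated, so pruned ideas never cost a training run.
    return max(feasible, key=evaluate)


# Hypothetical idea space and tool checks; the graph builder fails its test.
ideas = [
    Idea("random_forest", ["fingerprint"]),
    Idea("chemberta", ["smiles_tokenizer"]),
    Idea("gnn", ["graph_builder"]),
]
tools = {
    "fingerprint": lambda: True,
    "smiles_tokenizer": lambda: True,
    "graph_builder": lambda: False,
}
f1 = {"random_forest": 0.92, "chemberta": 0.916}  # F1 values reported in this review

best = prune_and_select(ideas, tools, lambda idea: f1[idea.name])
print(best.name)  # random_forest
```

In this sketch, pruning happens before evaluation: an idea whose tools cannot be built or fail their checks is discarded without spending a training run on it.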

Figure 1: Framework overview of DrugAgent. Given an AI-based drug discovery task described in natural language (i.e., user's input, e.g., design an AI model to predict Absorption (one of the ADMET properties) using the PAMPA dataset (Siramshetty, Shah et al. 2021)), the LLM Planner first produces a …

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

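RQ1: How can domain-specific knowledge be explicitly identified and integrated into LLM-based ML programming for drug discovery?
RQ2: Can a multi-agent framework systematically explore ideas and prune infeasible or suboptimal ones to make automation more efficient?
RQ3: How does DrugAgent perform compared with general-purpose baselines on standard AI-driven drug discovery tasks (ADMET, DTI, molecular optimization)?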

KEY FINDINGS

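DrugAgent can automate ML programming end to end for ADMET prediction on the PAMPA dataset.
The random forest model reaches F1 = 0.92 and ROC-AUC = 0.817 on PAMPA absorption prediction.
ChemBERTa reaches F1 = 0.916 and ROC-AUC = 0.776 on the same task.
By integrating domain knowledge and building its own tools, DrugAgent outperforms the general-purpose ReAct framework and relies less on human intervention.
The framework combines idea-space management with domain tool building to prune ineffective approaches (e.g., molecular graph construction).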
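The two metrics reported above measure different things. F1 is the harmonic mean of precision and recall for hard predictions at a single decision threshold, while ROC-AUC is the probability that a randomly chosen positive is scored higher than a randomly chosen negative, taken over all thresholds; the two numbers for the same random forest (0.92 and 0.817) therefore need not be close. The dependency-free sketch below computes both on a made-up toy example; in practice scikit-learn's `sklearn.metrics.f1_score` and `roc_auc_score` compute the same quantities. The `select_best` rule (highest F1, with ROC-AUC as tie-breaker) is an assumption, since the review only says the best model is selected, not by which criterion.

```python
from typing import Dict, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best(results: Dict[str, Tuple[float, float]]) -> str:
    """Hypothetical selection rule: tuples (F1, ROC-AUC) compare F1 first."""
    return max(results, key=lambda name: results[name])


# Toy labels and predicted probabilities (illustrative only).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.833
print(select_best({"random_forest": (0.92, 0.817), "ChemBERTa": (0.916, 0.776)}))  # random_forest
```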

Figure 2: Comparison of ReAct (a) and DrugAgent (b) on an ADMET prediction task using the PAMPA dataset. ReAct, a general-purpose framework, fails due to hallucinated API calls and an inability to self-debug, requiring human intervention to proceed. It focuses solely on fine-tuning a pretrained language model …
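The caption attributes ReAct's failure partly to an inability to self-debug. As a generic illustration of what self-debugging means, the sketch below runs generated code, feeds any error message back to the generator, and retries. It is not DrugAgent's implementation: `run_with_self_debug` and `stub_generator` are hypothetical, the stub stands in for an LLM, and `load_pampa_dataset` is a deliberately non-existent function playing the role of a hallucinated API call.

```python
from typing import Callable, Dict, Optional


def run_with_self_debug(
    generate: Callable[[Optional[str]], str],
    max_attempts: int = 3,
) -> Dict[str, object]:
    """Run generated code; on failure, pass the error message back to the generator and retry."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = generate(feedback)
        namespace: Dict[str, object] = {}
        try:
            exec(source, namespace)  # execute the candidate program in a fresh namespace
            return namespace
        except Exception as exc:  # any runtime error becomes feedback for the next attempt
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"still failing after {max_attempts} attempts; last error: {feedback}")


def stub_generator(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: first calls a non-existent helper
    (a 'hallucinated API'), then repairs the code once it sees the NameError."""
    if feedback is None:
        return "result = load_pampa_dataset()"  # hypothetical, undefined function
    return "result = sum([1, 2, 3])"


ns = run_with_self_debug(stub_generator)
print(ns["result"])  # 6
```

A real system would sandbox the `exec` call; it is used here only to keep the sketch self-contained.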


This review was generated by AI and reviewed by human editors.