QUICK REVIEW

[Paper Review] Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis

Kexin Chen, Jiamin Lu|arXiv (Cornell University)|Nov 16, 2023

Advanced Text Analysis Techniques | Cited by 10

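One-Sentence Summary

Chemist-X is an LLM-driven agent that uses retrieval-augmented generation to propose reaction conditions for chemical synthesis. It integrates API-based molecule retrieval, web retrieval with online literature analysis, and a CAD-tool-driven final recommendation, and it introduces the novel CL-SCL fingerprint.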

ABSTRACT

Recent AI research plots a promising future of automatic chemical reactions within the chemistry community. This study proposes Chemist-X, a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis with retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment execution. To begin with, emulating how chemical experts solve the RCO task, Chemist-X utilizes a novel RAG scheme to interrogate available molecular and literature databases and narrow the search space for later processing. The agent then leverages a computer-aided design (CAD) tool we have developed through a large language model (LLM) supervised programming interface. With updated chemical knowledge obtained via RAG, as well as the ability to use CAD tools, our agent significantly outperforms conventional RCO AIs confined to the fixed knowledge within their training data. Finally, Chemist-X interacts with the physical world through an automated robotic system, which can validate the suggested chemical reaction conditions without human intervention. The control of the robotic system was achieved with a novel algorithm we have developed for the equipment, which relies on LLMs for reliable script generation. Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the loop, demonstrate Chemist-X's ability in self-driving laboratories.

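Motivation and Objectives

Advance automated, AI-assisted reaction condition recommendation (RCR) to reduce chemists' workload.
Keep chemical knowledge up to date by retrieving data from online molecular databases and the literature.
Bridge chemists and software by providing API-based code generation and tool access.
Develop a three-phase framework that mimics how experts solve the problem: search for analogues, analyze the literature, and recommend conditions.
Introduce a chemistry-aware reaction fingerprint (CL-SCL) to improve yield-oriented prediction.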

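Proposed Method

Phase 1: Retrieval-augmented code generation against molecular databases (PubChem/ChemSpider) through their APIs, using in-context learning combined with top-match slice (TMS) selection.
Phase 2: Web crawling and HTML data extraction from online literature platforms (SciFinder/PubMed), using generated Python code and an HTML analysis module, to distill reaction conditions.
Phase 3: Final recommendation using the CL-SCL fingerprint (CIMG-based molecular encoding combined with supervised contrastive learning), integrated with the CAD tool API to select high-yield reaction conditions.
Implement a three-phase, LLM-driven AI agent with API access, code generation, and tool orchestration; evaluation includes unit tests and wet-lab validation.
Introduce a new reaction fingerprint (CL-SCL) that combines CIMG molecular encoding with supervised contrastive learning to improve yield prediction across multiple machine learning models.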
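
The supervised contrastive component of CL-SCL can be pictured with the generic supervised contrastive (SupCon) loss of Khosla et al. The sketch below is not the paper's implementation: the embeddings stand in for CIMG encoder outputs, the labels are assumed to be yield bins, and the function name, temperature value, and toy data are illustrative only.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon loss over one batch (assumed stand-in for CL-SCL training).

    For each anchor, samples sharing its label are positives; every other
    sample in the batch (positive or negative) appears in the denominator.
    Anchors with no positive in the batch are skipped.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp for the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two tight clusters of hypothetical reaction embeddings.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, ["high", "high", "low", "low"])
misaligned = supervised_contrastive_loss(toy_embeddings, ["high", "low", "high", "low"])
```

When the assumed yield labels agree with the cluster structure, the loss is small; when they cut across it, the loss is large. That gradient is what pushes same-yield reactions together in the fingerprint space.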

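Experimental Results

Research Questions

RQ1: Can a retrieval-augmented AI agent effectively perform reaction condition recommendation (RCR) through phased data collection and analysis?
RQ2: How does TMS-ICL improve API-based information retrieval from chemical databases compared with zero-shot and full-document prompting?
RQ3: Is code-generated HTML analysis more accurate and more resource-efficient at extracting HTML data from literature platforms than feeding the HTML directly to a large language model?
RQ4: Does the CL-SCL fingerprint deliver superior yield-oriented RCR performance across multiple machine learning models and data batches?
RQ5: Can wet-lab experiments validate the high-yield reaction conditions recommended by the agent within a restricted chemical subspace?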
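
To make the contrast in RQ2 concrete, the sketch below shows one way a top-match slice step could differ from full-document prompting: only the best-matching documentation slices are placed in the prompt. The scoring rule (token overlap), the documentation slices, and every function name here are assumptions for illustration; the summary does not specify how the paper ranks slices.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word-like tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def top_match_slices(query, slices, k=2):
    """Return the k documentation slices most similar to the query.

    Similarity is Jaccard overlap of token sets, an assumed stand-in for
    whatever matching score the actual system uses.
    """
    q = tokenize(query)

    def score(doc_slice):
        t = tokenize(doc_slice)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(slices, key=score, reverse=True)[:k]


def build_prompt(query, slices, k=2):
    """Assemble an in-context prompt from the selected slices only."""
    context = "\n---\n".join(top_match_slices(query, slices, k))
    return (
        f"API documentation excerpts:\n{context}\n\n"
        f"Task: {query}\n"
        "Write Python code that calls the API."
    )


# Hypothetical documentation slices; these are not real PubChem or ChemSpider endpoints.
DOC_SLICES = [
    "similarity_search(smiles, threshold): return compounds similar to a SMILES string",
    "get_properties(cid): return molecular weight and formula for a compound id",
    "download_image(cid, path): save a 2D depiction of a compound to disk",
]

query = "find molecules similar to a given SMILES string"
prompt = build_prompt(query, DOC_SLICES, k=2)
```

Full-document prompting would paste all of `DOC_SLICES` into the prompt, and zero-shot prompting would include none of them. Selecting a few slices keeps the prompt short while still giving the model the relevant signature.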

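Key Findings

The three-phase Chemist-X framework enables automatic knowledge updates and CAD tool use, going beyond synthesis AIs restricted to fixed knowledge.
Phase 1 with TMS-ICL improved API retrieval accuracy while also reducing cost and time relative to the alternative schemes.
Phase 2's code-generated HTML analysis correctly extracted 141 of 143 data points from 30 HTML sources (F1 = 99.3%), outperforming the full-HTML input approach.
Phase 3 showed that the CL-SCL fingerprint consistently outperformed the DRFP and Mordred fingerprints on the μ_N metric across multiple machine learning models and batch sizes.
Wet-lab experiments on a Suzuki–Miyaura system achieved an average yield of ≥90% across three Chemist-X-guided experimental batches, compared with about 52% for random sampling.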
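
The reported F1 for the HTML extraction can be checked with a short calculation. The summary gives 141 correct data points out of 143; the split of the two errors into false positives and false negatives is not stated, so the call below assumes two missed points and no spurious ones. Because F1 = 2TP / (2TP + FP + FN), any split of two errors gives the same value.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 as the harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Assumed error split: 2 data points missed, 0 extracted incorrectly.
html_f1 = f1_score(true_pos=141, false_pos=0, false_neg=2)
print(f"{html_f1:.1%}")  # prints 99.3%
```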

This review was generated by AI and reviewed by human editors.