QUICK REVIEW

[Paper Review] Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis

Kexin Chen, Jiamin Lu|arXiv (Cornell University)|Nov 16, 2023

Advanced Text Analysis Techniques | Citations: 10

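One-Sentence Summary

Chemist-X is an LLM-driven agent that uses retrieval-augmented generation to recommend reaction conditions, integrating API-based molecule retrieval, analysis of web literature, and a final recommendation made with a CAD tool, and it introduces a novel CL-SCL fingerprint.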

ABSTRACT

Recent AI research plots a promising future of automatic chemical reactions within the chemistry society. This study proposes Chemist-X, a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis with retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions. To begin with, as an emulation on how chemical experts solve the RCO task, Chemist-X utilizes a novel RAG scheme to interrogate available molecular and literature databases to narrow the searching space for later processing. The agent then leverages a computer-aided design (CAD) tool we have developed through a large language model (LLM) supervised programming interface. With updated chemical knowledge obtained via RAG, as well as the ability in using CAD tools, our agent significantly outperforms conventional RCO AIs confined to the fixed knowledge within its training data. Finally, Chemist-X interacts with the physical world through an automated robotic system, which can validate the suggested chemical reaction condition without human interventions. The control of the robotic system was achieved with a novel algorithm we have developed for the equipment, which relies on LLMs for reliable script generation. Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the loop, prove Chemist-X's ability in self-driving laboratories.

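Motivation and Objectives

Automate AI-assisted reaction condition recommendation (RCR) to reduce chemists' workload.
Keep chemical knowledge up to date by retrieving data from online molecular databases and the literature.
Bridge chemists and software by providing API-driven code generation and tool access.
Develop a three-phase framework that mimics an expert's problem-solving process: searching for similar examples, analyzing the literature, and recommending conditions.
Introduce a chemistry-aware reaction fingerprint (CL-SCL) to improve yield-oriented prediction.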

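Proposed Method

Phase One: retrieval-augmented code generation that queries molecular databases such as PubChem/ChemSpider via API, using in-context learning with top-match slice (TMS) selection.
Phase Two: web crawling of online literature (SciFinder/PubMed) with generated Python code and an HTML analysis module, which extracts reaction conditions from the HTML data.
Phase Three: final recommendation using the CL-SCL fingerprint (CIMG-based molecular encoding with supervised contrastive learning), integrated with the CAD tool API, to select high-yield reaction conditions.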
Implementation of a three-phase AI agent powered by an LLM with API access, code generation, and tool orchestration; evaluation includes unit tests and wet-lab validation.
Introduction of a novel reaction fingerprint (CL-SCL) that combines CIMG molecular encoding with supervised contrastive learning to improve yield prediction across ML models.
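
The slice selection idea in Phase One can be illustrated with a minimal sketch. It assumes that "top-match slice" means picking the piece of API documentation most similar to the user's request and placing only that piece in the prompt. The similarity measure (Jaccard token overlap), the function names, and the documentation slices are all hypothetical stand-ins chosen for illustration; the review does not say how Chemist-X scores slices.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_match_slice(query, slices):
    """Return the documentation slice with the highest overlap with the query."""
    query_tokens = tokenize(query)
    return max(slices, key=lambda s: jaccard(query_tokens, tokenize(s)))


def build_prompt(query, slices):
    """Assemble an in-context prompt from the single best-matching slice."""
    best = top_match_slice(query, slices)
    return f"Documentation:\n{best}\n\nTask: {query}\nWrite Python code for the task."


# Hypothetical documentation slices, invented for illustration only.
DOC_SLICES = [
    "similarity search: find compounds similar to a SMILES string above a threshold",
    "property lookup: return molecular weight and formula for a compound id",
    "synonym lookup: list names and synonyms for a compound id",
]

if __name__ == "__main__":
    print(build_prompt("find molecules similar to this SMILES", DOC_SLICES))
```

Sending only the best-matching slice keeps the prompt short, which is one plausible reading of why the review reports lower cost and time than full-text prompting.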

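Experimental Results

Research Questions

RQ1: Can a retrieval-augmented AI agent effectively perform reaction condition recommendation (RCR) across phase-driven data collection and analysis?
RQ2: How does TMS-ICL improve API-based information retrieval from chemical databases compared with zero-shot and full-text prompting?
RQ3: Is HTML data extraction from literature platforms more accurate and resource-efficient with code-generated HTML analysis than with feeding the HTML directly to the LLM?
RQ4: Does the CL-SCL fingerprint improve yield-oriented RCR performance across multiple ML models and data batches?
RQ5: Do wet-lab experiments validate the high-yield reaction conditions recommended by the agent within a constrained chemical subspace?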
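
To make the contrast in RQ3 concrete, the sketch below shows the kind of small parser an agent could generate so that only the extracted rows, not the whole page, reach the LLM. It uses Python's standard `html.parser`. The page layout, the column names, and the placeholder values are assumptions invented for illustration and do not come from the paper or from any real literature platform.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def extract_conditions(html):
    """Return one dict per data row, using hypothetical column names."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(("catalyst", "solvent", "yield"), row)) for row in parser.rows]


# Made-up page with placeholder values; real pages will differ.
SAMPLE_HTML = """
<html><body><nav>menu and other page chrome</nav>
<table>
<tr><th>Catalyst</th><th>Solvent</th><th>Yield</th></tr>
<tr><td>cat-A</td><td>solv-1</td><td>85%</td></tr>
<tr><td>cat-B</td><td>solv-2</td><td>62%</td></tr>
</table></body></html>
"""

if __name__ == "__main__":
    for record in extract_conditions(SAMPLE_HTML):
        print(record)
```

The navigation text and header row never appear in the output, which shows in miniature why parsing before prompting can use fewer tokens than passing the raw HTML.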

Key Findings

The three-phase Chemist-X framework enables automatic knowledge updates and CAD tool use, outperforming AI systems that rely on fixed knowledge.
Phase One with TMS-ICL improves API retrieval accuracy and reduces cost and time relative to alternatives.
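Phase Two's code-generated HTML analysis extracts 141 correct data points out of 143 from 30 HTML sources (F1 = 99.3%), outperforming the approach that feeds the full HTML to the LLM.
Phase Three shows that, on the μ_N metric, the CL-SCL fingerprint consistently outperforms the DRFP and Mordred fingerprints regardless of ML model and batch size.
Wet-lab experiments in the Suzuki–Miyaura domain achieved an average yield of ≥90% across three Chemist-X-guided experimental batches, compared with about 52% for random sampling.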
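
The F1 figure reported for Phase Two can be checked with a short calculation. The sketch assumes that 141 of 143 means 141 true positives and 2 missed data points, and that there were no false positives. The review does not state the false-positive count, so that part is an assumption; it is, however, the reading under which the numbers reproduce the reported 99.3%.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


tp = 141        # correctly extracted data points (from the review)
fn = 143 - tp   # data points that were missed
fp = 0          # assumption: no incorrect extractions were reported

f1 = f1_score(tp, fp, fn)
print(f"{f1:.1%}")  # 282 / 284, which rounds to 99.3%
```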


This review was generated by AI and checked by a human editor.