QUICK REVIEW

[Paper Review] ChemCrow: Augmenting large-language models with chemistry tools

Andres M Bran, Sam Cox|arXiv (Cornell University)|Apr 11, 2023

Machine Learning in Materials Science | Cited by 131

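One-line summary

ChemCrow augments an LLM with 18 chemistry tools, enabling it to plan and execute syntheses autonomously and to support discovery tasks, and thereby improves chemical reasoning beyond that of a plain LLM.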

ABSTRACT

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help reaching their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent, three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and Chemcrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.

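Motivation and objectives

Overcome the limits of LLM chemical reasoning by connecting the LLM to domain-specific tools.
Demonstrate autonomous planning and execution of chemical syntheses using an LLM-agent framework.
Show human-AI collaboration on discovery tasks such as chromophore design.
Evaluate ChemCrow against a plain LLM (GPT-4) using assessments by expert chemists.
Present safety and risk-mitigation strategies for LLM-driven chemistry.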

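Proposed method

Prompt an LLM (GPT-4) with descriptions of the available tools and an explicit Thought / Action / Action Input loop (as in ReAct/MRKL), so that the model itself decides which tool to call and with what input.
Integrate 18 domain-specific chemistry tools (web and literature search, molecule and reaction tools, safety checks, and others) via LangChain.
Enable autonomous execution on a cloud-connected platform (e.g., IBM RoboRXN) for synthesis and validation.
Iterate over tool calls and observations, refining the next action until the task is complete.
Evaluate performance using expert chemists and an evaluator LLM (EvaluatorGPT), alongside a GPT-4 baseline.
Emphasize safety guidelines and risk-mitigation strategies to prevent unsafe recommendations.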
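
The tool-use loop described above can be illustrated with a minimal sketch. This is not ChemCrow's code and does not use LangChain: the two toy tools (NameToFormula, FormulaWeight), the scripted_llm stand-in for the GPT-4 call, and run_agent are all hypothetical names chosen for illustration. It only shows the general shape of a ReAct-style loop, in which the model emits an action, a tool runs, and the observation is appended to the transcript until a final answer appears.

```python
import re
from typing import Callable, Dict

# Standard atomic weights (g/mol) for the few elements this toy example needs.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def name_to_formula(name: str) -> str:
    """Hypothetical lookup tool; a real agent would query a chemical database."""
    table = {"DEET": "C12H17NO"}
    return table.get(name.strip(), "unknown")


def formula_weight(formula: str) -> str:
    """Hypothetical tool: molecular weight of a simple formula such as C12H17NO."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return f"{total:.2f}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "NameToFormula": name_to_formula,
    "FormulaWeight": formula_weight,
}


def scripted_llm(transcript: str) -> str:
    """Stand-in for the LLM call (assumption): chooses the next step
    from the observations already present in the transcript."""
    observations = re.findall(r"Observation: (.*)", transcript)
    if len(observations) == 0:
        return ("Thought: I need the molecular formula first.\n"
                "Action: NameToFormula\nAction Input: DEET")
    if len(observations) == 1:
        return ("Thought: Now I can compute the weight.\n"
                f"Action: FormulaWeight\nAction Input: {observations[0]}")
    return f"Thought: I have the answer.\nFinal Answer: {observations[-1]} g/mol"


def run_agent(task: str, llm: Callable[[str], str] = scripted_llm,
              max_steps: int = 5) -> str:
    """Run the loop until the model emits a final answer or steps run out."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer: (.*)", reply)
        if final:
            return final.group(1).strip()
        # Parse the requested tool call and feed the result back as an observation.
        action = re.search(r"Action: (.*)", reply).group(1).strip()
        action_input = re.search(r"Action Input: (.*)", reply).group(1).strip()
        transcript += f"Observation: {TOOLS[action](action_input)}\n"
    return "no answer within step limit"


print(run_agent("What is the molecular weight of DEET?"))
```

In a real agent the scripted function would be replaced by a model call, and the step limit guards against a model that never produces a final answer.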

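Experimental results

Research questions

RQ1: Can an LLM-powered chemistry agent autonomously plan and execute multi-step syntheses in a laboratory setting?
RQ2: Does integrating domain-specific tools improve chemical factuality, reasoning quality, and task completion compared with an LLM without tools?
RQ3: How does ChemCrow perform on discovery tasks that involve human-AI collaboration (e.g., designing a novel chromophore)?
RQ4: What safety, ethical, and intellectual property (IP) issues arise in LLM-driven chemistry, and how can they be mitigated?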

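Key findings

ChemCrow autonomously planned and executed the synthesis of DEET (an insect repellent) and of three thiourea organocatalysts on the RoboRXN platform.
A novel chromophore with an absorption maximum at about 336 nm was discovered through human-AI collaboration, then synthesized and characterized.
In evaluations by expert chemists, ChemCrow outperformed tool-free GPT-4 in chemical factuality, reasoning, and completeness on the more complex tasks.
GPT-4 alone was strong on tasks that can be answered from memory (e.g., well-known molecules such as paracetamol) and in fluent writing, but struggled with novel chemical reasoning.
The root-mean-square error of the absorption predictions was 37 nm.
The study highlights the need for robust evaluation methods, high-quality tools, and attention to safety and IP in LLM-powered chemistry engines.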
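
For readers unfamiliar with the error metric quoted above, the following minimal sketch shows how a root-mean-square error over predicted and measured absorption maxima is computed. The function name rmse and the wavelength values are invented for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example wavelengths in nm (not from the paper).
predicted_nm = [305.0, 345.0, 395.0, 415.0]
measured_nm = [300.0, 350.0, 390.0, 420.0]
print(rmse(predicted_nm, measured_nm))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, so the metric is in the same unit as the prediction (here nm) but weights outliers heavily.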

This review was generated by AI and checked by a human editor.