QUICK REVIEW

[Paper Review] ChemCrow: Augmenting large-language models with chemistry tools

Andres M Bran, Sam Cox | arXiv (Cornell University) | April 11, 2023

Machine Learning in Materials Science · Citations: 131

One-Line Summary

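ChemCrow augments an LLM with 18 chemistry tools, enabling it to autonomously plan and execute syntheses and to support discovery tasks, and it improves chemical reasoning over a tool-free LLM.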

ABSTRACT

Over the last decades, excellent computational chemistry tools have been developed. Integrating them into a single platform with enhanced accessibility could help them reach their full potential by overcoming steep learning curves. Recently, large-language models (LLMs) have shown strong performance in tasks across domains, but struggle with chemistry-related problems. Moreover, these models lack access to external knowledge sources, limiting their usefulness in scientific applications. In this study, we introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery, and materials design. By integrating 18 expert-designed tools, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow's effectiveness in automating a diverse set of chemical tasks. Surprisingly, we find that GPT-4 as an evaluator cannot distinguish between clearly wrong GPT-4 completions and ChemCrow's performance. Our work not only aids expert chemists and lowers barriers for non-experts, but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.

Research Motivation and Goals

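Motivates connecting domain-specific tools to an LLM to overcome its limitations in chemical reasoning.
Demonstrates autonomous planning and execution of chemical syntheses using an LLM-agent framework.
Showcases human–AI collaboration in discovery tasks such as chromophore design.
Compares ChemCrow with a plain LLM (GPT-4), based on evaluations by expert chemists.
Presents safety and risk-mitigation strategies for LLM-driven chemistry.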

Proposed Method

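Prompts an LLM (GPT-4) with descriptions of the available tools and an explicit Thought, Action, Action Input loop (similar to ReAct/MRKL), so the model decides which tool to call and with what input.
Integrates 18 domain-specific chemistry tools (web/literature search, molecule/reaction tools, safety checks) through LangChain.
Enables autonomous execution on cloud-connected platforms (e.g., IBM RoboRXN) for synthesis and validation.
Refines its actions through iterative tool queries and observations until the task is complete.
Evaluates performance against a GPT-4 baseline, using both expert chemists and an evaluator LLM (EvaluatorGPT).
Emphasizes safety guidelines and risk-mitigation strategies to prevent unsafe recommendations.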
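The tool-use loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ChemCrow's or LangChain's actual code: the LLM is replaced by any callable that returns text, and the tool names `Name2SMILES` and `SafetyCheck`, the toy lookup table, and the always-passing safety check are hypothetical stand-ins for the real tools. The loop parses `Action:` / `Action Input:` lines, runs the named tool, appends an `Observation:` to the transcript, and stops on `Final Answer:` or a step limit.

```python
import re
from typing import Callable, Dict

# Hypothetical toy tools; real ChemCrow tools call external services and chemistry libraries.
TOY_SMILES = {"DEET": "CCN(CC)C(=O)c1cccc(C)c1"}


def name_to_smiles(name: str) -> str:
    """Toy name-to-SMILES lookup (hypothetical stand-in)."""
    return TOY_SMILES.get(name.strip(), "unknown")


def safety_check(smiles: str) -> str:
    """Placeholder safety tool: always passes; a real one would query hazard data."""
    return "no hazards flagged (toy check)"


# Tool registry: the keys are what the LLM writes after "Action:".
TOOLS: Dict[str, Callable[[str], str]] = {
    "Name2SMILES": name_to_smiles,
    "SafetyCheck": safety_check,
}

# Matches "Action: <tool>" followed by "Action Input: <rest of line>".
ACTION_RE = re.compile(r"Action:\s*(\S+)\s*Action Input:\s*(.+)")


def run_agent(llm: Callable[[str], str], task: str, max_steps: int = 5) -> str:
    """Minimal Thought / Action / Action Input / Observation loop (ReAct/MRKL style)."""
    transcript = f"Question: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model sees the whole transcript so far
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        tool_name, tool_input = match.group(1), match.group(2).strip()
        tool = TOOLS.get(tool_name)
        observation = tool(tool_input) if tool else f"unknown tool {tool_name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"


if __name__ == "__main__":
    # Scripted responses replace a real LLM so the loop runs offline.
    scripted = iter([
        "Thought: I need DEET's structure.\nAction: Name2SMILES\nAction Input: DEET",
        "Thought: Check safety first.\nAction: SafetyCheck\nAction Input: CCN(CC)C(=O)c1cccc(C)c1",
        "Thought: I have what I need.\nFinal Answer: DEET is CCN(CC)C(=O)c1cccc(C)c1",
    ])
    print(run_agent(lambda transcript: next(scripted), "Find DEET's SMILES and check it"))
```

Feeding the full transcript back on every step is what lets the model refine its next action from earlier observations; the step limit is a simple guard against loops that never reach a final answer.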

Experimental Results

Research Questions

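RQ1: Can an LLM-based chemistry agent autonomously plan and execute multi-step syntheses in a laboratory setting?
RQ2: Does integrating domain-specific tools improve chemical factuality, reasoning quality, and task completion compared with an LLM operating without tools?
RQ3: How does ChemCrow perform on discovery tasks that require human–AI collaboration (e.g., designing a novel chromophore)?
RQ4: What safety, ethical, and IP considerations arise in LLM-based chemistry, and how can they be mitigated?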

Key Findings

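Using the RoboRXN platform, ChemCrow autonomously planned and executed the syntheses of DEET (an insect repellent) and three thiourea organocatalysts.
Through human–AI collaboration, it discovered a novel chromophore with an absorption maximum near 336 nm, which was subsequently synthesized and characterized.
According to expert chemists' evaluations, ChemCrow outperformed tool-free GPT-4 in chemical factuality, reasoning, and completeness on tasks of increasing complexity.
GPT-4 alone was strong on memorization-driven tasks involving well-known compounds (e.g., paracetamol) and in fluent writing, but struggled with novel chemical reasoning.
The random forest model used to guide chromophore selection predicted absorption with an RMSE of 37 nm.
The study underscores the importance of evaluation methodology, tool quality, and safety/IP considerations for LLM-based chemistry engines.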
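To make the 37 nm figure concrete: RMSE is the square root of the mean squared difference between predicted and measured absorption maxima, so it is expressed in nm and penalizes large misses more than small ones. The sketch below computes it and shows one plausible way such a model's predictions could guide selection, by ranking candidates whose predicted absorption lies closest to a desired wavelength. The candidate names, the 360 nm target, and all numbers are hypothetical, and ranking by distance to a target is an assumed illustration rather than the paper's documented procedure; the random forest itself is omitted, so predictions from any regressor could be passed in.

```python
import math
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error; the result has the unit of the inputs (nm here)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rank_by_target(predicted_nm: Dict[str, float], target_nm: float, top_k: int = 3) -> List[str]:
    """Order candidates by |predicted absorption max - target wavelength| (hypothetical screen)."""
    return sorted(predicted_nm, key=lambda name: abs(predicted_nm[name] - target_nm))[:top_k]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(round(rmse([340.0, 430.0], [350.0, 400.0]), 2))  # 22.36
    preds = {"cand_A": 310.0, "cand_B": 372.0, "cand_C": 455.0}
    print(rank_by_target(preds, target_nm=360.0, top_k=2))  # ['cand_B', 'cand_A']
```

In the toy example the errors are 10 nm and 30 nm, giving sqrt((100 + 900) / 2) ≈ 22.36 nm; because of the squaring, the single 30 nm miss dominates the result.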

This review was generated by AI and reviewed by a human editor.