QUICK REVIEW

[Paper Review] DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

Sizhe Liu, Yizhou Lu | arXiv (Cornell University) | 2024-11-24

Scientific Computing and Data Management | Citations: 7

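One-Line Summary

DrugAgent automates ML programming for drug discovery using domain-specific tools and a multi-agent LLM framework with dynamic idea-space management; in a case study, it achieved an F1 score of 0.92 for ADMET absorption prediction on the PAMPA dataset.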

ABSTRACT

Recent advancements in Large Language Models (LLMs) have opened new avenues for accelerating drug discovery processes. Despite their potential, several critical challenges remain unsolved, particularly in translating theoretical ideas into practical applications within the highly specialized field of pharmaceutical research, limiting practitioners from leveraging the latest AI development in drug discovery. To this end, we introduce DrugAgent, a multi-agent framework aimed at automating machine learning (ML) programming in drug discovery. DrugAgent incorporates domain expertise by identifying specific requirements and building domain-specific tools, while systematically exploring different ideas to find effective solutions. A preliminary case study demonstrates DrugAgent's potential to overcome key limitations LLMs face in drug discovery, moving toward AI-driven innovation. For example, DrugAgent is able to complete the ML programming pipeline end-to-end, from data acquisition to performance evaluation for the ADMET prediction task, and finally select the best model, where the random forest model achieves an F1 score of 0.92 when predicting absorption using the PAMPA dataset.

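Research Motivation and Goals

Bridge the gap between general-purpose LLM reasoning and the domain-specific requirements of drug discovery.
Automate ML programming tasks in drug discovery, from data acquisition to model evaluation, without human coding.
Introduce domain-specific tools and an idea-space management strategy to improve exploration efficiency.
Demonstrate end-to-end automation on ADMET prediction and compare it against a general-purpose framework.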

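Proposed Method

Introduces an automated, LLM-based multi-agent system for drug discovery ML programming.
Integrates an LLM Instructor that identifies domain-knowledge requirements and prepares the corresponding tools.
Uses an LLM Planner to manage and refine the idea space through idea generation and pruning.
Develops domain-specific tools for data acquisition, fingerprinting, and model evaluation, validated by unit tests and collected in a reusable toolbox.
Demonstrates an end-to-end pipeline from data acquisition to model evaluation, then selects the best-performing model.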
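
The interplay between tool validation and idea pruning described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Tool` and `Idea` classes, the `build_toolbox` and `prune_ideas` functions, and the toy tools are all hypothetical names, and the LLM calls are replaced by hard-coded candidates. It assumes that an idea is dropped whenever a tool it requires fails its unit test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A hypothetical domain tool paired with the unit test that validates it."""
    name: str
    run: Callable[[str], object]
    unit_test: Callable[[Callable[[str], object]], bool]


@dataclass
class Idea:
    """A hypothetical candidate approach and the tools it depends on."""
    name: str
    required_tools: List[str]


def build_toolbox(tools: List[Tool]) -> Dict[str, Tool]:
    """Keep only tools whose unit test passes (stand-in for the Instructor)."""
    toolbox = {}
    for tool in tools:
        try:
            passed = tool.unit_test(tool.run)
        except Exception:
            passed = False  # a crashing tool counts as a failed test
        if passed:
            toolbox[tool.name] = tool
    return toolbox


def prune_ideas(ideas: List[Idea], toolbox: Dict[str, Tool]) -> List[Idea]:
    """Drop ideas that need a tool missing from the toolbox (stand-in for the Planner)."""
    return [i for i in ideas if all(t in toolbox for t in i.required_tools)]


# Toy tools (assumptions): a character-count "fingerprint" that works,
# and a graph builder that is deliberately not implemented.
def toy_fingerprint(smiles: str) -> List[int]:
    return [smiles.count(ch) for ch in "CNO"]


def broken_graph_builder(smiles: str) -> object:
    raise NotImplementedError("graph construction is unavailable in this sketch")


tools = [
    Tool("fingerprint", toy_fingerprint, lambda f: f("CCO") == [2, 0, 1]),
    Tool("mol_graph", broken_graph_builder, lambda f: f("CCO") is not None),
]
ideas = [
    Idea("random_forest_on_fingerprints", ["fingerprint"]),
    Idea("gnn_on_molecular_graphs", ["mol_graph"]),
]

toolbox = build_toolbox(tools)
surviving = prune_ideas(ideas, toolbox)
print([idea.name for idea in surviving])
```

In this toy run only the fingerprint-based idea survives, because the graph tool never passes its unit test. A real system would generate both the ideas and the tools with an LLM rather than hard-coding them.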

Figure 1: Framework overview of DrugAgent. Given an AI-based drug discovery task described in natural language (i.e., user’s input, e.g., design an AI model to predict Absorption (one of the ADMET properties) using the PAMPA dataset (Siramshetty, Shah et al. 2021)), the LLM Planner first produces a

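Experimental Results

Research Questions

RQ1: How can domain-specific knowledge be explicitly identified and integrated into LLM-driven ML programming?
RQ2: Can a multi-agent framework improve automation efficiency by systematically exploring ideas and eliminating infeasible or suboptimal ones?
RQ3: How does DrugAgent perform on standard AI-driven drug discovery tasks (ADMET, DTI, molecular optimization) compared with general-purpose baselines?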

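Key Results

DrugAgent can automate end-to-end ML programming for ADMET prediction on the PAMPA dataset.
The random forest model achieved F1 = 0.92 and ROC-AUC = 0.817 on PAMPA absorption prediction.
ChemBERTa achieved F1 = 0.916 and ROC-AUC = 0.776 on the same task.
DrugAgent outperforms a general-purpose framework (ReAct) by effectively integrating domain knowledge and tool construction, which reduces its reliance on human intervention.
By combining idea-space management with domain tool construction, the framework eliminates inefficient approaches (e.g., molecular graph construction).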
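
Because the results above report both F1 and ROC-AUC, it may help to recall how the two metrics are computed. The sketch below is a plain-Python illustration on made-up labels and scores (toy values, not PAMPA data), using hypothetical helper names `f1_score_binary` and `roc_auc_binary`. F1 depends on a decision threshold, while ROC-AUC depends only on how the scores rank positives against negatives.

```python
from typing import List


def f1_score_binary(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = 2 * TP / (2 * TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc_binary(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (made-up values): 4 positives, 2 negatives.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # the 0.5 threshold is an assumption

f1 = f1_score_binary(y_true, y_pred)
auc = roc_auc_binary(y_true, scores)
print(f"F1 = {f1:.3f}, ROC-AUC = {auc:.3f}")
```

In general, the two metrics answer different questions, so a model can lead on one and trail on the other; reporting both, as the results above do, gives a fuller picture than either alone.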

Figure 2: Comparison of ReAct (a) and DrugAgent (b) on an ADMET prediction task using the PAMPA dataset. ReAct, a general-purpose framework, fails due to hallucinated API calls and an inability to self-debug, requiring human intervention to proceed. It focuses solely on fine-tuning a pretrained lang

This review was generated by AI and reviewed by a human editor.