QUICK REVIEW

[Paper Review] Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials

Yizhen Zheng, Huan Yee Koh | arXiv (Cornell University) | Sep 6, 2024

Computational Drug Discovery Methods | Cited by 12

One-Sentence Summary

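This paper reviews how large language models (LLMs) are being integrated into understanding disease mechanisms, drug discovery, and clinical trials, and outlines the main paradigms, recent progress, and future directions.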

ABSTRACT

The integration of Large Language Models (LLMs) into the drug discovery and development field marks a significant paradigm shift, offering novel methodologies for understanding disease mechanisms, facilitating drug discovery, and optimizing clinical trial processes. This review highlights the expanding role of LLMs in revolutionizing various stages of the drug development pipeline. We investigate how these advanced computational models can uncover target-disease linkage, interpret complex biomedical data, enhance drug molecule design, predict drug efficacy and safety profiles, and facilitate clinical trial processes. Our paper aims to provide a comprehensive overview for researchers and practitioners in computational biology, pharmacology, and AI4Science by offering insights into the potential transformative impact of LLMs on drug discovery and development.

Motivation and Objectives

Define the two LLM paradigms in drug discovery and development: specialized and general-purpose.
Outline a three-stage drug development pipeline (Understanding Disease Mechanisms, Drug Discovery, Clinical Trials) and map LLM capabilities onto each stage.
Assess the current maturity of LLM applications at each stage (not applicable, nascent, advanced, mature).
Identify future directions, along with the ethical, privacy, fairness, and bias considerations of deploying LLMs.

Proposed Approach

Classify LLMs by type and divide the drug development pipeline into three stages, each paired with the tasks LLMs can perform.
Evaluate LLM applications at each stage and assign maturity levels (not applicable, nascent, advanced, mature), as illustrated in Figure 6.
Synthesize the existing literature on specialized nucleotide LLMs, transcriptomic LLMs, and protein-target analysis to discuss their capabilities and limitations.
Discuss technical challenges (hallucinations, limited context windows, interpretability) and propose directions for trustworthy deployment.
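The stage-and-maturity classification described above can be captured in a small data structure. This is a minimal sketch under stated assumptions: the three stage names and the four maturity levels come from the review, and the example tasks are taken from the findings listed later in this summary. The names `Maturity`, `PIPELINE`, and `assess` are hypothetical. No ratings from Figure 6 are reproduced here; the caller must supply them.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Ordered maturity scale named in the review (higher = more mature)."""
    NOT_APPLICABLE = 0
    NASCENT = 1
    ADVANCED = 2
    MATURE = 3


# Stage -> example LLM tasks, as listed in the review's findings.
# (Hypothetical structure; the review presents this as prose and a figure.)
PIPELINE: dict[str, list[str]] = {
    "Understanding Disease Mechanisms": [
        "literature review", "target-disease linkage", "target validation",
    ],
    "Drug Discovery": [
        "chemistry tasks", "ADMET prediction", "retrosynthesis",
        "molecule generation/editing",
    ],
    "Clinical Trials": [
        "patient-trial matching", "trial planning", "outcome prediction",
        "document generation",
    ],
}


def assess(ratings: dict[tuple[str, str], Maturity]) -> dict[str, Maturity]:
    """Hypothetical helper: report the highest maturity reached per stage.

    `ratings` maps (stage, task) pairs to a Maturity level. Tasks with no
    rating default to NOT_APPLICABLE. Pairs not in PIPELINE raise KeyError.
    """
    known = {(stage, task) for stage, tasks in PIPELINE.items() for task in tasks}
    unknown = set(ratings) - known
    if unknown:
        raise KeyError(f"pairs not in the taxonomy: {sorted(unknown)}")

    summary: dict[str, Maturity] = {}
    for stage, tasks in PIPELINE.items():
        levels = [ratings.get((stage, t), Maturity.NOT_APPLICABLE) for t in tasks]
        summary[stage] = max(levels)
    return summary
```

Using an `IntEnum` keeps the scale ordered, so levels can be compared and aggregated with `max`; the validation step guards against ratings for tasks the taxonomy does not define.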

Results

Research Questions

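RQ1: How can LLMs be effectively integrated into each stage of drug discovery and development?
RQ2: How advanced are LLMs in supporting downstream tasks such as disease understanding, drug discovery, and clinical trials?
RQ3: What are the future directions, challenges, and ethical considerations for LLMs in drug development?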

Key Findings

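LLMs follow two main paradigms: specialized LLMs trained on scientific language and general-purpose LLMs trained on broad text corpora.
LLMs can aid the understanding of disease mechanisms through literature review, target-disease linkage analysis, and target validation.
In drug discovery, specialized LLMs assist with chemistry tasks, ADMET prediction, retrosynthesis, and molecule generation/editing, while general-purpose LLMs enable broader reasoning and workflow assistance.
In clinical trials, LLMs can support patient-trial matching, trial planning, outcome prediction, and document generation.
Genomics and transcriptomics applications include nucleotide LLMs (e.g., DNABERT), gene network analysis (Geneformer), and adaptation to sparse transcriptomic data.
Protein-target analysis uses LLMs for evolutionary conservation, protein folding insights, and binding site prediction, along with structure prediction through models such as ESM, AlphaFold2, and RoseTTAFold.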
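Nucleotide LLMs such as DNABERT do not read DNA one base at a time: DNABERT represents a sequence as overlapping k-mers (with k from 3 to 6), drawn from a vocabulary of all 4^k k-mers plus BERT-style special tokens. The minimal sketch below illustrates only that tokenization step. The function names, the default k=6, and the ordering of the special tokens are assumptions for illustration, not DNABERT's actual code.

```python
from itertools import product

# BERT-style special tokens; this ordering is an assumption for illustration.
SPECIAL = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int) -> dict[str, int]:
    """Map special tokens, then all 4**k k-mers over ACGT, to integer IDs."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {tok: i for i, tok in enumerate(SPECIAL + kmers)}


def encode(seq: str, k: int = 6) -> list[int]:
    """Wrap the k-mer IDs in [CLS] ... [SEP]; k-mers with non-ACGT bases map to [UNK]."""
    vocab = build_vocab(k)
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(kmer, vocab["[UNK]"]) for kmer in kmer_tokenize(seq, k)]
    ids.append(vocab["[SEP]"])
    return ids


print(kmer_tokenize("ATGCGTA", k=3))  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
```

A model would then look up an embedding for each ID; this sketch stops at tokenization. Overlapping k-mers give each token local sequence context, at the cost of a vocabulary that grows as 4^k.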

This review was generated by AI and reviewed by human editors.