QUICK REVIEW

[Paper Review] TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

Shanghua Gao, Richard Zhu | ArXiv.org | Mar 14, 2025

Machine Learning in Healthcare | Cited by 5

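One-Sentence Summary

TxAgent is an AI agent that uses multi-step reasoning and real-time tool integration across 211 biomedical tools to deliver evidence-grounded, personalized treatment recommendations. It outperforms larger LLMs and existing tool-use models on new drug and treatment benchmarks.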

ABSTRACT

Precision therapeutics require multimodal adaptive models that generate personalized treatment recommendations. We introduce TxAgent, an AI agent that leverages multi-step reasoning and real-time biomedical knowledge retrieval across a toolbox of 211 tools to analyze drug interactions, contraindications, and patient-specific treatment strategies. TxAgent evaluates how drugs interact at molecular, pharmacokinetic, and clinical levels, identifies contraindications based on patient comorbidities and concurrent medications, and tailors treatment strategies to individual patient characteristics. It retrieves and synthesizes evidence from multiple biomedical sources, assesses interactions between drugs and patient conditions, and refines treatment recommendations through iterative reasoning. It selects tools based on task objectives and executes structured function calls to solve therapeutic tasks that require clinical reasoning and cross-source validation. The ToolUniverse consolidates 211 tools from trusted sources, including all US FDA-approved drugs since 1939 and validated clinical insights from Open Targets. TxAgent outperforms leading LLMs, tool-use models, and reasoning agents across five new benchmarks: DrugPC, BrandPC, GenericPC, TreatmentPC, and DescriptionPC, covering 3,168 drug reasoning tasks and 456 personalized treatment scenarios. It achieves 92.1% accuracy in open-ended drug reasoning tasks, surpassing GPT-4o and outperforming DeepSeek-R1 (671B) in structured multi-step reasoning. TxAgent generalizes across drug name variants and descriptions. By integrating multi-step inference, real-time knowledge grounding, and tool-assisted decision-making, TxAgent ensures that treatment recommendations align with established clinical guidelines and real-world evidence, reducing the risk of adverse events and improving therapeutic decision-making.

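Research Motivation and Objectives

Advance precision therapeutics by tackling therapeutic reasoning that must be multimodal, grounded in data, and sensitive to patient-specific factors.
Develop an AI agent that combines multi-step reasoning with real-time retrieval from biomedical tools to assess drug interactions, contraindications, and guidelines.
Build an extensible tool library, ToolUniverse, and a training framework that enable dynamic tool selection and evidence-based recommendations.
Show that tool-augmented reasoning can outperform larger models on both open-ended and structured drug reasoning tasks.
Provide benchmarks and analyses that measure generalization across drug name variants, description-based references, and personalized treatment scenarios.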

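Proposed Method

Introduce the TxAgent architecture, which consists of ToolUniverse (211 tools), a fine-tuned LLM for multi-step reasoning and tool execution, and ToolRAG, an adaptive tool retrieval model.
Build ToolGen to convert API documentation into standardized tool specifications for ToolUniverse.
Use the QuestionGen and TraceGen pipelines to build the TxAgent-Instruct dataset (378,027 instruction-tuning samples) from three sources: tooling, therapeutic questions, and reasoning traces.
Ground answers in real-time knowledge by executing function calls against external sources (such as FDA and Open Targets) rather than relying on static model knowledge.
Return a transparent reasoning trace alongside the final answer to support verification and trust.
Evaluate on five benchmarks (DrugPC, BrandPC, GenericPC, DescriptionPC, TreatmentPC) covering 3,168 drug reasoning tasks and 456 personalized treatment scenarios.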
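
The retrieve, call, and trace pattern described in this list can be illustrated with a minimal sketch. It is not TxAgent's implementation: the two tools, their placeholder outputs ("DrugA", "DrugB", "ConditionX"), the word-overlap retriever standing in for ToolRAG, and the fixed list of sub-goals standing in for the fine-tuned LLM's reasoning are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]


# Hypothetical tools returning placeholder text; a real tool would query
# an external source instead of a hard-coded dictionary.
def lookup_interactions(drug: str) -> str:
    toy = {"druga": "DrugA interacts with DrugB"}
    return toy.get(drug.lower(), "no record")


def lookup_contraindications(drug: str) -> str:
    toy = {"druga": "DrugA is contraindicated in ConditionX"}
    return toy.get(drug.lower(), "no record")


TOOLBOX = [
    Tool("lookup_interactions",
         "find drug interactions for a drug", lookup_interactions),
    Tool("lookup_contraindications",
         "find contraindications and warnings for a drug",
         lookup_contraindications),
]


def retrieve_tool(goal: str, toolbox: List[Tool]) -> Tool:
    """Pick the tool whose description shares the most words with the goal.

    A crude stand-in for a learned retriever such as ToolRAG.
    """
    goal_words = set(goal.lower().split())
    return max(
        toolbox,
        key=lambda t: len(goal_words & set(t.description.lower().split())),
    )


def run_agent(drug: str, goals: List[str]) -> Dict[str, object]:
    """Run one structured function call per sub-goal and record a trace."""
    trace = []
    for step, goal in enumerate(goals, start=1):
        tool = retrieve_tool(goal, TOOLBOX)
        # Structured function call: a tool name plus keyword arguments.
        call = {"name": tool.name, "arguments": {"drug": drug}}
        result = tool.func(**call["arguments"])
        trace.append(
            {"step": step, "thought": goal, "call": call, "result": result}
        )
    # The answer is assembled only from retrieved evidence.
    answer = "; ".join(entry["result"] for entry in trace)
    return {"trace": trace, "answer": answer}


result = run_agent("DrugA", ["check interactions", "check contraindications"])
for entry in result["trace"]:
    print(entry["step"], entry["call"]["name"], "->", entry["result"])
print("Answer:", result["answer"])
```

Keeping every call and its result in the trace is what lets a reader check each step against the evidence it came from, which is the role the summary assigns to the transparent reasoning trace.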

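Experimental Results

Research Questions

RQ1: How can an AI agent perform multi-step therapeutic reasoning effectively by integrating a large external biomedical toolbox?
RQ2: Compared with LLM-only approaches, does real-time grounding in verified sources improve drug reasoning accuracy and reduce hallucinations?
RQ3: Do adaptive tool retrieval (ToolRAG) and structured reasoning traces let TxAgent outperform larger models and existing tool-use LLMs in both open-ended and multiple-choice formats?
RQ4: How robust is TxAgent to drug name variants (brand, generic, description) and to descriptive drug narratives?
RQ5: Do multi-step training traces and iterative tool use significantly improve performance on personalized treatment recommendation?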

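Key Findings

TxAgent reaches 92.1% accuracy on open-ended DrugPC drug reasoning tasks, up to 25.8% higher than GPT-4o, and surpasses DeepSeek-R1 in structured multi-step reasoning.
On DrugPC, TxAgent reaches 93.8% accuracy in the multiple-choice format and 92.1% in the open-ended format, outperforming baselines such as Llama-3.1-70B-Instruct across tasks.
On BrandPC and GenericPC, TxAgent reaches 93.6% and 93.7% accuracy respectively, well above LLM-only and tool-use baselines, with low variance in accuracy (< 0.01).
On TreatmentPC, TxAgent reaches 86.8% accuracy in the multiple-choice format and 75.0% in the open-ended format, surpassing GPT-4o and Llama-3.1-70B-Instruct and leading tool-use LLMs by a wide margin.
TxAgent is robust to how a drug is named: its variance across brand, generic, and description-based references is markedly lower than that of the baselines (the paper reports variance measures).
Ablation studies show that enlarging ToolUniverse improves performance, explicit reasoning steps improve results, and real tool calls outperform an LLM standing in for the tools; multi-step training traces markedly improve complex reasoning.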
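
As a rough check on the "variance < 0.01" figure, the sketch below computes the spread of three accuracies quoted in this summary. It rests on assumptions the summary does not confirm: that the variance is taken over the DrugPC (multiple-choice), BrandPC, and GenericPC accuracies, that it is a population variance, and that accuracies are expressed in percent.

```python
import statistics

# Accuracies in percent, as quoted in the findings. Treating these three
# figures as the inputs to the reported variance is an assumption.
accuracy_pct = {
    "DrugPC": 93.8,
    "BrandPC": 93.6,
    "GenericPC": 93.7,
}

# Population variance: mean squared deviation from the mean accuracy.
variance = statistics.pvariance(accuracy_pct.values())
print(f"variance across name variants: {variance:.4f}")
```

Under these assumptions the value falls below 0.01, consistent with the summary's figure; a sample variance, or accuracies expressed as fractions, would give a different number.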

This review was generated by AI and reviewed by a human editor.