QUICK REVIEW

[Paper Review] Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence

A I Vinogradova, В. М. Виноградов|arXiv (Cornell University)|Feb 16, 2026
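Computational Drug Discovery Methods | Cited by 0
One-Sentence Summary

The paper proposes Bioptic Agent, a tree-based multilingual AI system, together with a completeness-first benchmark for global drug asset scouting. By emphasizing comprehensive, non-hallucinated discovery across multilingual sources, it outperforms several commercial baselines on F1.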

ABSTRACT

Bio-pharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily via regional, non-English channels. Recent data suggests that over 85% of patent filings originate outside the U.S., with China accounting for nearly half of the global total. A growing share of scholarly output is also non-U.S. Industry estimates put China at 30% of global drug development, spanning 1,200+ novel candidates. In this high-stakes environment, failing to surface "under-the-radar" assets creates multi-billion-dollar risk for investors and business development teams, making asset scouting a coverage-critical competition where speed and completeness drive value. Yet today's Deep Research AI agents still lag human experts in achieving high recall discovery across heterogeneous, multilingual sources without hallucination. We propose a benchmarking methodology for drug asset scouting and a tuned, tree-based self-learning Bioptic Agent aimed at complete, non-hallucinated scouting. We construct a challenging completeness benchmark using a multilingual multi-agent pipeline: complex user queries paired with ground-truth assets that are largely outside U.S.-centric radar. To reflect real-deal complexity, we collected screening queries from expert investors, BD, and VC professionals and used them as priors to conditionally generate benchmark queries. For grading, we use LLM-as-judge evaluation calibrated to expert opinions. On this benchmark, our Bioptic Agent achieves 79.7% F1 score, outperforming Claude Opus 4.6 (56.2%), Gemini 3 Pro + Deep Research (50.6%), OpenAI GPT-5.2 Pro (46.6%), Perplexity Deep Research (44.2%), and Exa Websets (26.9%). Performance improves steeply with additional compute, supporting the view that more compute yields better results.

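Motivation and Objectives

  • Motivate and formalize wide-coverage, multilingual drug asset scouting for BD/S&E needs in a global innovation landscape that is predominantly non-English.
  • Develop a completeness-oriented benchmarking methodology that reduces bias introduced by the method itself and evaluates open-world asset discovery.
  • Propose and evaluate Bioptic Agent, a tree-based self-learning system optimized for non-hallucinated, complete asset discovery across languages.
  • Demonstrate that more compute and language-parallel exploration improve broad, high-recall asset identification.
  • Emphasize that evidence aggregation, source provenance, and expert-aligned validation raise BD-grade scouting quality.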

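Proposed Method

  • Build a multilingual, regionally distributed completeness benchmark for drug asset scouting whose ground-truth assets come from non-U.S. regional sources.
  • Introduce a Regional News Miner that collects assets from region-language-source tuples and produces candidate assets with canonical links.
  • Use an Attributes Enrichment Agent to validate mined assets, resolve aliases, and extract up-to-date attributes with their sources.
  • Use a Google Search Agent to measure English versus local-language discoverability profiles for downstream filtering.
  • Generate investor-localized queries conditioned on a seed corpus of real BD queries, to ensure realistic intent and difficulty distributions.
  • Implement a Criteria Match Validator Agent that judges candidate assets against the query criteria and returns structured sources and attributes.
  • Add a deduplication agent that resolves aliases and maintains a store of globally unique assets.
  • Describe the tree-based architecture of Bioptic Agent, including a Coach Agent, UCB-based selection, language parallelism, and rolling evaluation across epochs with sustained recall growth.
  • Show that more compute yields better results, and that completeness-oriented search control and validation outperform simply adding more browsing or synthesis.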
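To make the "UCB-based selection" item above more concrete, here is a minimal Python sketch of the standard UCB1 rule applied to a search tree. This review does not state the paper's exact formula, reward signal, or node structure, so everything here is an assumption for illustration: the `Node` class, the `directive` field, the exploration constant `c`, and a reward that could be, for example, the number of newly validated assets returned by a branch.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node: one search directive plus its statistics."""
    directive: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        # Unvisited branches are always tried first.
        return math.inf
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return mean + bonus


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCB1 score (first one wins ties)."""
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits))


def backup(path: list[Node], reward: float) -> None:
    """Propagate an observed reward to every node on the visited path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


# Usage: two hypothetical language branches under one root.
root = Node("root query")
zh = Node("search Chinese-language sources")
ja = Node("search Japanese-language sources")
root.children = [zh, ja]

first = select_child(root)    # both unvisited, so the first child is chosen
backup([root, first], reward=1.0)
second = select_child(root)   # the remaining unvisited child
backup([root, second], reward=0.0)
third = select_child(root)    # equal bonuses, so the higher mean reward wins
```

The exploration bonus shrinks as a branch accumulates visits, so a branch that keeps yielding new assets is revisited while rarely tried branches still get occasional attention. How the paper's Coach Agent interacts with this choice is not specified in this review.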
Figure 1: Quality–time tradeoff for asset scouting. y-axis: F1-score (harmonic mean of precision and recall; higher is better). x-axis: wall-clock time (log scale; larger indicates longer compute). DR here stands for deep research; lang-free stands for no language parallelism.
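The F1-score on the y-axis is the harmonic mean of precision and recall. The Python sketch below shows that arithmetic for a single query, under a simplifying assumption: returned assets are matched to ground-truth assets by exact identifier. The paper instead grades matches with an LLM-as-judge calibrated to expert opinions, and the asset identifiers below are hypothetical.

```python
def precision_recall_f1(
    predicted: set[str], ground_truth: set[str]
) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query.

    Assumption: an asset counts as correct only on an exact identifier match.
    """
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 assets returned, 5 in the ground truth, 3 in common.
returned = {"asset-a", "asset-b", "asset-c", "asset-x"}
truth = {"asset-a", "asset-b", "asset-c", "asset-d", "asset-e"}
p, r, f1 = precision_recall_f1(returned, truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F1 by returning a few safe assets (high precision, low recall) or by returning everything it finds (high recall, low precision).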

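Experimental Results

Research Questions

  • RQ1: Can a completeness-first, multilingual benchmarking methodology reliably surface qualifying drug assets that are underrepresented in English-centric sources?
  • RQ2: Under open-world, multilingual query conditions, does Bioptic Agent achieve higher completeness (F1) in asset discovery than state-of-the-art commercial deep research baselines?
  • RQ3: How do language parallelism and the tree-based exploration strategy affect recall and precision for BD/S&E?
  • RQ4: To what extent do evidence aggregation, source provenance, and expert-aligned validation reduce hallucination and improve satisfaction of task-specific constraints?

Key Findings

  • Bioptic Agent reaches 79.7% F1 on the completeness benchmark, ahead of Claude Opus 4.6 (56.2%), Gemini 3 Pro + Deep Research (50.6%), OpenAI GPT-5.2 Pro (46.6%), Perplexity Deep Research (44.2%), and Exa Websets (26.9%).
  • Performance improves markedly with additional compute, supporting the claim that more compute yields better results.
  • Benchmark construction emphasizes open-world, multilingual, exhaustive asset discovery and controls for query intent, reducing English-centric bias and highlighting underrepresented assets.
  • Regional mining and the multi-agent pipeline uncover assets from non-English regions and validate them as source-rich, structured attribute records.
  • The pipeline uses entity-agnostic query templates and region-language-source constraints to mitigate incumbency bias and the overemphasis on globally amplified assets.
  • The evidence-based validator and tree-based self-learning directives drive sustained recall growth and constraint satisfaction, going beyond simple self-correction loops.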
Figure 2: Completeness Benchmark construction pipeline. Top: Assets Mining. The Regional News Miner Agent surfaces regional-stage drug assets from non-English sources; the Attributes Enrichment Agent validates and structures each asset; the Google Search Agent prioritizes under-the-radar assets via an


This review was generated by AI and reviewed by human editors.