QUICK REVIEW

[Paper Review] TimeTox: An LLM-Based Pipeline for Automated Extraction of Time Toxicity from Clinical Trial Protocols

Saketh Vinjamuri, Marielle Fis Loperena|arXiv (Cornell University)|Mar 22, 2026
Machine Learning in Healthcare | Cited by 0
One-Sentence Summary

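TimeTox is an end-to-end LLM-based pipeline that automatically extracts time toxicity from the Schedule of Assessments (SoA) tables in clinical trial protocols. The work compares a vanilla single-pass architecture with a two-stage architecture and evaluates the pipeline on 644 real-world oncology protocols.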

ABSTRACT

Time toxicity, the cumulative healthcare contact days from clinical trial participation, is an important but labor-intensive metric to extract from protocol documents. We developed TimeTox, an LLM-based pipeline for automated extraction of time toxicity from Schedule of Assessments tables. TimeTox uses Google's Gemini models in three stages: summary extraction from full-length protocol PDFs, time toxicity quantification at six cumulative timepoints for each treatment arm, and multi-run consensus via position-based arm matching. We validated against 20 synthetic schedules (240 comparisons) and assessed reproducibility on 644 real-world oncology protocols. Two architectures were compared: single-pass (vanilla) and two-stage (structure-then-count). The two-stage pipeline achieved 100% clinically acceptable accuracy ($\pm$3 days) on synthetic data (MAE 0.81 days) versus 41.5% for vanilla (MAE 9.0 days). However, on real-world protocols, the vanilla pipeline showed superior reproducibility: 95.3% clinically acceptable accuracy (IQR $\leq$ 3 days) across 3 runs on 644 protocols, with 82.0% perfect stability (IQR = 0). The production pipeline extracted time toxicity for 1,288 treatment arms across multiple disease sites. Extraction stability on real-world data, rather than accuracy on synthetic benchmarks, is the decisive factor for production LLM deployment.

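Motivation and Objectives

  • Quantify the time burden that trial participation places on patients (time toxicity), a metric that is labor-intensive to extract from protocol documents by hand.
  • Develop an end-to-end pipeline that uses Gemini models to extract and compute time toxicity from SoA tables.
  • Compare single-pass (vanilla) and two-stage (structure-then-count) extraction architectures.
  • Assess production feasibility through multi-run consensus and deployment on real-world protocols.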

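Proposed Method

  • Use Google Gemini models to extract a summary from each full-length protocol PDF.
  • Implement two extraction architectures: vanilla single-pass and two-stage structure-then-count.
  • Apply position-based multi-run consensus to mitigate unstable arm names across runs.
  • Validate against 20 synthetic schedules (240 comparisons) with known ground-truth time toxicity values.
  • Process 644 real-world oncology protocols to demonstrate production feasibility.
  • Release open-source code and a synthetic ground-truth generator.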
Figure 1: Representative SoA table from a complex synthetic breast oncology protocol (BRST-2025-01) showing two treatment arms with three visit days per cycle.
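
The position-based consensus step in the method list can be illustrated with a minimal sketch. It assumes each run returns its treatment arms in document order, so that arms can be matched by index even when the model names them differently from run to run. The data layout, the function name `consensus_by_position`, and the use of the median as the aggregator are all assumptions made for illustration; the source does not specify how TimeTox combines matched values.

```python
from statistics import median

def consensus_by_position(runs):
    """Combine several extraction runs into one estimate per treatment arm.

    `runs` is a list of runs. Each run is a list of arms in document order,
    and each arm is a dict with a "name" and a list of "days" (one cumulative
    contact-day count per timepoint). Arms are matched by index, not by name.
    The median aggregator is an assumption, not the documented TimeTox rule.
    """
    n_arms = len(runs[0])
    if any(len(run) != n_arms for run in runs):
        raise ValueError("runs disagree on the number of arms")
    result = []
    for pos in range(n_arms):
        arms = [run[pos] for run in runs]
        # zip(*...) regroups the values so each tuple holds one timepoint.
        per_timepoint = zip(*(arm["days"] for arm in arms))
        result.append({
            "position": pos,
            "names_seen": [arm["name"] for arm in arms],
            "days": [median(values) for values in per_timepoint],
        })
    return result

# Hypothetical output of three runs on a two-arm protocol (three timepoints).
runs = [
    [{"name": "Arm A", "days": [3, 6, 9]}, {"name": "Arm B", "days": [2, 4, 6]}],
    [{"name": "Experimental", "days": [3, 6, 10]}, {"name": "Control", "days": [2, 4, 6]}],
    [{"name": "Arm A (drug X)", "days": [3, 7, 9]}, {"name": "Arm B", "days": [2, 5, 6]}],
]
consensus = consensus_by_position(runs)
```

Matching by name would treat "Arm A" and "Experimental" as different arms in this example, whereas matching by position keeps them together. The sketch raises an error when runs disagree on the number of arms, since positional matching is not well defined in that case.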

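Experimental Results

Research Questions

  • RQ1: Can an LLM-based pipeline accurately quantify time toxicity from Schedule of Assessments tables?
  • RQ2: Which architecture (vanilla or two-stage) is more accurate and more stable on synthetic and on real-world data?
  • RQ3: Does multi-run consensus make time toxicity extraction more robust and reduce run-to-run variation?
  • RQ4: Is extraction feasible at production scale in terms of time, cost, and reproducibility across protocols?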
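
RQ1 and RQ2 depend on how accuracy against ground truth is scored. The abstract reports mean absolute error (MAE) in days and the share of comparisons within $\pm$3 days ("clinically acceptable"). The sketch below computes both, plus an exact-match rate, for paired predicted and true day counts. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def accuracy_metrics(predicted, truth, tolerance_days=3):
    """Score predicted contact-day counts against ground truth.

    Returns the mean absolute error, the exact-match rate, and the share of
    comparisons whose absolute error is within `tolerance_days`.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - t) for p, t in zip(predicted, truth)]
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        "exact_match": sum(e == 0 for e in errors) / n,
        "clinically_acceptable": sum(e <= tolerance_days for e in errors) / n,
    }

# Hypothetical comparisons: absolute errors are 0, 1, 1 and 5 days.
metrics = accuracy_metrics(predicted=[10, 20, 31, 45], truth=[10, 21, 30, 40])
```

In this toy example the MAE is 1.75 days, one comparison in four is an exact match, and three in four fall within the 3-day tolerance.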

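Key Findings

  • Two-stage extraction is highly accurate on synthetic data (MAE 0.81 days; exact match 0.3%; 100% clinically acceptable across 240 synthetic comparisons) but less stable on real-world protocols.
  • Vanilla extraction is only moderately accurate on synthetic data (41.5% clinically acceptable, MAE 9.0 days) but highly stable on real-world protocols: 95.3% clinically acceptable (IQR $\leq$ 3 days) across 3 runs on 644 protocols, with 82.0% perfect stability (IQR = 0).
  • The production deployment uses the vanilla pipeline with 3-run consensus and covers 644 protocols, yielding time toxicity data for 1,288 treatment arms.
  • Processing time: summary extraction takes 2–3 minutes per protocol; vanilla extraction takes about 4 minutes per protocol; the 644 protocols took about 128 hours in total.
  • The open-source code and the ground-truth generator are available in the TimeTox GitHub repository.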
Figure 2: Step-by-step pipeline for processing protocol PDFs via the Gemini API to extract relevant schedules and generate a consolidated summary document.
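
The reproducibility figures in the findings are defined through the interquartile range (IQR) of repeated runs: IQR $\leq$ 3 days counts as clinically acceptable and IQR = 0 as perfectly stable. A minimal sketch of that scoring follows. The quantile convention (Python's "inclusive" method, which interpolates linearly between data points) is an assumption, since the source does not say how the IQR is computed from only three runs, and the function names and example values are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range of the repeated extractions for one item.

    Uses the "inclusive" quantile method; this convention is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def stability_summary(per_item_runs, tolerance_days=3):
    """Share of items whose run-to-run IQR is within tolerance or exactly zero."""
    iqrs = [iqr(values) for values in per_item_runs]
    n = len(iqrs)
    return {
        "acceptable": sum(x <= tolerance_days for x in iqrs) / n,
        "perfect": sum(x == 0 for x in iqrs) / n,
    }

# Hypothetical day counts from 3 runs on each of 4 items.
per_item_runs = [[10, 10, 10], [10, 12, 16], [10, 10, 20], [7, 7, 7]]
summary = stability_summary(per_item_runs)
```

With three runs this convention reduces the IQR to half the range, so the four items above have IQRs of 0, 3, 5 and 0 days. Other quantile conventions would give different values for the same runs, which is why the choice is flagged as an assumption.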


This review was generated by AI and checked by human editors.