QUICK REVIEW

[Paper Review] On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials

Maryam Aziz, Emilie Kaufmann|arXiv (Cornell University)|Mar 17, 2019

Advanced Bandit Algorithms Research | References: 46 | Cited by: 26

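One-sentence summary

This paper advocates Thompson Sampling with monotonicity-aware priors for dose finding in phase I and phase I/II oncology trials, reporting that it identifies the optimal dose while limiting patients' exposure to toxic or ineffective doses. It gives the first finite-time upper bounds on the number of sub-optimal dose selections for the uniform-prior version of Thompson Sampling, and its simulations show that variants with more sophisticated priors outperform state-of-the-art dose-finding algorithms.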

ABSTRACT

We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens. We advocate the use of the Thompson Sampling principle, a flexible algorithm that can accommodate different types of monotonicity assumptions on the toxicity and efficacy of the doses. For the simplest version of Thompson Sampling, based on a uniform prior distribution for each dose, we provide finite-time upper bounds on the number of sub-optimal dose selections, which is unprecedented for dose-finding algorithms. Through a large simulation study, we then show that variants of Thompson Sampling based on more sophisticated prior distributions outperform state-of-the-art dose identification algorithms in different types of dose-finding studies that occur in phase I or phase I/II trials.

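Research motivation and objectives

Address the ethical and statistical challenges of early-stage clinical trials by balancing the benefit to treated patients against the need for exploration.
Develop a unified, adaptive dose-finding framework that covers both cytotoxic agents (phase I) and molecularly targeted agents (phase I/II).
Provide finite-time theoretical guarantees for Thompson Sampling in dose finding under monotonicity assumptions.
Improve on existing algorithms by allocating fewer patients to sub-optimal doses while keeping high accuracy in identifying the optimal dose.
Explore practical extensions that use patient-level information to enable personalized dosing.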

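Proposed method

Use Thompson Sampling, a Bayesian multi-armed bandit algorithm, to allocate doses sequentially based on the toxicity and efficacy outcomes observed so far.
For the theoretical analysis, place independent uniform priors on the toxicity probabilities, then extend to more informative priors that encode monotonicity.
Propose TS_A, a variant that exploits monotonicity in toxicity and efficacy to reduce allocation to harmful or ineffective doses.
Carry out a finite-time analysis that yields upper bounds on the number of sub-optimal dose selections under uniform priors.
Compare the approach with state-of-the-art dose-finding algorithms in a simulation study covering a range of trial scenarios.
Draw on theoretical insights from best arm identification (BAI) in multi-armed bandits, in particular the role of optimal allocation strategies under monotonicity constraints.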
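
The uniform-prior version of Thompson Sampling described in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `thompson_sampling_mtd`, its parameters, the toxicity values, and the final recommendation rule (posterior mean closest to the target) are all assumptions made for the example. It treats the phase I setting only, where the goal is the dose whose toxicity probability is closest to a target level.

```python
import random


def thompson_sampling_mtd(tox_probs, target, n_patients, seed=0):
    """Simulate a dose-finding trial with Thompson Sampling (hypothetical sketch).

    tox_probs:  true toxicity probability of each dose; unknown in a real
                trial and used here only to simulate patient outcomes.
    target:     target toxicity level; the sought dose is the one whose
                toxicity probability is closest to it.
    n_patients: number of patients treated sequentially.
    """
    rng = random.Random(seed)
    n_doses = len(tox_probs)
    toxic = [0] * n_doses  # toxic outcomes observed at each dose
    safe = [0] * n_doses   # non-toxic outcomes observed at each dose

    for _ in range(n_patients):
        # Independent uniform priors are Beta(1, 1); after observing the data
        # the posterior for dose k is Beta(1 + toxic[k], 1 + safe[k]).
        sampled = [rng.betavariate(1 + toxic[k], 1 + safe[k])
                   for k in range(n_doses)]
        # Give the next patient the dose whose sampled toxicity is closest
        # to the target.
        dose = min(range(n_doses), key=lambda k: abs(sampled[k] - target))
        if rng.random() < tox_probs[dose]:
            toxic[dose] += 1
        else:
            safe[dose] += 1

    # Assumed recommendation rule: posterior mean closest to the target.
    post_mean = [(1 + toxic[k]) / (2 + toxic[k] + safe[k])
                 for k in range(n_doses)]
    recommended = min(range(n_doses), key=lambda k: abs(post_mean[k] - target))
    counts = [toxic[k] + safe[k] for k in range(n_doses)]
    return recommended, counts


if __name__ == "__main__":
    # Illustrative toxicity curve; these numbers are not from the paper.
    rec, counts = thompson_sampling_mtd(
        [0.05, 0.15, 0.30, 0.45, 0.60], target=0.30, n_patients=200, seed=1)
    print("recommended dose index:", rec)
    print("patients per dose:", counts)
```

The per-dose counts returned by the function are the quantity that the finite-time analysis bounds for sub-optimal doses, so printing them is a quick way to see how allocation concentrates as the trial proceeds.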

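Experimental results

Research questions

RQ1 Can Thompson Sampling with monotonicity-aware priors identify the optimal dose more effectively than existing dose-finding algorithms in phase I and phase I/II trials?
RQ2 Can finite-time performance guarantees be established for Thompson Sampling with uniform priors in the dose-finding bandit problem?
RQ3 How does exploiting monotonicity in toxicity and efficacy affect allocation efficiency and ethical outcomes in adaptive clinical trials?
RQ4 Can Thompson Sampling be extended to handle contextual information, enabling personalized dose selection in oncology trials?
RQ5 How does the choice of prior distribution affect the convergence and robustness of dose-finding algorithms?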

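Key findings

The paper establishes the first finite-time upper bounds on the number of sub-optimal dose selections for Thompson Sampling with uniform priors in the dose-finding bandit problem.
In simulations, Thompson Sampling variants with informative priors outperform the baseline algorithms across a range of phase I and phase I/II trial designs, identifying the optimal dose more accurately.
The TS_A algorithm allocates fewer patients to highly toxic doses while keeping high identification accuracy, which improves the ethical outcomes of the trial.
The theoretical analysis indicates that, under monotonicity constraints, the optimal sampling allocation concentrates on doses close to the maximum tolerated dose (MTD), which supports targeted exploration.
Despite progress on fixed-confidence BAI, optimal allocation in the fixed-budget setting remains challenging and calls for further research.
Thompson Sampling with structured priors offers a flexible, theoretically grounded, and empirically strong alternative to traditional dose-finding methods.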
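
The summary does not spell out how the informative priors encode monotonicity, so the paper's construction is not reproduced here. As a rough illustration of what monotonicity-aware posterior sampling can look like, the sketch below swaps in plain rejection sampling: it draws from independent Beta posteriors and keeps only draws in which toxicity is non-decreasing in the dose. The function name `sample_monotone`, its parameters, and the example counts are hypothetical.

```python
import random


def sample_monotone(toxic, safe, rng, max_tries=10_000):
    """Draw toxicity probabilities that are non-decreasing in the dose index.

    Rejection sampling (an assumption of this sketch, not the paper's method):
    draw each dose from Beta(1 + toxic[k], 1 + safe[k]) and accept the vector
    only if it is monotone. Returns None if no draw is accepted in max_tries.
    """
    for _ in range(max_tries):
        draw = [rng.betavariate(1 + t, 1 + s) for t, s in zip(toxic, safe)]
        # Accept only if every dose is at least as toxic as the one below it.
        if all(a <= b for a, b in zip(draw, draw[1:])):
            return draw
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative counts for four doses, ten patients each (not from the paper).
    print(sample_monotone([0, 1, 3, 5], [10, 9, 7, 5], rng))
```

Rejection sampling is simple but can be slow when the data are few or contradict the ordering, since most draws are then discarded; a parametric dose-toxicity model avoids that cost at the price of stronger assumptions.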

This review was generated by AI and checked by a human editor.