QUICK REVIEW

[Paper Review] Deep Reinforcement Learning for Cost-Effective Medical Diagnosis

Yu Zheng, Yikuan Li|arXiv (Cornell University)|Feb 20, 2023

Machine Learning in Healthcare|Cited by 13

One-Sentence Summary

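This paper develops SM-DDPO, a semi-model-based deep reinforcement learning framework that learns cost-aware, Pareto-optimal dynamic policies for selecting lab test panels, maximizing F1 on imbalanced medical data while lowering testing cost. It provides a reward-shaping duality result for recovering the Pareto front and reports state-of-the-art performance at substantially reduced cost on the ferritin abnormality, AKI, and sepsis tasks.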

ABSTRACT

Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, thus invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach, leveraging properties of the $F_1$ score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO is able to achieve state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis].

Motivation and Objectives

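Enable cost-effective medical diagnosis by selecting lab test panels dynamically.
Optimize the F1 score directly to cope with imbalanced clinical data.
Characterize and compute the Pareto front between cost and accuracy in diagnosis.
Build a scalable, end-to-end trainable framework that is compatible with online learning.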
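
To illustrate why F1 rather than error rate is the target on imbalanced data, here is a minimal sketch. The cohort (950 negatives, 50 positives) and both confusion matrices are made-up numbers for illustration, not results from the paper. A classifier that always predicts "negative" looks accurate but has an F1 of zero.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical cohort: 950 negatives, 50 positives.
# A classifier that always predicts "negative".
always_negative = dict(tp=0, fp=0, fn=50, tn=950)
# A classifier that finds 30 of the 50 positives at the price of 40 false alarms.
screening = dict(tp=30, fp=40, fn=20, tn=910)

for name, c in [("always_negative", always_negative), ("screening", screening)]:
    acc = accuracy(**c)
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

The second classifier has slightly lower accuracy (0.940 versus 0.950) but a far higher F1 (0.500 versus 0.000), which is the behavior a diagnostic policy should be rewarded for.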

Proposed Method

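Formulate dynamic diagnosis as a multi-objective MDP that maximizes F1 while minimizing cost.
Use reward shaping and minimax duality to convert F1 optimization into a tractable reward-based problem.
Introduce SM-DDPO, which has three modules: a posterior state encoder (an EMFlow-based imputer), a classifier used for reward approximation, and a panel selector that chooses actions.
Train in a semi-model-based fashion, alternating end-to-end RL updates of the panel selector with supervised updates of the classifier.
Support end-to-end training so the framework can adapt online to new patients and diseases.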
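
The reward-shaping idea can be sketched as follows. With confusion-matrix entries normalized so that TP + FP + FN + TN = 1, the F1 denominator 2·TP + FP + FN equals 1 + TP − TN, so F1 depends only on the true-positive and true-negative rates. That makes a reward that is linear in correct positives, correct negatives, and test cost a natural surrogate. The specific reward form below, the weights `lam` and `rho`, and the panel names and prices are assumptions made for illustration; the exact formulation in the paper may differ.

```python
from typing import Dict, Sequence


def f1_from_rates(tp: float, tn: float) -> float:
    """F1 from normalized TP and TN rates (all four entries sum to 1).

    Uses 2*TP + FP + FN = 1 + TP - TN, which follows from FP + FN = 1 - TP - TN.
    """
    denom = 1.0 + tp - tn
    return 2.0 * tp / denom if denom else 0.0


def shaped_return(
    pred: int,
    label: int,
    panels_used: Sequence[str],
    panel_cost: Dict[str, float],
    lam: float,
    rho: float,
) -> float:
    """Episode return under an ASSUMED linear reward.

    +1 for a true positive, +lam for a true negative, and a charge of
    rho per unit of cost for every panel ordered during the episode.
    """
    true_pos = 1.0 if (pred == 1 and label == 1) else 0.0
    true_neg = 1.0 if (pred == 0 and label == 0) else 0.0
    cost = sum(panel_cost[p] for p in panels_used)
    return true_pos + lam * true_neg - rho * cost


# Hypothetical panel prices, not taken from the paper.
panel_cost = {"panel_a": 10.0, "panel_b": 30.0}

# Correct positive diagnosis after ordering one cheap panel.
r1 = shaped_return(1, 1, ["panel_a"], panel_cost, lam=0.5, rho=0.01)
# Correct negative diagnosis after ordering both panels.
r2 = shaped_return(0, 0, ["panel_a", "panel_b"], panel_cost, lam=0.5, rho=0.01)
print(r1, r2)
```

Sweeping `lam` and `rho` over a grid and training one policy per setting is one way to trace out different cost-accuracy trade-offs under this sketch.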

Experimental Results

Research Questions

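RQ1: Can F1 be optimized directly in RL to handle imbalanced medical data?
RQ2: How can the cost-accuracy Pareto front of dynamic diagnostic policies be characterized and learned?
RQ3: Is the semi-model-based approach effective for scalable end-to-end training of dynamic test selection policies?
RQ4: Do dynamic test selection policies achieve a better accuracy-cost trade-off than static or random policies?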

Key Findings

Model	Ferritin F1	Ferritin AUROC	Ferritin Cost	AKI F1	AKI AUROC	AKI Cost	Sepsis F1	Sepsis AUROC	Sepsis Cost	Policy
SM-DDPO_end2end	0.624	0.928	62	0.495	0.795	97	0.562	0.845	90	Dynamic
SM-DDPO_pretrained	0.607	0.925	80	0.519	0.789	90	0.567	0.836	85	Dynamic

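SM-DDPO achieves state-of-the-art or competitive F1 and AUROC on the ferritin, AKI, and sepsis tasks while substantially reducing testing cost.
On Sepsis, SM-DDPO_end2end reaches F1 0.562 and AUROC 0.845, with a cost reduction of up to 84%.
On Ferritin, SM-DDPO_end2end reaches F1 0.624 and AUROC 0.928 at a cost of 62 units, against higher costs for the baselines.
On AKI, SM-DDPO_end2end reaches F1 0.495 and AUROC 0.795 at a cost of 97, well below the cost of full observation.
The method can compute the Pareto front of the cost-F1 trade-off and supports end-to-end online learning.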
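
As a small illustration of what a cost-F1 Pareto front means, the sketch below filters a set of (cost, F1) operating points down to the non-dominated ones. It is a generic dominance filter, not the paper's reward-shaping procedure, and the function name `pareto_front` is hypothetical. The inputs are the two SM-DDPO rows from the table above; a real front would contain many policies obtained under different trade-off settings.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (policy name, cost, F1)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both cost (lower) and F1 (higher)."""
    front = []
    for name, cost, f1 in points:
        dominated = any(
            (c <= cost and f >= f1) and (c < cost or f > f1)
            for _, c, f in points
        )
        if not dominated:
            front.append((name, cost, f1))
    # Sort by cost so the front reads left to right.
    return sorted(front, key=lambda p: p[1])


# (cost, F1) pairs copied from the table above.
ferritin = [("SM-DDPO_end2end", 62, 0.624), ("SM-DDPO_pretrained", 80, 0.607)]
aki = [("SM-DDPO_end2end", 97, 0.495), ("SM-DDPO_pretrained", 90, 0.519)]
sepsis = [("SM-DDPO_end2end", 90, 0.562), ("SM-DDPO_pretrained", 85, 0.567)]

for task, pts in [("Ferritin", ferritin), ("AKI", aki), ("Sepsis", sepsis)]:
    print(task, pareto_front(pts))
```

On these two rows alone, the end-to-end variant dominates on Ferritin (cheaper and higher F1), while the pretrained variant dominates on AKI and Sepsis.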

This review was generated by AI and checked by a human editor.