QUICK REVIEW

[Paper Review] Deep Reinforcement Learning for Cost-Effective Medical Diagnosis

Yu Zheng, Yikuan Li|arXiv (Cornell University)|Feb 20, 2023

Machine Learning in Healthcare | Cited by 13

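Summary in Brief

This paper proposes SM-DDPO, a semi-model-based deep RL framework that learns cost-aware, Pareto-optimal policies for dynamic test-panel selection. It maximizes F1 on imbalanced medical data while reducing testing cost. A duality-based reward shaping approach recovers the Pareto front, and experiments on ferritin abnormality, AKI, and sepsis tasks show state-of-the-art performance with substantial cost savings.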

ABSTRACT

Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, thus invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach, leveraging properties of the $F_1$ score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO is able to achieve state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis].

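Motivation and Objectives

Enable cost-effective medical diagnosis by selecting lab test panels dynamically.
Optimize the F1 score directly to cope with imbalanced clinical data.
Characterize and compute the Pareto front between diagnostic cost and accuracy.
Develop a scalable, end-to-end trainable framework that is suited to online learning.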

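Proposed Method

Formulates dynamic diagnosis as a multi-objective MDP that maximizes F1 while minimizing cost.
Uses reward shaping and minimax duality to turn F1 optimization into a tractable reward-based problem.
Introduces SM-DDPO with three modules: a posterior state encoder (an EMFlow-based imputer), a classifier that approximates the reward, and a panel selector that chooses actions.
Adopts semi-model-based training with alternating updates: end-to-end RL for the panel selector and supervised updates for the classifier.
Implements end-to-end training, which allows online adaptation to new patients and diseases.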
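
The reward-shaping idea can be illustrated with a small sketch. It scores one diagnostic episode with a linear reward (a bonus for each correct diagnosis minus a penalty on the cost of the panels ordered) and separately computes F1 over a batch of episodes. The `Episode` class, the weights `lam` and `rho`, and the exact linear form are assumptions made for illustration; the paper's actual reward may be parameterized differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One patient trajectory (hypothetical structure for illustration)."""
    panel_costs: List[float]  # cost of each test panel ordered before diagnosing
    predicted: int            # final diagnosis: 1 = positive, 0 = negative
    label: int                # ground truth:    1 = positive, 0 = negative


def shaped_return(ep: Episode, lam: float, rho: float) -> float:
    """Linear shaped return: lam * [true positive] + [true negative] - rho * cost.

    lam and rho are assumed scalarization weights; sweeping them would trace
    out different trade-offs between accuracy and testing cost.
    """
    true_pos = ep.predicted == 1 and ep.label == 1
    true_neg = ep.predicted == 0 and ep.label == 0
    terminal = lam * float(true_pos) + float(true_neg)
    return terminal - rho * sum(ep.panel_costs)


def f1_score(episodes: List[Episode]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), computed over final diagnoses."""
    tp = sum(e.predicted == 1 and e.label == 1 for e in episodes)
    fp = sum(e.predicted == 1 and e.label == 0 for e in episodes)
    fn = sum(e.predicted == 0 and e.label == 1 for e in episodes)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, not taken from the paper.
episodes = [
    Episode(panel_costs=[10.0, 20.0], predicted=1, label=1),  # true positive
    Episode(panel_costs=[10.0], predicted=1, label=0),        # false positive
    Episode(panel_costs=[], predicted=0, label=0),            # true negative
]
returns = [shaped_return(e, lam=2.0, rho=0.01) for e in episodes]
batch_f1 = f1_score(episodes)
```

The point of the sketch is that the shaped return is additive per episode, so a standard RL algorithm can maximize it, whereas F1 depends on counts over the whole population and cannot be written as a per-step reward directly.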

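Experimental Results

Research Questions

RQ1: Can RL directly optimize F1 on imbalanced medical data?
RQ2: How can the Pareto front between cost and accuracy be characterized and learned for dynamic diagnosis policies?
RQ3: Is a semi-model-based approach effective for scalable, end-to-end training of dynamic test selection policies?
RQ4: Do dynamic test selection policies achieve a better accuracy-cost trade-off than static or random strategies?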

Key Findings

Model	Ferritin F1	Ferritin AUROC	Ferritin Cost	AKI F1	AKI AUROC	AKI Cost	Sepsis F1	Sepsis AUROC	Sepsis Cost	Strategy
SM-DDPO_end2end	0.624	0.928	62	0.495	0.795	97	0.562	0.845	90	Dynamic
SM-DDPO_pretrained	0.607	0.925	80	0.519	0.789	90	0.567	0.836	85	Dynamic

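SM-DDPO achieves state-of-the-art or competitive F1 and AUROC on the ferritin, AKI, and sepsis tasks while substantially reducing testing cost.
On sepsis, SM-DDPO_end2end reaches F1 0.562 and AUROC 0.845, with cost reductions of up to 84%.
On ferritin, SM-DDPO_end2end reaches F1 0.624 and AUROC 0.928 at a cost of 62 units, compared with higher-cost baselines.
On AKI, SM-DDPO_end2end reaches F1 0.495 and AUROC 0.795 at a cost of 97, far below the cost of full observation.
The method can compute the Pareto front of the cost-F1 trade-off and supports end-to-end online learning.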
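
To make the cost-F1 trade-off concrete, the sketch below applies a simple dominance check to the two SM-DDPO rows reported in the table above. A configuration is treated as dominated when another one costs no more and scores at least as high on F1, with at least one strict improvement. The helper `pareto_front` is a hypothetical illustration and is not the paper's procedure for obtaining its Pareto front; it compares only the two rows listed here, not the baselines.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, cost, F1)


def pareto_front(points: List[Point]) -> List[str]:
    """Return names of points not dominated on (lower cost, higher F1)."""
    front = []
    for name, cost, f1 in points:
        dominated = any(
            (c <= cost and f >= f1) and (c < cost or f > f1)
            for _, c, f in points
        )
        if not dominated:
            front.append(name)
    return front


# Cost and F1 values copied from the results table in this review.
ferritin = [("SM-DDPO_end2end", 62, 0.624), ("SM-DDPO_pretrained", 80, 0.607)]
aki = [("SM-DDPO_end2end", 97, 0.495), ("SM-DDPO_pretrained", 90, 0.519)]
sepsis = [("SM-DDPO_end2end", 90, 0.562), ("SM-DDPO_pretrained", 85, 0.567)]

for task, rows in [("ferritin", ferritin), ("aki", aki), ("sepsis", sepsis)]:
    print(task, pareto_front(rows))
```

On these numbers alone, the end-to-end variant is cheaper and scores higher on ferritin, while the pretrained variant is cheaper and scores higher on AKI and sepsis. AUROC is not part of this check.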


This review was generated by AI and checked by a human editor.