QUICK REVIEW

[Paper Review] Dementia-R1: Reinforced Pretraining and Reasoning from Unstructured Clinical Notes for Real-World Dementia Prognosis

Choonghan Kim, Hyunmin Hwang|arXiv (Cornell University)|Jan 6, 2026

Machine Learning in Healthcare | Citations: 0

One-Line Summary

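Dementia-R1 learns longitudinal reasoning from unstructured clinical notes through a two-stage reinforcement learning framework with verifiable rewards. It delivers strong dementia prognosis performance on real-world AMC data, and its 7B model achieves competitive results on the ADNI benchmark.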

ABSTRACT

While Large Language Models (LLMs) have shown strong performance on clinical text understanding, they struggle with longitudinal prediction tasks such as dementia prognosis, which require reasoning over complex, non-monotonic symptom trajectories across multiple visits. Standard supervised training lacks explicit annotations for symptom evolution, while direct Reinforcement Learning (RL) is hindered by sparse binary rewards. To address this challenge, we introduce Dementia-R1, an RL-based framework for longitudinal dementia prognosis from unstructured clinical notes. Our approach adopts a Cold-Start RL strategy that pre-trains the model to predict verifiable clinical indices extracted from patient histories, enhancing the capability to reason about disease progression before determining the final clinical status. Extensive experiments show that Dementia-R1 achieves the best overall performance on the AMC real-world unstructured cohort, reaching an AUROC of 84.02% and outperforming models up to 10x larger. The framework also generalizes to Parkinson's disease dementia prediction in an independent hospital cohort, achieving an AUROC of 78.37%. On the ADNI benchmark, our 7B model attains the highest AUROC among all LLM baselines at 83.17%, demonstrating strong longitudinal reasoning over fluctuating cognitive trajectories. Code is available at https://anonymous.4open.science/r/dementiar1-CDB5.

Motivation and Goals

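Target accurate longitudinal dementia prognosis from unstructured clinical notes, rather than from single-visit snapshots.
Develop a two-stage reinforcement learning framework that enables explicit temporal reasoning over how symptoms evolve across visits.
Use a Cold-Start RL approach with verifiable clinical indices as intermediate rewards, mitigating the sparse-reward problem posed by the final prognosis.
Demonstrate generalization on both real-world unstructured data (AMC) and a structured benchmark (ADNI).
Show that a 7B model trained with RL for reasoning and longitudinal interpretation can match or surpass larger baselines.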
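The contrast between a verifiable intermediate reward and a sparse final reward can be made concrete with a small sketch. The method section calls these R_cold (predicted clinical index within an error tolerance) and R_task (correct or incorrect prognosis). Everything else here is an assumption for illustration: the `<answer>` tag format, the yes/no answer vocabulary, the helper names, and the per-index tolerance values are hypothetical and not taken from the paper.

```python
# Minimal sketch of the two reward types (hypothetical formats and tolerances).
import re
from typing import Optional

# ASSUMPTION: illustrative absolute-error tolerances per clinical index.
TOLERANCE = {"MMSE": 2.0, "GDS": 1.0, "CDR": 0.5}


def parse_answer(text: str) -> Optional[str]:
    """Return the text inside a hypothetical <answer>...</answer> tag, if any."""
    match = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def r_cold(completion: str, index: str, target: float) -> float:
    """Stage 1 style reward: 1.0 if the predicted index is within tolerance of
    the score extracted from the notes, otherwise 0.0."""
    answer = parse_answer(completion)
    if answer is None:
        return 0.0
    try:
        predicted = float(answer)
    except ValueError:
        return 0.0  # a non-numeric answer earns nothing
    return 1.0 if abs(predicted - target) <= TOLERANCE[index] else 0.0


def r_task(completion: str, label: bool) -> float:
    """Stage 2 style reward: sparse 0/1 signal for the binary prognosis."""
    answer = parse_answer(completion)
    if answer is None or answer.lower() not in {"yes", "no"}:
        return 0.0  # malformed answers are treated as wrong
    return 1.0 if (answer.lower() == "yes") == label else 0.0


print(r_cold("<answer>24</answer>", "MMSE", 25.0))  # -> 1.0 (within 2 points)
print(r_task("<answer>Yes</answer>", True))  # -> 1.0
```

This sketch uses a hard tolerance threshold; a graded reward that decays with the error would be an equally plausible reading of "tolerance", and the paper's exact form may differ.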

Proposed Method

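Build verifiable pre-training data with an auxiliary extractor that pulls clinical indices (MMSE, GDS, CDR) out of the notes.
Stage 1, Cold-Start Pre-training: use GRPO (Group Relative Policy Optimization) to train the model to predict the extracted scores, rewarded by a verifiable reward R_cold that allows an error tolerance.
Define the GRPO objective with a clipped surrogate loss and group-based advantages A_i to keep training stable.
Stage 2, Task Fine-tuning: optimize the final binary dementia prognosis using the sparse reward R_task (correct/incorrect).
Construct longitudinal samples in a unified way around a target anchor (the final visit): the history before the anchor is the input, and discretized gap buckets model sensitivity to the time elapsed.
Evaluate on AMC unstructured notes and on ADNI structured data (converted into linearized logs) to show cross-domain generalization.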
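The GRPO objective described above (group-based advantages A_i plus a clipped surrogate) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes one log-probability per sampled completion instead of per-token ratios, omits the KL penalty that standard GRPO formulations include, and the function names and the clip value 0.2 are hypothetical choices.

```python
# Minimal GRPO sketch: group-relative advantages plus a clipped surrogate.
# Simplifications (assumptions): one log-probability per completion rather
# than per token, and the KL penalty used in standard GRPO is omitted.
import math
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """A_i = (r_i - mean(r)) / (std(r) + eps) over completions of one prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: List[float], logp_old: List[float],
                      advantages: List[float], clip_eps: float = 0.2) -> float:
    """Group mean of min(rho_i * A_i, clip(rho_i, 1-eps, 1+eps) * A_i).
    This quantity is maximized; a training loss would be its negative."""
    terms = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        rho = math.exp(new - old)  # importance ratio pi_theta / pi_old
        rho_clipped = min(max(rho, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(rho * adv, rho_clipped * adv))
    return sum(terms) / len(terms)


# Sparse 0/1 rewards for four sampled prognoses of one patient.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
# If every sample gets the same reward, every advantage is zero.
print(group_advantages([0.0, 0.0, 0.0, 0.0]))
```

The second call illustrates one way to read the sparse-reward difficulty: when every completion in a group is scored identically (all wrong or all right), all advantages are zero and the group contributes no policy-gradient signal. A tolerance-based index reward such as R_cold is more likely to produce mixed scores within a group, though the paper's own argument may be framed differently.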
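The anchor-centred sample construction and the linearization of structured records can also be sketched. Under stated assumptions only: the `Visit` fields, the bucket edges in months, the `key=value` linearization template, the prompt wording, and the choice of the anchor visit's clinical status as the label are all hypothetical illustrations of the idea, not the paper's format.

```python
# Hypothetical sketch of anchor-centred longitudinal sample construction.
# ASSUMPTIONS: the Visit fields, bucket edges (months), text templates, and
# using the anchor visit's status as the label are illustrative only.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Visit:
    month: int          # months since the first recorded visit
    note: str           # free-text note (AMC) or linearized record (ADNI)
    has_dementia: bool  # clinical status recorded at this visit


# Hypothetical gap-bucket edges in months: <=6, <=12, <=24, >24.
BUCKET_EDGES = [6, 12, 24]


def gap_bucket(gap_months: int) -> str:
    """Map a continuous time gap to a discrete bucket label."""
    for edge in BUCKET_EDGES:
        if gap_months <= edge:
            return f"<= {edge} months"
    return f"> {BUCKET_EDGES[-1]} months"


def linearize(record: dict) -> str:
    """Turn a structured visit record into one line of key=value text."""
    return "; ".join(f"{key}={value}" for key, value in sorted(record.items()))


def build_sample(visits: List[Visit]) -> Tuple[str, bool]:
    """Use the final visit as the target anchor and earlier visits as history."""
    ordered = sorted(visits, key=lambda v: v.month)
    if len(ordered) < 2:
        raise ValueError("need at least one visit before the anchor")
    *history, anchor = ordered
    gap = anchor.month - history[-1].month
    lines = [f"[Month {v.month}] {v.note}" for v in history]
    prompt = "\n".join(lines) + f"\nTime until target visit: {gap_bucket(gap)}"
    return prompt, anchor.has_dementia  # the anchor's note is never shown


visits = [
    Visit(0, linearize({"MMSE": 27, "CDR": 0.5}), False),
    Visit(12, linearize({"MMSE": 24, "CDR": 0.5}), False),
    Visit(30, "follow-up note", True),
]
prompt, label = build_sample(visits)
print(prompt)
print(label)
```

Bucketing the gap, rather than passing raw month counts, gives the model a small set of discrete time horizons to condition on; the actual number and width of the paper's buckets are not stated in this review.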

Figure 1: Multi-dimensional Performance Profile. Dementia-R1 demonstrates a consistent and balanced performance gain across all dimensions, including intermediate clinical reasoning tasks (e.g., MMSE, CDR-SB, ADAS-Cog) and the final dementia prognosis (F1-score).

Experimental Results

Research Questions

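RQ1: Can a reinforcement learning framework that learns explicit longitudinal reasoning from unstructured clinical notes predict dementia prognosis?
RQ2: Does pre-training on verifiable intermediate clinical scores improve final prognosis accuracy over direct supervised learning or standard RL fine-tuning?
RQ3: How well does the approach generalize from real-world unstructured data to a structured benchmark?
RQ4: Is a 7B model competitive with larger models across multiple dementia-related reasoning tasks?
RQ5: Can the model infer clinically meaningful intermediate scores (MMSE, GDS, CDR) from the notes?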

Key Findings

Results on the AMC unstructured cohort (all metrics in %):
Model	Size	Accuracy	Precision	Recall	F1 score
HuatuoGPT-o1	8B	67.19 ± 1.3	71.55 ± 1.5	58.99 ± 1.6	64.67 ± 1.5
Qwen2.5-7B-Inst	7B	71.94 ± 0.8	72.82 ± 0.7	71.60 ± 1.1	72.20 ± 0.8
Qwen2.5-32B-Inst	32B	61.99 ± 0.7	57.65 ± 0.4	95.46 ± 0.7	71.89 ± 0.4
SFT w/o Stage 1	7B	74.01 ± 1.0	72.21 ± 1.0	79.58 ± 1.0	75.72 ± 0.9
GRPO w/o Stage 1	7B	74.10 ± 0.9	70.96 ± 0.9	83.15 ± 1.0	76.57 ± 0.8
SFT w/o Stage 2	7B	65.60 ± 0.8	61.55 ± 0.6	86.43 ± 1.1	71.90 ± 0.6
GRPO w/o Stage 2	7B	72.47 ± 0.9	70.28 ± 0.8	79.58 ± 0.9	74.64 ± 0.8
SFT → SFT	7B	75.14 ± 0.6	73.43 ± 0.6	80.21 ± 0.6	76.67 ± 0.6
SFT → GRPO	7B	73.26 ± 0.6	70.39 ± 1.4	82.24 ± 0.8	75.85 ± 0.8
Dementia-R1	7B	74.93 ± 1.4	72.19 ± 1.7	82.56 ± 1.1	77.03 ± 0.7
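Because F1 is the harmonic mean of precision and recall, the F1 column can be cross-checked against the two columns before it. The sketch below recomputes F1 from the reported mean precision and recall. The ± spreads suggest the reported values are aggregated over several runs, so exact agreement is not guaranteed, but every row agrees to within 0.01 points. Row names are copied from the table, with the arrow written as `->` for plain ASCII output.

```python
# Recompute F1 = 2PR / (P + R) from the table's precision and recall columns.
ROWS = {
    "HuatuoGPT-o1": (71.55, 58.99, 64.67),
    "Qwen2.5-7B-Inst": (72.82, 71.60, 72.20),
    "Qwen2.5-32B-Inst": (57.65, 95.46, 71.89),
    "SFT w/o Stage 1": (72.21, 79.58, 75.72),
    "GRPO w/o Stage 1": (70.96, 83.15, 76.57),
    "SFT w/o Stage 2": (61.55, 86.43, 71.90),
    "GRPO w/o Stage 2": (70.28, 79.58, 74.64),
    "SFT -> SFT": (73.43, 80.21, 76.67),
    "SFT -> GRPO": (70.39, 82.24, 75.85),
    "Dementia-R1": (72.19, 82.56, 77.03),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in ROWS.items():
    print(f"{name:<18} recomputed={f1(p, r):6.2f} reported={reported:6.2f}")
```

The check also makes the Qwen2.5-32B-Inst row easier to interpret: its very high recall (95.46) paired with low precision (57.65) still yields an F1 close to the 7B baselines, because the harmonic mean is pulled toward the smaller of the two values.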

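On the AMC unstructured dataset, Dementia-R1 reaches F1 = 77.03%, the highest F1 in the table, outperforming multiple baselines including GRPO without Stage 1.
Stage 1 pre-training with verifiable rewards stabilizes training and raises the final F1 above purely supervised and single-stage RL baselines.
On the ADNI benchmark, Dementia-R1 (7B) achieves F1 = 74.91%, approaching GPT-4o and rivaling larger models on several metrics.
Clinical-index prediction also improves: Dementia-R1 has the highest average index-prediction accuracy (59.6%) and outperforms some 32B baselines on the GDS and CDR indices.
In a neurologist evaluation, Dementia-R1 produces more clinically grounded reasoning traces and compares strongly against 32B baselines on overall clinical utility and temporal reasoning.
Ablation studies confirm the benefit of Stage 1: models trained with Stage 1 converge earlier and reach a higher final F1.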

Figure 2: Overview of the Dementia-R1 Framework. The pipeline consists of two phases: Stage 1: Cold-Start Pre-training, where the base model learns longitudinal reasoning via GRPO on forecasting tasks; and Stage 2: Task Fine-tuning, where the reasoning-aligned model is adapted for the final dementia prognosis.

This review was generated by AI and checked by a human editor.