[Paper Review] MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization
MedAD-38K is introduced as a large-scale multimodal MedAD benchmark with CoT annotations, and MedAD-R1 uses two-stage training with Consistency GRPO (Con-GRPO) to generate consistent reasoning and diagnoses, achieving state-of-the-art results with a 3B model.
Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.
Research Motivation and Objectives
- Address data fragmentation and reasoning gaps in medical anomaly detection (MedAD).
- Provide a large-scale, multimodal MedAD benchmark with Chain-of-Thought annotations.
- Develop a two-stage training framework to inject medical knowledge and reinforce consistent reasoning.
- Propose Consistency Group Relative Policy Optimization (Con-GRPO) to align reasoning with final diagnoses.
- Demonstrate state-of-the-art performance and interpretability on MedAD-38K.
Proposed Method
- Build MedAD-38K, a multimodal MedAD benchmark covering 10 imaging modalities and 10 anatomical regions, with VQA and CoT annotations.
- Two-stage training: Cognitive Injection via supervised fine-tuning (SFT) followed by Reasoning Reinforcement with Con-GRPO.
- Con-GRPO extends Group Relative Policy Optimization (GRPO) with a consistency reward that aligns the reasoning with the final answer.
- Reward components: format correctness, answer accuracy, and reasoning consistency, weighted equally.
- Policy updates use GRPO's group-relative advantage, so no separate value network is needed.
- Evaluated on a 3B-parameter backbone, MedAD-R1 achieves SOTA accuracy on MedAD-38K.
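The reward and advantage steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` tag names, the exact-match accuracy check, and especially `naive_consistency` (a hypothetical keyword check standing in for whatever consistency scorer Con-GRPO actually uses, which this review does not describe) are all assumptions. The three rewards are summed with equal weights, and each sampled response's advantage is its reward normalized by the mean and standard deviation of its group, which is what lets GRPO skip a value network. The policy-gradient update itself is omitted.

```python
import re
import statistics
from typing import Callable, List, Sequence

# ASSUMPTION: a think-then-answer template with these tag names; the review
# does not specify the exact format MedAD-R1 checks for.
TEMPLATE = re.compile(r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if TEMPLATE.match(response) else 0.0


def accuracy_reward(response: str, ground_truth: str) -> float:
    """1.0 if the extracted answer equals the reference (case-insensitive)."""
    m = TEMPLATE.match(response)
    if m is None:
        return 0.0
    return 1.0 if m.group(2).strip().lower() == ground_truth.strip().lower() else 0.0


def naive_consistency(thought: str, answer: str) -> float:
    """HYPOTHETICAL stand-in scorer: 1.0 if the reasoning mentions the final answer.
    Con-GRPO's real consistency scorer is not described in this review."""
    return 1.0 if answer.strip().lower() in thought.lower() else 0.0


def consistency_reward(response: str,
                       scorer: Callable[[str, str], float] = naive_consistency) -> float:
    """Score how well the <think> part supports the <answer> part."""
    m = TEMPLATE.match(response)
    if m is None:
        return 0.0
    return scorer(m.group(1), m.group(2))


def total_reward(response: str, ground_truth: str,
                 weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Equally weighted sum of format, accuracy, and consistency rewards."""
    w_fmt, w_acc, w_con = weights
    return (w_fmt * format_reward(response)
            + w_acc * accuracy_reward(response, ground_truth)
            + w_con * consistency_reward(response))


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - group mean) / group std; no value network.
    Population std is used here; implementations differ on this detail."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative group of sampled responses to one question (made-up text).
group = [
    "<think>Hyperdense region in the left lobe suggests hemorrhage.</think><answer>hemorrhage</answer>",
    "<think>No focal abnormality is visible.</think><answer>hemorrhage</answer>",
    "hemorrhage",  # violates the template, so every reward term is 0
    "<think>Findings suggest edema.</think><answer>normal</answer>",
]
rewards = [total_reward(r, "hemorrhage") for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [3.0, 2.0, 0.0, 1.0]
print([round(a, 3) for a in advantages])  # [1.342, 0.447, -1.342, -0.447]
```

In this toy group, responses 1 and 2 give the same correct answer, but only response 1's reasoning supports it, so the consistency term gives it the larger advantage. That is the failure mode the consistency reward targets: reasoning disconnected from the final answer.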
Experimental Results
Research Questions
- RQ1: Can MedAD-R1 outperform state-of-the-art medical and general LMMs on MedAD-38K?
- RQ2: Does the two-stage training (Cognitive Injection + Con-GRPO) improve diagnostic accuracy and reasoning quality over SFT or RL alone?
- RQ3: How does enforcing consistency between reasoning and final diagnosis affect trustworthiness and performance in MedAD?
Key Findings
| Model | Params | Anatomy Identification (%) | Anomaly Detection (%) | Lesion Localization (%) | Modality Classification (%) | Pathology Characterization (%) | Overall (%) |
|---|---|---|---|---|---|---|---|
| MedAD-R1 (Ours) | 3B | 98.87 ± 0.35 | 78.24 ± 0.84 | 55.90 ± 1.02 | 97.14 ± 0.76 | 79.49 ± 1.20 | 85.15 ± 0.95 |
| Grok4-Fast | / | 93.47 ± 0.61 | 59.94 ± 1.56 | 24.90 ± 1.27 | 92.09 ± 2.38 | 39.49 ± 1.05 | 77.00 ± 1.00 |
| HuatuoGPT-Vision* | 7B | 95.65 ± 0.36 | 55.78 ± 1.38 | 33.79 ± 4.23 | 95.08 ± 0.16 | 59.58 ± 1.76 | 75.56 ± 0.51 |
- MedAD-R1 achieves Overall accuracy of 85.15% on MedAD-38K, outperforming the best baseline Grok4-Fast by 8.15 percentage points.
- MedAD-R1 reaches 98.87% in Anatomy Identification, 78.24% in Anomaly Detection, 55.90% in Lesion Localization, 97.14% in Modality Classification, and 79.49% in Pathology Characterization.
- Relative to its 3B backbone baseline (Qwen2.5-VL-3B), MedAD-R1 improves by 13.74%; since the parameter count is unchanged, the gain comes from the training framework rather than from model scale.
- Ablation studies show that RL-only training underperforms (73.22%), SFT-only is stronger (75.41%), a consistency-focused reward weighting yields a substantial gain (84.21%), and balanced rewards achieve the best result (85.15%).
- MedAD-R1 delivers transparent and logically coherent diagnostic reasoning, addressing trustworthiness gaps in medical AI.
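The per-task margins behind these findings can be recomputed directly from the results table. The short Python sketch below uses only the mean values shown (the ± spreads are ignored) and compares MedAD-R1 against the stronger of the two baselines listed in this excerpt, not against every model evaluated in the paper.

```python
# Mean accuracy (%) per column, copied from the results table (± spreads ignored).
TASKS = ["Anatomy Identification", "Anomaly Detection", "Lesion Localization",
         "Modality Classification", "Pathology Characterization", "Overall"]
SCORES = {
    "MedAD-R1 (Ours)": [98.87, 78.24, 55.90, 97.14, 79.49, 85.15],
    "Grok4-Fast": [93.47, 59.94, 24.90, 92.09, 39.49, 77.00],
    "HuatuoGPT-Vision*": [95.65, 55.78, 33.79, 95.08, 59.58, 75.56],
}


def margins_over_best_baseline(scores, ours="MedAD-R1 (Ours)"):
    """For each column, return (strongest listed baseline, gap in percentage points)."""
    result = {}
    for i, task in enumerate(TASKS):
        baselines = [(name, vals[i]) for name, vals in scores.items() if name != ours]
        best_name, best_val = max(baselines, key=lambda pair: pair[1])
        result[task] = (best_name, round(scores[ours][i] - best_val, 2))
    return result


for task, (baseline, gap) in margins_over_best_baseline(SCORES).items():
    print(f"{task}: +{gap:.2f} pp over {baseline}")
```

This reproduces the 8.15-point Overall margin over Grok4-Fast. Per task, the strongest listed baseline is HuatuoGPT-Vision* on Anatomy Identification, Lesion Localization, Modality Classification, and Pathology Characterization, and Grok4-Fast on Anomaly Detection; the largest gaps are in Lesion Localization (22.11 points) and Pathology Characterization (19.91 points).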
This review was generated by AI and checked by a human editor.