QUICK REVIEW

[Paper Review] MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization

Haitao Zhang, Yingying Wang | arXiv (Cornell University) | Feb 1, 2026

Anomaly Detection Techniques and Applications | Citations: 0

One-sentence summary

MedAD-38K is introduced as a large-scale multimodal MedAD benchmark with CoT annotations, and MedAD-R1, a 3B model trained in two stages with Consistency GRPO, achieves state-of-the-art results by generating reasoning that is consistent with its diagnoses.

ABSTRACT

Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.

Motivation and objectives

Address data fragmentation and reasoning gaps in medical anomaly detection (MedAD).
Provide a large-scale, multimodal MedAD benchmark with Chain-of-Thought annotations.
Develop a two-stage training framework to inject medical knowledge and reinforce consistent reasoning.
Propose Consistency Group Relative Policy Optimization (Con-GRPO) to align reasoning with final diagnoses.
Demonstrate state-of-the-art performance and interpretability on MedAD-38K.

Proposed method

Create MedAD-38K, a MedAD benchmark covering 10 modalities and 10 regions, with VQA and CoT annotations.
Two-stage training: Cognitive Injection via supervised fine-tuning (SFT) followed by Reasoning Reinforcement with Con-GRPO.
Con-GRPO uses group-relative policy optimization with a consistency reward to align the model's reasoning with its final answers.
Reward components: format correctness, answer accuracy, and reasoning-consistency (equal weights).
Policy updates rely on Group Relative Advantage without a separate value network (GRPO-based).
Evaluation is conducted on a 3B backbone and shows SOTA accuracy on MedAD-38K.
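
The reward and advantage steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` template, the function names, the exact-match accuracy check, and the substring-based consistency check are all hypothetical stand-ins, and the review only states that the three reward components are weighted equally. The advantage normalization follows the standard GRPO formulation (reward minus group mean, divided by group standard deviation), which needs no value network.

```python
import re
from statistics import mean, pstdev

# Assumed output template (hypothetical): <think>...</think><answer>...</answer>
TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


def parse(completion):
    """Return (reasoning, answer) if the completion follows the template, else None."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def total_reward(completion, gold_answer, weights=(1.0, 1.0, 1.0)):
    """Equal-weight sum of format, accuracy, and consistency rewards."""
    parsed = parse(completion)
    if parsed is None:
        return 0.0  # malformed output earns nothing
    reasoning, answer = parsed
    r_format = 1.0
    r_accuracy = 1.0 if answer.lower() == gold_answer.lower() else 0.0
    # Placeholder consistency check (assumption): the reasoning text must
    # mention the final answer. A real system would need a stronger judge.
    r_consistency = 1.0 if answer.lower() in reasoning.lower() else 0.0
    w_f, w_a, w_c = weights
    return w_f * r_format + w_a * r_accuracy + w_c * r_consistency


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>The scan shows a mass, so the finding is abnormal.</think>"
        "<answer>abnormal</answer>",
        "<think>The lungs look clear.</think><answer>abnormal</answer>",
        "abnormal",
    ]
    rewards = [total_reward(c, "abnormal") for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this toy group, the second completion gives the right answer but its reasoning never supports it, so it scores lower than the first. That gap is the kind of signal a consistency reward is meant to provide.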

Experimental results

Research questions

RQ1: Can MedAD-R1 outperform state-of-the-art medical and general LMMs on MedAD-38K?
RQ2: Does the two-stage training (Cognitive Injection + Con-GRPO) improve diagnostic accuracy and reasoning quality over SFT or RL alone?
RQ3: How does enforcing consistency between reasoning and final diagnosis affect trustworthiness and performance in MedAD?

Key findings

Model	Params	Anatomy Identification	Anomaly Detection	Lesion Localization	Modality Classification	Pathology Characterization	Overall
MedAD-R1 (Ours)	3B	98.87 ± 0.35	78.24 ± 0.84	55.90 ± 1.02	97.14 ± 0.76	79.49 ± 1.20	85.15 ± 0.95
Grok4-Fast	/	93.47 ± 0.61	59.94 ± 1.56	24.90 ± 1.27	92.09 ± 2.38	39.49 ± 1.05	77.00 ± 1.00
HuatuoGPT-Vision*	7B	95.65 ± 0.36	55.78 ± 1.38	33.79 ± 4.23	95.08 ± 0.16	59.58 ± 1.76	75.56 ± 0.51

MedAD-R1 achieves Overall accuracy of 85.15% on MedAD-38K, outperforming the best baseline Grok4-Fast by 8.15 percentage points.
MedAD-R1 reaches 98.87% in Anatomy Identification, 78.24% in Anomaly Detection, 55.90% in Lesion Localization, 97.14% in Modality Classification, and 79.49% in Pathology Characterization.
Compared with a 3B backbone baseline (Qwen2.5-VL-3B), MedAD-R1 improves by 13.74%, demonstrating efficiency gains from the Con-GRPO framework.
Ablation studies show that RL-only training underperforms (73.22%), SFT-only training is strong (75.41%), consistency-focused rewards yield substantial gains (84.21%), and balanced rewards achieve the best result (85.15%).
MedAD-R1 delivers transparent and logically coherent diagnostic reasoning, addressing trustworthiness gaps in medical AI.
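
The headline gaps quoted above can be re-derived from the reported numbers. The short sketch below uses only the Overall column of the table and the ablation figures in this review; the dictionary keys are illustrative labels, not names from the paper.

```python
# Overall accuracy (%) as reported in the results table of this review.
overall = {
    "MedAD-R1": 85.15,
    "Grok4-Fast": 77.00,
    "HuatuoGPT-Vision": 75.56,
}

# Ablation results (%) as reported in the findings list; labels are illustrative.
ablation = {
    "rl_only": 73.22,
    "sft_only": 75.41,
    "consistency_focused_reward": 84.21,
    "balanced_reward": 85.15,
}

# Best baseline = highest Overall score among models other than MedAD-R1.
baselines = {name: acc for name, acc in overall.items() if name != "MedAD-R1"}
best_baseline = max(baselines, key=baselines.get)
gap_to_best = round(overall["MedAD-R1"] - baselines[best_baseline], 2)

# Gain of the full (balanced-reward) model over each ablation variant,
# in percentage points.
ablation_gains = {
    name: round(ablation["balanced_reward"] - acc, 2)
    for name, acc in ablation.items()
    if name != "balanced_reward"
}

if __name__ == "__main__":
    print(best_baseline, gap_to_best)
    print(ablation_gains)
```

All differences here are absolute percentage points, which matches how the gap to Grok4-Fast is stated.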

This review was generated by AI and checked by a human editor.