QUICK REVIEW

[Paper Review] MedAD-R1: Eliciting Consistent Reasoning in Interpretable Medical Anomaly Detection via Consistency-Reinforced Policy Optimization

Haitao Zhang, Yingying Wang | arXiv (Cornell University) | Feb 1, 2026
Anomaly Detection Techniques and Applications | Citations: 0
One-Sentence Summary

The paper introduces MedAD-38K, a large-scale multimodal MedAD benchmark with CoT annotations, and MedAD-R1, a 3B model trained in two stages with Consistency GRPO (Con-GRPO) that achieves state-of-the-art results by generating reasoning consistent with its diagnoses.

ABSTRACT

Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.

Research Motivation and Objectives

  • Address data fragmentation and reasoning gaps in medical anomaly detection (MedAD).
  • Provide a large-scale, multimodal MedAD benchmark with Chain-of-Thought annotations.
  • Develop a two-stage training framework to inject medical knowledge and reinforce consistent reasoning.
  • Propose Consistency Group Relative Policy Optimization (Con-GRPO) to align reasoning with final diagnoses.
  • Demonstrate state-of-the-art performance and interpretability on MedAD-38K.

Proposed Method

  • Create MedAD-38K, a 10-modal, 10-region multimodal MedAD benchmark with VQA and CoT annotations.
  • Two-stage training: Cognitive Injection via supervised fine-tuning (SFT) followed by Reasoning Reinforcement with Con-GRPO.
  • Con-GRPO extends group-relative policy optimization with a consistency reward that aligns the generated reasoning with the final answer.
  • Reward components: format correctness, answer accuracy, and reasoning-consistency (equal weights).
  • Policy updates rely on Group Relative Advantage without a separate value network (GRPO-based).
  • Evaluation uses a 3B backbone and shows SOTA accuracy on MedAD-38K.
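
The reward and advantage steps in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each reward component is a binary 0/1 signal, that "equal weights" means a simple average, and that advantages are standardized by the group mean and standard deviation, as in common GRPO formulations. The names `total_reward` and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev


def total_reward(format_ok: bool, answer_ok: bool, consistent: bool) -> float:
    """Equal-weight average of the three reward components (assumed binary)."""
    return (float(format_ok) + float(answer_ok) + float(consistent)) / 3.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    No value network is involved: the group mean acts as the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses sampled for the same prompt.
group = [
    total_reward(True, True, True),     # well-formed, correct, consistent
    total_reward(True, True, False),    # correct answer, inconsistent reasoning
    total_reward(True, False, False),   # well-formed only
    total_reward(False, False, False),  # fails every check
]
advantages = group_relative_advantages(group)

for reward, advantage in zip(group, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.2f}")
```

Under these assumptions, a response with a correct answer but inconsistent reasoning receives a lower advantage than one that is both correct and consistent, which is the behavior the consistency reward is meant to encourage.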

Experimental Results

Research Questions

  • RQ1: Can MedAD-R1 outperform state-of-the-art medical and general LMMs on MedAD-38K?
  • RQ2: Does the two-stage training (Cognitive Injection + Con-GRPO) improve diagnostic accuracy and reasoning quality over SFT or RL alone?
  • RQ3: How does enforcing consistency between reasoning and final diagnosis affect trustworthiness and performance in MedAD?

Key Findings

All scores are accuracy (%).

| Model | Params | Anatomy Identification | Anomaly Detection | Lesion Localization | Modality Classification | Pathology Characterization | Overall |
|---|---|---|---|---|---|---|---|
| MedAD-R1 (Ours) | 3B | 98.87 ± 0.35 | 78.24 ± 0.84 | 55.90 ± 1.02 | 97.14 ± 0.76 | 79.49 ± 1.20 | 85.15 ± 0.95 |
| Grok4-Fast | / | 93.47 ± 0.61 | 59.94 ± 1.56 | 24.90 ± 1.27 | 92.09 ± 2.38 | 39.49 ± 1.05 | 77.00 ± 1.00 |
| HuatuoGPT-Vision* | 7B | 95.65 ± 0.36 | 55.78 ± 1.38 | 33.79 ± 4.23 | 95.08 ± 0.16 | 59.58 ± 1.76 | 75.56 ± 0.51 |

  • MedAD-R1 achieves Overall accuracy of 85.15% on MedAD-38K, outperforming the best baseline Grok4-Fast by 8.15 percentage points.
  • MedAD-R1 reaches 98.87% in Anatomy Identification, 78.24% in Anomaly Detection, 55.90% in Lesion Localization, 97.14% in Modality Classification, and 79.49% in Pathology Characterization.
  • Compared with a 3B backbone baseline (Qwen2.5-VL-3B), MedAD-R1 improves by 13.74%, demonstrating efficiency gains from the Con-GRPO framework.
  • Ablation studies show that RL-only training underperforms (73.22%), SFT-only training is stronger (75.41%), consistency-focused rewards yield a substantial gain (84.21%), and balanced rewards achieve the best result (85.15%).
  • MedAD-R1 delivers transparent and logically coherent diagnostic reasoning, addressing trustworthiness gaps in medical AI.
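
The margins quoted above can be checked with a short calculation. The sketch below copies the mean accuracies from the results table in this review and computes, per task, the gap between MedAD-R1 and the stronger of the two baselines listed here. It covers only those two baselines (the paper may report more), and the helper name `margins_over_best_baseline` is hypothetical.

```python
# Mean accuracy (%) copied from the results table in this review.
scores = {
    "MedAD-R1": {
        "Anatomy Identification": 98.87,
        "Anomaly Detection": 78.24,
        "Lesion Localization": 55.90,
        "Modality Classification": 97.14,
        "Pathology Characterization": 79.49,
        "Overall": 85.15,
    },
    "Grok4-Fast": {
        "Anatomy Identification": 93.47,
        "Anomaly Detection": 59.94,
        "Lesion Localization": 24.90,
        "Modality Classification": 92.09,
        "Pathology Characterization": 39.49,
        "Overall": 77.00,
    },
    "HuatuoGPT-Vision": {
        "Anatomy Identification": 95.65,
        "Anomaly Detection": 55.78,
        "Lesion Localization": 33.79,
        "Modality Classification": 95.08,
        "Pathology Characterization": 59.58,
        "Overall": 75.56,
    },
}


def margins_over_best_baseline(scores, ours="MedAD-R1"):
    """Return {task: (best_baseline_name, gap_in_percentage_points)}."""
    baselines = {name: s for name, s in scores.items() if name != ours}
    result = {}
    for task, our_score in scores[ours].items():
        best = max(baselines, key=lambda name: baselines[name][task])
        result[task] = (best, round(our_score - baselines[best][task], 2))
    return result


margins = margins_over_best_baseline(scores)
for task, (baseline, gap) in margins.items():
    print(f"{task}: +{gap:.2f} points over {baseline}")
```

The Overall gap reproduces the 8.15-point margin over Grok4-Fast stated above. Among the listed baselines, the largest per-task gaps fall on Lesion Localization, Pathology Characterization, and Anomaly Detection.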

This review was generated by AI and checked by a human editor.