QUICK REVIEW

[Paper Review] CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Yannian Gu, Xizhuo Zhang|arXiv (Cornell University)|Feb 23, 2026

Multimodal Machine Learning Applications | Citations: 0

One-Line Summary

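CT-Flow presents an agentic framework that uses Model Context Protocol (MCP) servers to turn 3D CT interpretation into directed, iterative reasoning mediated by tool use. It achieves state-of-the-art results on 3D CT benchmarks and on the new trajectory-based CT-FlowBench benchmark.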

ABSTRACT

Recent advances in Large Vision-Language Models (LVLMs) have shown strong potential for multi-modal radiological reasoning, particularly in tasks like diagnostic visual question answering (VQA) and radiology report generation. However, most existing approaches for 3D CT analysis largely rely on static, single-pass inference. In practice, clinical interpretation is a dynamic, tool-mediated workflow where radiologists iteratively review slices and use measurement, radiomics, and segmentation tools to refine findings. To bridge this gap, we propose CT-Flow, an agentic framework designed for interoperable volumetric interpretation. By leveraging the Model Context Protocol (MCP), CT-Flow shifts from closed-box inference to an open, tool-aware paradigm. We curate CT-FlowBench, the first large-scale instruction-tuning benchmark tailored for 3D CT tool-use and multi-step reasoning. Built upon this, CT-Flow functions as a clinical orchestrator capable of decomposing complex natural language queries into automated tool-use sequences. Experimental evaluations on CT-FlowBench and standard 3D VQA datasets demonstrate that CT-Flow achieves state-of-the-art performance, surpassing baseline models by 41% in diagnostic accuracy and achieving a 95% success rate in autonomous tool invocation. This work provides a scalable foundation for integrating autonomous, agentic intelligence into real-world clinical radiology.

Motivation and Objectives

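Motivate radiology AI that reflects real-world, tool-mediated CT interpretation workflows.
Shift 3D CT analysis from passive encoding to active, agentic probing via MCP.
Provide a standardized benchmark (CT-FlowBench) for evaluating tool-use trajectories on CT tasks.
Show that agentic, tool-mediated reasoning improves diagnostic accuracy and transparency.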

Proposed Method

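Use MCP to define a standardized tool space made up of four tool groups: data ingestion, global navigation, detailed observation, and advanced analysis.
Formulate diagnostic tasks as iterative Reasoning-Acting Trajectories over s_t, a_t, and o_t, following a ReAct-inspired loop.
Build CT-FlowBench from CT-RATE data, capturing executable reasoning trajectories and ground-truth diagnoses.
Train and evaluate the agent with supervised fine-tuning (SFT) on CT-FlowBench and standard 3D CT VQA datasets.
Compare against frontier LLMs and specialized medical VLMs on the benchmarks to assess the benefit of tool use and how well it generalizes.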
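The tool space and the ReAct-style loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the tool names, their canned return strings, the `Step` record, and `scripted_policy` (a stand-in for the LLM orchestrator) are all hypothetical, and no real MCP server is contacted. Reading s_t as the reasoning text at step t is also an assumption, since the review does not define the symbols.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool space: one stand-in tool per category named in the review.
# Names and return strings are placeholders, not the paper's tools or outputs.
TOOLS: Dict[str, Callable[[dict], str]] = {
    "load_volume": lambda args: "volume loaded",                     # data ingestion
    "locate_region": lambda args: "candidate region near slice 57",  # global navigation
    "view_slice": lambda args: f"slice {args['index']} rendered",    # detailed observation
    "measure_region": lambda args: "measurement computed",           # advanced analysis
}


@dataclass
class Step:
    s: str  # s_t, read here as the reasoning text at step t (an assumption)
    a: str  # a_t, the tool invoked at step t
    o: str  # o_t, the observation the tool returned


def scripted_policy(history: List[Step]) -> Tuple[str, str, dict]:
    """Stand-in for the LLM orchestrator: replays a fixed plan."""
    plan = [
        ("Load the scan first.", "load_volume", {}),
        ("Find where to look.", "locate_region", {}),
        ("Inspect the slice.", "view_slice", {"index": 57}),
        ("Quantify the finding.", "measure_region", {}),
        ("Enough evidence gathered.", "finish", {"answer": "placeholder diagnosis"}),
    ]
    return plan[len(history)]


def run_episode(policy, tools, max_steps: int = 8) -> Tuple[Optional[str], List[Step]]:
    """Alternate reasoning, tool call, and observation until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, args = policy(history)
        if action == "finish":
            return args["answer"], history
        observation = tools[action](args)
        history.append(Step(thought, action, observation))
    return None, history  # step budget exhausted without a final answer


answer, trajectory = run_episode(scripted_policy, TOOLS)
for t, step in enumerate(trajectory):
    print(t, step.a, "->", step.o)
print("answer:", answer)
```

Keeping the policy separate from the tool registry mirrors the decoupling the review attributes to MCP: the orchestrator only sees tool names and observations, so the tools behind them could be swapped without touching the loop.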

Figure 1: Comparison of 3D CT analysis paradigms. Left: Traditional End-to-End LVLMs rely on passive visual ingestion of 3D data, resulting in static textual outputs. Right: The proposed CT-Flow framework leverages the Model Context Protocol to transform the LLM into an active agent. It dynamically

Experimental Results

Research Questions

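RQ1: Can an agentic CT interpretation framework achieve higher diagnostic accuracy than static slice-based or end-to-end LVLM approaches?
RQ2: Does MCP-assisted tool use enable reliable multi-step reasoning and automatic tool invocation on 3D CT tasks?
RQ3: How does domain-specific supervised fine-tuning (SFT) affect tool-mediated CT reasoning performance?
RQ4: How does CT-Flow's tool orchestration compare with baselines on standard 3D CT VQA benchmarks and on CT-FlowBench trajectories?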

Key Findings

Model	Tool-use	3D-RAD Acc (%): avg	CT-FlowBench ACC (%): avg	QA	AM	DD	Avg.
GPT-5.2	Yes	63.50	37.00	40.00	-	-	37.00
Gemini-3-Pro-Preview	Yes	62.59	44.00	45.00	-	-	44.00
Claude-Sonnet-4.5	Yes	54.83	43.00	44.00	-	-	43.00
Qwen3-VL-235B-A22B-Instruct	Yes	54.21	36.00	30.00	-	-	36.00
GLM4.6-V	Yes	52.04	33.00	25.00	-	-	33.00
M3D-LaMed-Llama-2-7B	No	24.17	17.00	17.00	-	-	17.00
M3D-RAD	No	58.00	34.00	39.00	-	-	34.00
Hulu-Med-7B	No	61.29	46.00	37.00	-	-	46.00
Qwen2.5-VL-7B-Instruct	Yes	26.83	22.00	14.00	-	-	22.00
Qwen3-VL-8B-Instruct	Yes	49.06	26.00	20.00	-	-	26.00
CT-Flow-7B	Yes	61.36	38.00	52.00	-	-	38.00
CT-Flow-8B	Yes	69.46	43.00	40.00	-	-	43.00

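CT-Flow-8B achieves a state-of-the-art 69.46% accuracy on 3D-RAD.
CT-Flow improves diagnostic accuracy on 3D-RAD by +22.46% over slice-based baselines.
On CT-FlowBench, high-capacity models are shown to benefit from tool-mediated reasoning; GPT-5.2 and Gemini-3-Pro-Preview post aggregate metrics above some of the medical models.
CT-Flow (SFT) shows high autonomy in tool invocation, reaching up to a 95% success rate for autonomous tool invocation.
CT-FlowBench presents multi-step, executable reasoning trajectories, so final accuracy reflects complex multi-step diagnosis rather than single-slice judgments.
Ablations indicate that all four tool categories (Data Ingestion, Global Navigation, Detailed Observation, Advanced Analysis) are essential to performance.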
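As a quick sanity check on the 3D-RAD claim, the short sketch below ranks the models by the 3D-RAD average accuracy column of the table in this review and computes the gap between the top two. The numbers are copied from that table; the variable names are illustrative only.

```python
# 3D-RAD average accuracy (%) as listed in the results table of this review.
rad_acc = {
    "GPT-5.2": 63.50,
    "Gemini-3-Pro-Preview": 62.59,
    "Claude-Sonnet-4.5": 54.83,
    "Qwen3-VL-235B-A22B-Instruct": 54.21,
    "GLM4.6-V": 52.04,
    "M3D-LaMed-Llama-2-7B": 24.17,
    "M3D-RAD": 58.00,
    "Hulu-Med-7B": 61.29,
    "Qwen2.5-VL-7B-Instruct": 26.83,
    "Qwen3-VL-8B-Instruct": 49.06,
    "CT-Flow-7B": 61.36,
    "CT-Flow-8B": 69.46,
}

# Sort from highest to lowest accuracy.
ranked = sorted(rad_acc.items(), key=lambda kv: kv[1], reverse=True)
(best_model, best_acc), (runner_up, runner_up_acc) = ranked[0], ranked[1]

# Gap in percentage points between the top two entries.
margin = round(best_acc - runner_up_acc, 2)
print(f"{best_model}: {best_acc:.2f} (+{margin:.2f} points over {runner_up})")
```

On these figures CT-Flow-8B leads the 3D-RAD column, with GPT-5.2 next at 5.96 percentage points behind.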

Figure 2: Overview of the CT-Flow framework. (i) Data Construction: The pipeline for raw data curation, trajectory synthesis, and the establishment of the CT-Flow benchmark. (ii) Architectures: The system decouples the LLM orchestrator from the imaging environment via FASTMCP, bridging high-level se


This review was generated by AI and checked by a human editor.