QUICK REVIEW

[Paper Review] RadGPT: Constructing 3D Image-Text Tumor Datasets

Pedro R. A. S. Bassi, Mehmet Can Yavuz|arXiv (Cornell University)|Jan 8, 2025

Radiomics and Machine Learning in Medical Imaging | 3 citations

TL;DR

RadGPT creates AbdomenAtlas 3.0, a large 3D abdominal CT image-text dataset with per-voxel tumor annotations and reports, and presents an anatomy-aware vision-language agent that generates structured, narrative, and fusion reports from CT scans.

ABSTRACT

With over 85 million CT scans performed annually in the United States, creating tumor-related reports is a challenging and time-consuming task for radiologists. To address this need, we present RadGPT, an Anatomy-Aware Vision-Language AI Agent for generating detailed reports from CT scans. RadGPT first segments tumors, including benign cysts and malignant tumors, and their surrounding anatomical structures, then transforms this information into both structured reports and narrative reports. These reports provide tumor size, shape, location, attenuation, volume, and interactions with surrounding blood vessels and organs. Extensive evaluation on unseen hospitals shows that RadGPT can produce accurate reports, with high sensitivity/specificity for small tumor (<2 cm) detection: 80/73% for liver tumors, 92/78% for kidney tumors, and 77/77% for pancreatic tumors. For large tumors, sensitivity ranges from 89% to 97%. The results significantly surpass the state-of-the-art in abdominal CT report generation. RadGPT generated reports for 17 public datasets. Through radiologist review and refinement, we have ensured the reports' accuracy, and created the first publicly available image-text 3D medical dataset, comprising over 1.8 million text tokens and 2.7 million images from 9,262 CT scans, including 2,947 tumor scans/reports of 8,562 tumor instances. Our reports can: (1) localize tumors in eight liver sub-segments and three pancreatic sub-segments annotated per-voxel; (2) determine pancreatic tumor stage (T1-T4) in 260 reports; and (3) present individual analyses of multiple tumors--rare in human-made reports. Importantly, 948 of the reports are for early-stage tumors.

Motivation & Objective

Address the lack of publicly available abdominal CT datasets with per-voxel tumor annotations and real radiology reports.
Develop RadGPT, an anatomy-aware vision-language agent, to generate detailed structured and narrative reports from CT scans.
Create AbdomenAtlas 3.0, the first public dataset with per-voxel tumor annotations, organ sub-segmentation, and pancreatic cancer staging in 3D CTs.
Enable automated report generation that aligns with radiologist templates and institutional styles, with diagnostic evaluation metrics.
Provide benchmarks and a framework for tumor localization, measurement, staging, and fusion of structured and human-made reports.

Proposed method

Stage I: Segmentation of tumors and 26 anatomical structures using DiffTumor and nnU-Net with radiologist-driven refinement.
Stage II: Structured report generation via deterministic rule-based algorithms that fill radiologist templates using segmentations and derived measurements (size, volume, attenuation).
Stage III: Style adaptation of structured reports to target institutions' narrative styles through in-context learning with a target-hospital prompt set.
Fusion reporting by prompting a zero-shot LLM to combine structured reports with clinical notes into comprehensive fusion reports.
Diagnostic evaluation of AI-made reports using an LLM to extract presence/absence of tumors and compute sensitivity/specificity, enabling clinically meaningful assessment.
Pancreatic cancer staging enabled by measuring tumor interactions with vessels (SMA, CHA, CA, portal vein) and deriving T-stages via deterministic vessel–tumor analyses.
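As an illustration of the Stage II idea (deterministic measurements computed from a segmentation and written into a template), the following is a minimal sketch, not the paper's implementation. The function names, the template wording, and the use of an axis-aligned bounding-box extent as a size proxy are all assumptions made for this example; the review does not say how RadGPT measures size.

```python
import numpy as np

def tumor_measurements(ct_hu, tumor_mask, spacing_mm):
    """Derive simple measurements for one tumor from a boolean mask.

    ct_hu: 3D array of Hounsfield units.
    tumor_mask: boolean array with the same shape as ct_hu.
    spacing_mm: voxel spacing along each axis, in millimetres.
    (Hypothetical helper; names and outputs are illustrative.)
    """
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    n_voxels = int(tumor_mask.sum())
    if n_voxels == 0:
        return None  # no tumor voxels, nothing to measure

    spacing = np.asarray(spacing_mm, dtype=float)
    # Volume: voxel count times the volume of one voxel (mm^3 -> cm^3).
    volume_cm3 = n_voxels * float(spacing.prod()) / 1000.0

    # Size proxy (assumption): largest side of the axis-aligned bounding box.
    coords = np.argwhere(tumor_mask)
    extent_mm = (coords.max(axis=0) - coords.min(axis=0) + 1) * spacing

    return {
        "volume_cm3": volume_cm3,
        "largest_extent_cm": float(extent_mm.max()) / 10.0,
        "mean_hu": float(np.asarray(ct_hu)[tumor_mask].mean()),
    }

def fill_template(organ, m):
    """Write the measurements into a hypothetical one-sentence template."""
    return (
        f"{organ}: lesion with largest extent {m['largest_extent_cm']:.1f} cm, "
        f"volume {m['volume_cm3']:.2f} cm^3, "
        f"mean attenuation {m['mean_hu']:.0f} HU."
    )
```

Because every value is computed directly from the mask and the voxel spacing, the same scan always yields the same sentence, which is the property the rule-based stage relies on.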

Experimental results

Research questions

RQ1: Can RadGPT produce accurate, institution-adaptable structured and narrative reports from per-voxel abdominal CT tumor annotations?
RQ2: Does a segmentation-driven report generation approach outperform end-to-end abdominal CT report models on tumor detection and staging?
RQ3: How can we evaluate AI-made radiology reports using clinically meaningful metrics beyond text similarity?
RQ4: What value does AbdomenAtlas 3.0 add by providing per-voxel pancreas sub-segments, peripancreatic vessels, and PDAC staging in a public dataset?
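To make the clinically meaningful metrics in the third research question concrete, here is a small sketch of how sensitivity and specificity could be computed once tumor presence or absence has been extracted from each report. The LLM extraction step itself is not shown; the inputs are assumed to be per-scan booleans, and the function name is hypothetical.

```python
def sensitivity_specificity(predicted, reference):
    """Compute (sensitivity, specificity) from per-scan tumor labels.

    predicted: booleans extracted from AI-made reports (True = tumor reported).
    reference: ground-truth booleans for the same scans, in the same order.
    Returns None for a metric whose denominator is zero.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # tumor present, reported
    fn = sum(1 for p, r in pairs if not p and r)      # tumor present, missed
    tn = sum(1 for p, r in pairs if not p and not r)  # no tumor, none reported
    fp = sum(1 for p, r in pairs if p and not r)      # no tumor, one reported

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

Unlike text-similarity scores, these two numbers depend only on whether each report makes the right call about tumor presence, not on how closely its wording matches a reference report.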

Key findings

RadGPT outperforms end-to-end abdominal CT report models on tumor detection for both large and small tumors in the liver, pancreas, and kidney, as well as for liver metastases.
Fully automated RadGPT reports achieve higher sensitivity for tumor detection and comparable or better specificity than M3D and CT2Rep baselines.
AbdomenAtlas 3.0 provides 9,262 CT scans with per-voxel tumor annotations across three organs and includes pancreas sub-segments and PDAC staging.
RadGPT achieves automatic PDAC T-stage determination and provides per-voxel organ and vessel annotations to support staging.
Radiologist evaluation shows 75.6% tumor-detection precision and 93.8% tumor-size measurement accuracy for RadGPT across evaluated cases.
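The automatic T-stage determination mentioned above can be pictured with the sketch below. It is an assumption-laden illustration, not the paper's algorithm: the size thresholds follow the AJCC 8th-edition T categories for pancreatic cancer (T1 up to 2 cm, T2 above 2 cm and up to 4 cm, T3 above 4 cm, T4 when the celiac axis, superior mesenteric artery, or common hepatic artery is involved), and "involvement" is simplified to overlap or face-adjacency between boolean masks. Both helper names are hypothetical.

```python
import numpy as np

def masks_touch(a, b):
    """True if any voxel of a overlaps or is face-adjacent to a voxel of b."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if (a & b).any():
        return True
    for axis in range(a.ndim):
        # Compare every voxel with its neighbor one step along this axis.
        lo = [slice(None)] * a.ndim
        hi = [slice(None)] * a.ndim
        lo[axis] = slice(None, -1)
        hi[axis] = slice(1, None)
        lo, hi = tuple(lo), tuple(hi)
        if (a[lo] & b[hi]).any() or (a[hi] & b[lo]).any():
            return True
    return False

def pdac_t_stage(max_diameter_cm, arterial_involvement):
    """Map tumor size and arterial involvement to a T category.

    Thresholds are the AJCC 8th-edition values (an assumption here);
    arterial_involvement is a boolean, e.g. from masks_touch on the
    tumor mask and each artery mask.
    """
    if arterial_involvement:
        return "T4"
    if max_diameter_cm <= 2.0:
        return "T1"
    if max_diameter_cm <= 4.0:
        return "T2"
    return "T3"
```

A real staging pipeline would need a more careful definition of vessel involvement than single-voxel adjacency; the point here is only that per-voxel tumor and vessel masks make a rule of this kind computable.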

This review was created by AI and reviewed by human editors.