QUICK REVIEW

[Paper Review] RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

Xiaoman Zhang, Chaoyi Wu | arXiv (Cornell University) | April 25, 2024

Lung Cancer Diagnosis and Treatment | Citations: 5

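One-Line Summary

RadGenome-Chest CT is a large-scale, region-guided chest CT dataset built on CT-RATE that provides organ-level segmentation masks covering 197 categories, 665K multi-granularity grounded reports, and 1.3M grounded VQA pairs, enabling region-based text generation and the training of multimodal medical foundation models.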

ABSTRACT

Developing generalist foundation model has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models, to extend the original datasets (over 25,692 non-contrast 3D chest CT volume and reports from 20,000 patients) from the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665 K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of CT volume in the form of a segmentation mask; (iii) 1.3 M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.

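Motivation and Objectives

Extend an open dataset with region-level supervision for chest CT analysis to spur the development of generalist medical AI.
Build a region-guided dataset that enables grounded report generation and grounded VQA on CT images.
Provide resources (segmentation masks, grounded reports, VQA pairs) that support interpretable multimodal models.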

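Proposed Method

Extend CT-RATE with region-level grounding for 3D chest CT by running organ segmentation with SAT (197 regions).
Parse radiology reports into anatomically aligned sentences using GPT-4 and an in-house NER/QA pipeline, then link each sentence to a segmentation mask.
Convert Findings and Impressions into QA templates tied to segmentation regions to generate region-grounded VQA data.
Manually verify the grounded reports and VQA pairs in the validation set to check grounding quality.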
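
The linking and templating steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: naive keyword matching stands in for the GPT-4 and NER/QA pipeline, the three region labels are placeholders for the 197 SAT categories, segmentation masks are represented only by their labels, and the sample report text is invented. It is not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical keyword table mapping a region label to words that signal it.
# The real dataset uses 197 segmentation categories; these are placeholders.
REGION_KEYWORDS = {
    "lung": ("lung", "pulmonary", "lobe"),
    "heart": ("heart", "cardiac"),
    "liver": ("liver", "hepatic"),
}


@dataclass(frozen=True)
class GroundedSentence:
    text: str
    region: str  # label of the mask this sentence is linked to


@dataclass(frozen=True)
class GroundedQA:
    question: str
    answer: str
    region: str


def split_sentences(report):
    """Split a report on periods and drop empty fragments."""
    return [s.strip() + "." for s in report.split(".") if s.strip()]


def ground_report(report):
    """Link each sentence to the first region whose keywords it mentions.

    Sentences that match no region are left out (assumed behavior).
    """
    grounded = []
    for sentence in split_sentences(report):
        lowered = sentence.lower()
        for region, keywords in REGION_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                grounded.append(GroundedSentence(sentence, region))
                break
    return grounded


def to_vqa(item):
    """Turn a grounded sentence into a QA pair with a fixed, made-up template."""
    question = f"What are the findings in the {item.region}?"
    return GroundedQA(question, item.text, item.region)


# Invented example report, not taken from CT-RATE.
report = (
    "The heart size is normal. "
    "A nodule is seen in the right upper lobe. "
    "No pleural effusion."
)
grounded = ground_report(report)
qa_pairs = [to_vqa(g) for g in grounded]
for qa in qa_pairs:
    print(qa.region, "|", qa.question, "|", qa.answer)
```

In this sketch the third sentence is dropped because no keyword matches it, which shows why a plain keyword table is too brittle for real reports and why a language-model-based parser is a more plausible choice for this step.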

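Experimental Results

Research Questions

RQ1: How can region-level grounding be added to a large-scale chest CT dataset to support grounded multimodal tasks?
RQ2: What scale and quality of segmentations, region-grounded reports, and VQA pairs can be achieved from CT-RATE?
RQ3: Can region-text linking improve interpretable multimodal medical foundation models for radiology?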

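Key Results

The dataset comprises 25,692 non-contrast 3D chest CT volumes with reports from 20,000 patients.
Organ-level segmentation masks covering 197 categories were generated for the chest CT volumes.
665K multi-granularity grounded reports were generated, with each sentence linked to a segmentation region.
1.3M grounded VQA pairs (region- and case-level) were generated; those in the validation set were manually verified.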

This review was generated by AI and reviewed by a human editor.