QUICK REVIEW

[Paper Review] VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

Jianing Qiu, Jian Wu | arXiv (Cornell University) | 2023. 10. 08.

Retinal Imaging and Analysis | Citations: 14

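One-Line Summary

TL;DR: VisionFM is a multi-modal, multi-task ophthalmic foundation model trained on 3.4M images to enable generalist diagnosis, segmentation, prognosis, and systemic biomarker prediction across diverse modalities and devices. It outperforms baselines and matches or exceeds junior to intermediate-level clinicians on several tasks.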

ABSTRACT

We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.

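Research Motivation and Objectives

Establish the need for a generalist ophthalmic AI model that can handle diverse diseases, modalities, and multiple tasks.
Develop VisionFM as a foundation model that enables diagnosis, prognosis, segmentation, and systemic biomarker prediction from diverse ophthalmic data.
Demonstrate generalization to unseen modalities, unseen devices, and underrepresented diseases, and explore the role of synthetic data in training.
Show how a modality-agnostic decoder makes solving downstream tasks across ophthalmology efficient and scalable.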

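Proposed Method

VisionFM is pre-trained on 3.4 million ophthalmic images from 560,457 individuals, spanning 8 imaging modalities and a variety of devices.
A modality-agnostic decoder takes multi-modal inputs and performs multiple tasks, including diagnosis, prognosis, segmentation, landmark detection, and systemic biomarker prediction.
Self-supervised learning and synthetic ophthalmic data are introduced to improve representation learning and downstream performance.
Evaluation uses a large combined benchmark of 23 public and 5 private datasets spanning 5 modalities and 8 diseases.
One-shot and few-shot (1-shot, 5-shot, 10-shot) adaptation to new diseases and modalities is examined via linear probing on top of VisionFM.
Attention maps and their evolution during pre-training are visualized to provide model explainability and interpretability.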
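
To make the few-shot linear probing step concrete, here is a minimal sketch of the general technique: the encoder stays frozen and only a small linear head is trained on a handful of labeled examples. This is not VisionFM's actual code. The 2-dimensional feature vectors are made-up stand-ins for frozen encoder embeddings, and the function names `train_linear_probe` and `predict` are hypothetical.

```python
import math

def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on frozen feature vectors.

    The features are treated as fixed; only the head's weights and bias
    are updated, which is what "linear probing" refers to.
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid probability
            err = p - y                     # gradient of the log loss w.r.t. z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Full-batch gradient descent step on the head only.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the probe's logit is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

# Hypothetical 5-shot support set: toy stand-ins for frozen embeddings.
positive = [[1.0, 0.8], [0.9, 1.2], [1.1, 0.7], [0.8, 1.0], [1.2, 1.1]]
negative = [[-1.0, -0.9], [-0.8, -1.1], [-1.2, -0.7], [-0.9, -1.0], [-1.1, -1.2]]
support_x = positive + negative
support_y = [1] * len(positive) + [0] * len(negative)

weights, bias = train_linear_probe(support_x, support_y)
```

In a real setting the support features would come from the pre-trained encoder's output for each labeled image, and a 1-shot or 10-shot run would change only the size of the support set.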

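Experimental Results

Research Questions

RQ1 Can VisionFM achieve high-accuracy, modality-agnostic disease diagnosis across multiple ophthalmic diseases and imaging modalities?
RQ2 How well does VisionFM generalize to new modalities and imaging devices not seen during pre-training?
RQ3 What effect does synthetic data have on VisionFM's representation learning and downstream performance?
RQ4 Can VisionFM jointly support segmentation, landmark detection, prognosis, and systemic biomarker prediction on ophthalmic images?
RQ5 How does VisionFM perform under few-shot adaptation to underrepresented diseases and new tasks?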

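Key Results

VisionFM achieved an average AUC of 0.993 for 8 diseases across 5 modalities on the large-scale benchmark.
VisionFM's modality-agnostic decoder outperformed ResNet baselines and matched or exceeded ophthalmologists with 1 to 3 years and 4 to 8 years of experience in 12-disease diagnosis.
DR grading on OCTA (a new modality) achieved an AUC of 0.935 despite no exposure to OCTA during pre-training.
VisionFM generalized to ultra-widefield fundus photography devices, recording an AUC of 0.779 for DR grading, and was also robust at recognizing ocular albinism in few-shot settings (1 to 10 shots).
Segmentation performance: vessel Dice 81.75%, OCT layer Dice 96.18%; orbital MRI tumor segmentation Dice 79.49% (vs. 41.69% for U-Net); UBM landmark detection Euclidean error 4.90 pixels (vs. 12.86 for U-Net).
Synthetic data, at an appropriate real-to-synthetic ratio, improved representation learning; slit-lamp MRI synthetic data achieved the best gain at a real-to-synthetic ratio of 1:5.
VisionFM predicted the presence of intracranial tumors from retinal photographs with an AUC of 0.982 and an AP of 0.990, outperforming clinicians.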
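
For readers less familiar with the metrics reported above, the following sketch shows how AUC, the Dice coefficient, and mean Euclidean landmark error are commonly computed. These are textbook definitions written in plain Python, not the paper's evaluation code, and the function names and toy inputs are illustrative assumptions.

```python
import math

def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flat binary (0/1) masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # Two empty masks agree perfectly, so define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_landmark_error(pred_points, true_points):
    """Mean Euclidean distance (in pixels) between matched landmarks."""
    distances = [math.dist(p, t) for p, t in zip(pred_points, true_points)]
    return sum(distances) / len(distances)

# Toy inputs, made up for illustration only.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
error = mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

Higher is better for Dice and AUC, while lower is better for landmark error, which is why the U-Net comparison reads in opposite directions for segmentation and landmark detection.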


This review was generated by AI and reviewed by a human editor.