QUICK REVIEW

[Paper Review] EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

Danli Shi, Weiyi Zhang | arXiv (Cornell University) | 2024-05-18

Image Retrieval and Classification Techniques | Citations: 8

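One-Line Summary

EyeFound is a multimodal ophthalmic foundation model trained on 2.78 million unlabeled retinal images across 11 modalities. It learns generalizable representations, supports diverse downstream tasks, and outperforms RETFound in diagnosis, systemic risk prediction, and zero-shot multimodal VQA.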

ABSTRACT

Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

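Motivation and Objectives

Establish the need for a versatile ophthalmic imaging foundation model that handles many modalities and tasks without extensive task-specific annotation.
Develop a representation learning approach based on unlabeled multimodal data so the model can be adapted efficiently to diverse ophthalmic applications.
Reduce the annotation burden while enabling robust performance on both ocular and systemic disease tasks.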

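Proposed Method

Train a multimodal foundation model on 2.78 million retinal images covering 11 ophthalmic modalities, collected from 227 hospitals.
Learn generalizable representations from unlabeled multimodal data so the model can be adapted across tasks.
Evaluate the model on diverse downstream tasks, including disease diagnosis, prediction of systemic disease incidents, and zero-shot multimodal VQA.
Compare performance against the earlier RETFound model to assess the benefit of multimodal ophthalmic understanding.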
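The contrast drawn in the abstract, one set of weights for all modalities rather than separate weights per modality, can be illustrated with a minimal data-pooling sketch. This is not the authors' pipeline: the function `mixed_modality_batches`, the modality tags (`CFP`, `OCT`, `FFA`), the image identifiers, and the counts are all hypothetical, and the summary does not describe how EyeFound actually samples or batches its data. The sketch only shows that unlabeled records from several modalities can be shuffled into one stream that a single shared encoder would consume.

```python
import random
from collections import Counter


def mixed_modality_batches(records, batch_size, seed=0):
    """Yield batches drawn from one shuffled pool of (modality, image_id) records.

    No labels are involved: each record carries only a modality tag and an
    image identifier, so every batch could feed a single shared encoder.
    (Hypothetical helper for illustration; not from the paper.)
    """
    pool = list(records)
    random.Random(seed).shuffle(pool)  # deterministic shuffle for a given seed
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


# Hypothetical toy pool; modality names and counts are illustrative only.
records = (
    [("CFP", f"cfp_{i}") for i in range(6)]
    + [("OCT", f"oct_{i}") for i in range(4)]
    + [("FFA", f"ffa_{i}") for i in range(2)]
)

batches = list(mixed_modality_batches(records, batch_size=4))
modality_counts = Counter(modality for batch in batches for modality, _ in batch)
```

Pooling before batching means no modality needs its own model or training run. Under a per-modality design, each of the three toy groups above would be trained separately.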

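Experimental Results

Research Questions

RQ1: Can a single multimodal foundation model learn representations that transfer across ophthalmic modalities and tasks without extensive supervision for each modality?
RQ2: How does EyeFound compare with RETFound in ophthalmic disease diagnosis, systemic disease prediction, and zero-shot multimodal VQA?
RQ3: Does unlabeled multimodal training improve generalization to rare ophthalmic diseases?
RQ4: To what extent can EyeFound reduce the annotation burden while maintaining or improving task performance?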

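Key Findings

EyeFound outperforms RETFound in diagnosing eye diseases across modalities.
EyeFound achieves better results than RETFound in predicting systemic disease incidents from ophthalmic data.
EyeFound also outperforms RETFound on zero-shot multimodal VQA.
The model is trained on a large unlabeled multimodal retinal dataset spanning 11 modalities and adapts efficiently to multiple downstream tasks.
EyeFound offers a generalizable solution that can ease annotation requirements in retinal imaging AI.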


This review was generated by AI and reviewed by a human editor.