QUICK REVIEW

[Paper Review] EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

Danli Shi, Weiyi Zhang|arXiv (Cornell University)|May 18, 2024

Image Retrieval and Classification Techniques|Cited by 8

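One-line summary

EyeFound is a multimodal ophthalmic foundation model trained on 2.78 million unlabeled retinal images spanning 11 modalities. It learns general-purpose representations that support diverse downstream tasks, and it outperforms RETFound in diagnosis, systemic risk prediction, and zero-shot multimodal VQA.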

ABSTRACT

Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

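Motivation and objectives

Advance a general-purpose ophthalmic imaging foundation model that handles diverse modalities and tasks without requiring extensive per-task annotation.
Develop a multimodal representation learning approach driven by unlabeled data, enabling efficient adaptation to a range of ophthalmic applications.
Achieve robust performance on ophthalmic and systemic disease tasks while reducing the annotation burden.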

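Proposed method

Train a multimodal foundation model on 2.78 million retinal images from 227 hospitals, spanning 11 ophthalmic modalities.
Learn generalizable representations from unlabeled multimodal data so the model can be adapted across tasks.
Evaluate the model on diverse downstream tasks, including eye disease diagnosis, prediction of systemic disease incidence, and zero-shot multimodal VQA.
Compare performance against the prior work RETFound to assess gains in multimodal ophthalmic understanding.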
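
As a rough illustration of the comparison step, the sketch below scores two models on the same labeled test set and reports one number per model. This is not the paper's evaluation code. The review does not state which metric the paper uses, so AUROC is an assumption here, and the names `auroc`, `compare_models`, `model_a`, and `model_b` are hypothetical. The labels and scores are synthetic toy values, not results from EyeFound or RETFound.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_models(labels, scores_by_model):
    """Score every model against the same labels (hypothetical helper)."""
    return {name: auroc(labels, scores) for name, scores in scores_by_model.items()}


# Synthetic toy data for illustration only (assumption, not paper results).
labels = [1, 0, 1, 0, 1, 0]
toy_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.3, 0.7, 0.2],
}
results = compare_models(labels, toy_scores)
for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Scoring both models on an identical test set keeps the comparison paired, so a difference in the metric reflects the models rather than a difference in the data.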

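Experimental results

Research questions

RQ1: Can a single multimodal foundation model learn representations that transfer across ophthalmic modalities and tasks without requiring substantial per-modality supervision?
RQ2: How well does EyeFound perform on eye disease diagnosis, systemic disease prediction, and zero-shot multimodal VQA compared with RETFound?
RQ3: Does unlabeled multimodal training improve generalization to rare ophthalmic diseases?
RQ4: To what extent can EyeFound reduce the annotation burden while maintaining or improving task performance?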

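Key findings

EyeFound outperforms RETFound in diagnosing eye diseases across modalities.
EyeFound achieves better results in predicting the incidence of systemic diseases from ophthalmic data.
EyeFound performs better on zero-shot multimodal VQA tasks.
The model is trained on a large-scale unlabeled multimodal retinal dataset spanning 11 modalities, which enables efficient adaptation to multiple downstream tasks.
EyeFound offers a generalizable solution that can reduce annotation requirements in retinal imaging AI.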

This review was generated by AI and checked by a human editor.