QUICK REVIEW

[Paper Review] LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

Duy M. H. Nguyen, Hoang Nguyen|arXiv (Cornell University)|Jun 20, 2023

Radiomics and Machine Learning in Medical Imaging | Cited by 18

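One-line summary

LVM-Med introduces a family of large-scale self-supervised medical imaging models trained on ~1.3 million images from 55 public datasets. It learns robust representations with a novel second-order graph-matching objective and outperforms several SSL and foundation models on 15 downstream tasks.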

ABSTRACT

Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.

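Motivation and objectives

Motivate the need for large-scale, domain-specific self-supervised learning, given the domain shift between natural and medical images.
Propose a new SSL framework (LVM-Med) that uses second-order graph matching to learn robust representations.
Assemble a large, diverse medical imaging dataset of about 1.3M images from 55 public datasets for benchmarking medical SSL methods.
Demonstrate state-of-the-art performance on 15 downstream tasks spanning segmentation, classification, and detection, compared against supervised, SSL, and foundation-model baselines.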

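Proposed method

Construct two distorted views of each image and encode them with a shared backbone to obtain embeddings.
For each batch, build two graphs whose nodes are the distorted views and whose edges encode local and global similarity.
Define the vertex affinity c^v as a unified affinity that combines global cosine similarity with a local, region-aware cost.
Introduce second-order graph matching with edge affinities c^e to capture the relational structure among matched pairs.
Solve the graph-matching problem as a combinatorial objective and, to train end-to-end, estimate gradients with an IMLE-based gradient estimator.
To propagate gradients through the discrete solver, perturb the costs with Gumbel noise and estimate gradients with a finite-difference IMLE scheme.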
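
The steps above can be illustrated with a toy sketch. This is not the authors' implementation: the brute-force `solve_matching` stands in for a real combinatorial solver, the edge-affinity form (agreement between intra-graph cosine similarities), the function names, the Hamming-style loss gradient, and the values of `lam` and `edge_weight` are all assumptions chosen for illustration. The local, region-aware part of c^v is omitted, so only global cosine similarity is used.

```python
import itertools
import math
import random

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_matrix(xs, ys):
    return [[cosine(x, y) for y in ys] for x in xs]

def solve_matching(c_v, sim_a, sim_b, edge_weight=1.0):
    """Brute-force stand-in for a black-box solver (only viable for tiny n)."""
    n = len(c_v)
    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(n)):
        # First-order term: affinity of each matched vertex pair.
        score = sum(c_v[i][perm[i]] for i in range(n))
        # Second-order term (assumed form): matched edges should have similar weights.
        for i in range(n):
            for j in range(i + 1, n):
                score += edge_weight * (1.0 - abs(sim_a[i][j] - sim_b[perm[i]][perm[j]]))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm

def to_matrix(perm):
    """Encode a permutation as a 0/1 assignment matrix."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]

def gumbel(rng):
    """Draw one standard Gumbel sample; clamp u to avoid log(0)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def estimate_vertex_gradient(c_v, sim_a, sim_b, loss_grad, lam=10.0, seed=0):
    """Finite-difference, IMLE-style estimate of dLoss/dc_v through the solver."""
    rng = random.Random(seed)
    n = len(c_v)
    noise = [[gumbel(rng) for _ in range(n)] for _ in range(n)]
    # Perturb the affinities with Gumbel noise, then shift them against the loss gradient.
    perturbed = [[c_v[i][j] + noise[i][j] for j in range(n)] for i in range(n)]
    target = [[perturbed[i][j] - lam * loss_grad[i][j] for j in range(n)] for i in range(n)]
    z = to_matrix(solve_matching(perturbed, sim_a, sim_b))
    z_target = to_matrix(solve_matching(target, sim_a, sim_b))
    # The difference of the two solver outputs, scaled by 1/lam, is the gradient estimate.
    return [[(z[i][j] - z_target[i][j]) / lam for j in range(n)] for i in range(n)]

# Toy embeddings: view_b[i] is a slightly distorted copy of view_a[i].
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.9]]
c_v = similarity_matrix(view_a, view_b)
sim_a = similarity_matrix(view_a, view_a)
sim_b = similarity_matrix(view_b, view_b)

matching = solve_matching(c_v, sim_a, sim_b)
# Gradient of a Hamming-style loss whose ground truth is the identity matching.
loss_grad = [[-1.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
grad = estimate_vertex_gradient(c_v, sim_a, sim_b, loss_grad)
```

In a real training loop the estimated gradient would be passed back into the backbone through the affinities; the brute-force search here scales factorially and is only meant to make the data flow visible.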

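Experimental results

Research questions

RQ1: Can a second-order graph-matching SSL objective improve representation learning for medical images compared with conventional contrastive losses?
RQ2: Does integrating global and local affinity information into a graph-based SSL framework yield robust, transferable features across diverse medical modalities and tasks?
RQ3: How does LVM-Med perform against supervised, SSL, and foundation models on 15 downstream tasks in in-distribution and out-of-distribution settings?
RQ4: Can large-scale medical SSL models be trained efficiently on multi-modality public datasets using a black-box solver with gradient estimation?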

Key findings

Across 15 medical tasks, LVM-Med consistently outperforms several state-of-the-art supervised, self-supervised, and foundation models.
On Brain Tumor Classification and Diabetic Retinopathy Grading, LVM-Med with only a ResNet-50 backbone improves on previous vision-language models trained on 1 billion masks by 6-7%.
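The second-order graph-matching formulation, which includes both vertex and edge affinities, yields more robust improvements than purely linear (pairwise) matching.
LVM-Med with ResNet-50 and SAM's ViT backbones shows strong results on both 2D and 3D segmentation tasks, often outperforming SAM-based prompt settings.
The approach scales to large datasets and, despite the combinatorial nature of graph matching, can be trained end-to-end using IMLE-based gradient estimation.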
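
For reference, the contrast between second-order and linear matching can be written in the generic quadratic-assignment form below. This is a sketch of the standard formulation, not the paper's exact objective: the binary indicator v_{ia} (vertex i of the first graph matched to vertex a of the second), the batch size N, and the maximization sign convention are assumptions.

```latex
\max_{v \in \{0,1\}^{N \times N}} \;
  \sum_{i,a} c^{v}_{ia}\, v_{ia}
  \;+\; \sum_{(i,j),\,(a,b)} c^{e}_{ia,jb}\, v_{ia}\, v_{jb}
\quad \text{s.t.} \quad
  \sum_{a} v_{ia} = 1 \;\; \forall i, \qquad
  \sum_{i} v_{ia} = 1 \;\; \forall a
```

Dropping the second sum leaves a linear assignment problem over vertex affinities alone; the edge term is what rewards matchings that preserve relations between pairs of matched nodes.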

This review was generated by AI and checked by a human editor.