QUICK REVIEW

[Paper Review] LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching

Duy M. H. Nguyen, Hoang Nguyen|arXiv (Cornell University)|Jun 20, 2023

Radiomics and Machine Learning in Medical Imaging | Cited by 18

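One-Sentence Summary

LVM-Med introduces a family of large-scale self-supervised medical imaging models, pre-trained on ~1.3 million images from 55 public datasets with a novel second-order graph-matching objective, that learns robust representations and outperforms a number of SSL and foundation models on 15 downstream tasks.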

ABSTRACT

Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.

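Research Motivation and Objectives

Motivation: because of the domain shift between natural and medical images, medical imaging needs large-scale, domain-specific self-supervised learning.
Propose a new SSL framework (LVM-Med) that uses second-order graph matching to learn robust representations.
Assemble a large, diverse medical imaging dataset (~1.3M images from 55 public datasets) to benchmark SSL methods for medicine.
Demonstrate state-of-the-art performance across 15 downstream tasks spanning segmentation, classification, and detection, in both in-distribution and out-of-distribution settings.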

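Proposed Method

Construct two distorted (augmented) views of each image and encode them with a shared backbone network to obtain embeddings.
Build two graphs within a batch, in which nodes represent the distorted views and edges encode local/global affinities.
Define the vertex affinity from a global cosine similarity and a local, region-aware cost, merged into a unified affinity c^v.
Introduce second-order graph matching, using the edge affinity c^e to capture the relational structure among matched pairs.
Solve the graph-matching problem as a combinatorial objective, and train end to end using IMLE-based gradient estimation.
Backpropagate through the discrete solver by adding Gumbel noise to the costs and estimating gradients with a finite-difference IMLE scheme.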
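
The matching objective described in the steps above can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the graphs are fully connected, the vertex affinity uses only the global cosine term (the local, region-aware cost is omitted), the `edge_affinity` definition is hypothetical, and an exhaustive search over permutations stands in for the combinatorial solver. All function names are invented here.

```python
import itertools
import math


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def vertex_affinity(view_a, view_b):
    """c_v[i][j]: global cosine similarity between node i of graph A
    and node j of graph B (local, region-aware term omitted)."""
    return [[cosine(a, b) for b in view_b] for a in view_a]


def edge_affinity(view_a, view_b, i, k, j, l):
    """Hypothetical c_e: agreement between edge (i, k) in graph A and
    edge (j, l) in graph B, measured on embedding differences."""
    edge_a = [x - y for x, y in zip(view_a[i], view_a[k])]
    edge_b = [x - y for x, y in zip(view_b[j], view_b[l])]
    return cosine(edge_a, edge_b)


def matching_score(perm, view_a, view_b, c_v, edge_weight=1.0):
    """First-order (vertex) term plus second-order (edge) term for the
    matching that sends node i of graph A to node perm[i] of graph B."""
    n = len(perm)
    first_order = sum(c_v[i][perm[i]] for i in range(n))
    second_order = sum(
        edge_affinity(view_a, view_b, i, k, perm[i], perm[k])
        for i in range(n)
        for k in range(i + 1, n)
    )
    return first_order + edge_weight * second_order


def solve_matching(view_a, view_b, edge_weight=1.0):
    """Exhaustive search over permutations; only feasible for tiny batches."""
    c_v = vertex_affinity(view_a, view_b)
    n = len(view_a)
    best = max(
        itertools.permutations(range(n)),
        key=lambda p: matching_score(p, view_a, view_b, c_v, edge_weight),
    )
    return best, matching_score(best, view_a, view_b, c_v, edge_weight)
```

Setting `edge_weight=0.0` reduces the score to a purely first-order (linear) assignment, which makes it easy to compare the two formulations on toy embeddings. A realistic batch would need a dedicated graph-matching solver rather than enumeration, since the number of permutations grows factorially.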

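Experimental Results

Research Questions

RQ1: Does a second-order graph-matching SSL objective improve representation learning for medical imaging compared with conventional pairwise contrastive losses?
RQ2: Does integrating global and local affinity information into a graph-based SSL framework yield robust, transferable features across diverse medical modalities and tasks?
RQ3: How does LVM-Med perform against supervised, SSL, and foundation models on the 15 downstream tasks under in-distribution and out-of-distribution conditions?
RQ4: Is it feasible to train a large-scale medical SSL model on multi-modal public datasets using gradient estimation for black-box solvers?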

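Key Findings

LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models across the 15 medical tasks.
On Brain Tumor Classification and Diabetic Retinopathy Grading, LVM-Med improves on previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50 backbone.
The second-order graph-matching formulation, which combines vertex and edge affinities, yields more robust representations than purely linear (pairwise) matching.
LVM-Med achieves strong results on both 2D and 3D segmentation tasks with ResNet-50 and with SAM's ViT backbone, often surpassing prompt-based SAM settings.
The method scales to large datasets and can be trained end to end with IMLE-based gradient estimation, despite the combinatorial nature of graph matching.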
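
To make the last finding more concrete, here is a minimal Python sketch of a perturbation-based, finite-difference gradient estimate in the spirit of IMLE. It is an illustration under stated assumptions, not the paper's training code: the black-box solver is a brute-force linear assignment over a small affinity matrix `theta`, the downstream gradient `grad_z` is supplied by hand, and the names `solve_assignment`, `gumbel_noise`, and `imle_gradient`, along with the step size `lam`, are hypothetical.

```python
import itertools
import math
import random


def solve_assignment(theta):
    """Black-box solver: one-hot matching matrix that maximizes the
    total affinity sum(theta[i][perm[i]]) over all permutations."""
    n = len(theta)
    best = max(
        itertools.permutations(range(n)),
        key=lambda p: sum(theta[i][p[i]] for i in range(n)),
    )
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def gumbel_noise(n, scale, rng):
    """n x n matrix of scaled Gumbel(0, 1) samples."""
    def sample():
        u = max(rng.random(), 1e-12)  # avoid log(0)
        return -math.log(-math.log(u))
    return [[scale * sample() for _ in range(n)] for _ in range(n)]


def imle_gradient(theta, grad_z, lam=5.0, noise_scale=0.0, rng=None):
    """Estimate dLoss/dtheta through the discrete solver.

    Solves once on the perturbed affinities and once on affinities
    shifted against the downstream gradient, then takes a finite
    difference of the two solutions.
    """
    rng = rng or random.Random(0)
    n = len(theta)
    eps = gumbel_noise(n, noise_scale, rng)
    perturbed = [[theta[i][j] + eps[i][j] for j in range(n)] for i in range(n)]
    z = solve_assignment(perturbed)
    # Target affinities: step of size lam against the gradient of the loss w.r.t. z.
    target = [[perturbed[i][j] - lam * grad_z[i][j] for j in range(n)]
              for i in range(n)]
    z_target = solve_assignment(target)
    grad_theta = [[(z[i][j] - z_target[i][j]) / lam for j in range(n)]
                  for i in range(n)]
    return z, grad_theta
```

The step size matters: if `lam` is too small, the shifted problem has the same solution as the original one and the estimate is exactly zero, so no learning signal reaches the encoder. With `noise_scale=0.0` the estimate is deterministic, which is convenient for checking the sign of the update on a toy case before enabling the noise.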


This review was generated by AI and reviewed by a human editor.