QUICK REVIEW

[Paper Review] Efficient Complex-Valued Vision Transformers for MRI Classification Directly from k-Space

Moritz Rempe, Lukas T. Rotkopf|arXiv (Cornell University)|Jan 26, 2026

Advanced MRI Techniques and Applications | Cited by 0

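One-Sentence Summary

Introduces kViT, a fully complex-valued Vision Transformer that operates directly on undersampled k-Space MRI data, using radial patches and complex-valued positional embeddings, and that keeps accuracy competitive while substantially reducing VRAM usage.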

ABSTRACT

Deep learning applications in Magnetic Resonance Imaging (MRI) predominantly operate on reconstructed magnitude images, a process that discards phase information and requires computationally expensive transforms. Standard neural network architectures rely on local operations (convolutions or grid-patches) that are ill-suited for the global, non-local nature of raw frequency-domain (k-Space) data. In this work, we propose a novel complex-valued Vision Transformer (kViT) designed to perform classification directly on k-Space data. To bridge the geometric disconnect between current architectures and MRI physics, we introduce a radial k-Space patching strategy that respects the spectral energy distribution of the frequency-domain. Extensive experiments on the fastMRI and in-house datasets demonstrate that our approach achieves classification performance competitive with state-of-the-art image-domain baselines (ResNet, EfficientNet, ViT). Crucially, kViT exhibits superior robustness to high acceleration factors and offers a paradigm shift in computational efficiency, reducing VRAM consumption during training by up to 68$\times$ compared to standard methods. This establishes a pathway for resource-efficient, direct-from-scanner AI analysis.
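
To make the phase-loss point concrete, here is a minimal NumPy sketch. It is not taken from the paper: the 8×8 synthetic complex image, the centered FFT convention, and all variable names are assumptions chosen for illustration. It shows that two objects with different k-Space data can collapse to the same magnitude image once the phase is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex-valued object: a magnitude map times a spatially varying phase.
magnitude = rng.uniform(0.5, 1.0, size=(8, 8))
phase = rng.uniform(-np.pi, np.pi, size=(8, 8))
image = magnitude * np.exp(1j * phase)

# k-Space as the centered 2D Fourier transform of the object (assumed convention).
kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))

# Conventional pipeline: inverse transform, then keep only the magnitude.
recon = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
magnitude_image = np.abs(recon)

# A second object with the same magnitude but zero phase has different k-Space data...
kspace_no_phase = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(magnitude)))
different_kspace = not np.allclose(kspace, kspace_no_phase)

# ...yet it is indistinguishable from the first after the magnitude step.
same_magnitude = np.allclose(magnitude_image, magnitude)
```

A model that consumes `kspace` directly can in principle tell the two objects apart, whereas a model that consumes `magnitude_image` cannot.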

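Motivation and Objectives

Advance AI analysis directly on MRI k-Space data to preserve phase information and avoid lossy reconstruction.
Develop a complex-valued ViT architecture suited to non-local k-Space data.
Propose a physics-informed radial k-Space patching strategy and complex-valued positional embeddings.
Evaluate robustness to undersampling across multiple MRI datasets.
Demonstrate VRAM efficiency gains over standard image-domain baselines.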

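Proposed Method

Implement a fully complex-valued Transformer with complex-valued multi-head self-attention and feed-forward networks.
Introduce radial k-Space patches that group pixels by their radial distance from the k-Space center.
Apply a complex-valued linear projection to the patches and use complex-valued positional embeddings (learnable or RoPE-based).
Train on the fastMRI Prostate and Knee datasets and an in-house Glioma dataset, with undersampling factors up to 24×.
Compare against real-valued ViT, EfficientNet, and ResNet baselines.
Evaluate with AUROC and AUPRC under 5-fold cross-validation and across acceleration factors.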
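
The radial patching and complex projection steps can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: it assumes k-Space is stored with its center at index (height // 2, width // 2), uses equal-width rings, zero-pads rings to a common length, and uses a random projection matrix plus a zero positional embedding as stand-ins for learned parameters. The helper names `radial_ring_index` and `radial_patches` are hypothetical.

```python
import numpy as np

def radial_ring_index(height, width, num_rings):
    """Label each k-Space pixel with the index of the ring it falls into."""
    ky = np.arange(height) - height // 2
    kx = np.arange(width) - width // 2
    radius = np.hypot(ky[:, None], kx[None, :])  # distance from the assumed center
    # Assumption: rings of equal radial width; the paper may space them differently.
    edges = np.linspace(0.0, radius.max(), num_rings + 1)
    return np.digitize(radius, edges[1:-1])  # values in 0 .. num_rings - 1

def radial_patches(kspace, num_rings):
    """Group complex k-Space samples into one zero-padded token per ring."""
    ring = radial_ring_index(kspace.shape[0], kspace.shape[1], num_rings)
    counts = np.bincount(ring.ravel(), minlength=num_rings)
    tokens = np.zeros((num_rings, counts.max()), dtype=np.complex64)
    for r in range(num_rings):
        values = kspace[ring == r]
        tokens[r, :values.size] = values  # zero padding is an assumption of this sketch
    return tokens

rng = np.random.default_rng(0)
kspace = (rng.standard_normal((32, 32))
          + 1j * rng.standard_normal((32, 32))).astype(np.complex64)

num_rings, embed_dim = 16, 8
tokens = radial_patches(kspace, num_rings)

# Complex-valued linear projection: random weights stand in for learned ones.
token_len = tokens.shape[1]
weight = (rng.standard_normal((token_len, embed_dim))
          + 1j * rng.standard_normal((token_len, embed_dim))) / np.sqrt(token_len)
pos_embed = np.zeros((num_rings, embed_dim), dtype=np.complex64)  # placeholder embedding
embedded = tokens @ weight + pos_embed  # one complex embedding per ring
```

With this layout the sequence length equals the number of rings rather than the number of grid patches, and the innermost token collects the low-frequency samples around the k-Space center.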

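Experimental Results

Research Questions

RQ1: Can a complex-valued Vision Transformer operating directly on k-Space data match or exceed image-domain baselines on MRI classification?
RQ2: Do radial patches capture k-Space structure better than Cartesian patches in MRI tasks?
RQ3: How do kViT's performance and resource usage change with the degree of undersampling, compared with real-valued models?
RQ4: How do complex-valued positional embeddings affect the performance of a k-Space Transformer?
RQ5: Is the method robust across MRI modalities and pathology tasks (prostate, knee, brain glioma)?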

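Key Findings

kViT achieves AUROC and AUPRC competitive with image-domain baselines while substantially reducing VRAM usage during training (up to 68× in the MIL setting).
On fastMRI Prostate, kViT maintains robust performance under high undersampling (16×) while using far less VRAM than ResNet50.
On fastMRI Knee, kViT reaches performance comparable to SOTA with lower VRAM, although its performance degrades at higher undersampling, as does that of the baselines.
In the MIL experiments, kViT uses markedly less VRAM (0.52–0.96 GB) while retaining strong patient-level AUPRC and competitive AUROC.
Attention maps show a focus on the k-Space center together with attention to outer high-frequency regions, consistent with the model's use of spectral information.
Ablation studies identify 16 rings as the best patch configuration, underscore the importance of phase information, and show the benefit of Cutout augmentation.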
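
For readers unfamiliar with what an undersampling (acceleration) factor such as 16× means in practice, here is a small NumPy sketch. It is an assumption-laden illustration: this review does not describe the sampling pattern used in the paper, so the sketch uses a generic equispaced Cartesian mask with a fully sampled center band, and the function names and parameters are hypothetical.

```python
import numpy as np

def equispaced_mask(num_lines, acceleration, num_center_lines):
    """Keep every `acceleration`-th phase-encode line plus a fully sampled center band."""
    mask = np.zeros(num_lines, dtype=bool)
    mask[::acceleration] = True
    start = (num_lines - num_center_lines) // 2
    mask[start:start + num_center_lines] = True
    return mask

def undersample(kspace, mask):
    """Zero out the k-Space columns (phase-encode lines) that were not acquired."""
    return kspace * mask[None, :]

rng = np.random.default_rng(0)
kspace = rng.standard_normal((320, 320)) + 1j * rng.standard_normal((320, 320))

mask = equispaced_mask(num_lines=320, acceleration=16, num_center_lines=8)
undersampled = undersample(kspace, mask)

# Ratio of total lines to acquired lines.
effective_acceleration = mask.size / mask.sum()
```

Because the center band adds lines on top of the equispaced ones, the effective acceleration in this sketch is lower than the nominal factor of 16.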

This review was generated by AI and reviewed by human editors.