QUICK REVIEW

[Paper Review] SwiFT: Swin 4D fMRI Transformer

P. Kim, Junbeom Kwon|arXiv (Cornell University)|Jul 12, 2023

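Functional Brain Connectivity Studies|Cited by 10

One-Sentence Summary

SwiFT introduces a 4D Swin Transformer that learns spatiotemporal representations end-to-end from raw 4D fMRI data, efficiently predicting sex, age, and cognitive intelligence on large-scale datasets, with gains from self-supervised pre-training and interpretable insights.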

ABSTRACT

Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process four-dimensional spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI.

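Motivation and Objectives

Motivate end-to-end learning directly from high-dimensional 4D fMRI, so that brain dynamics are captured better without ROI-based preprocessing.
Develop SwiFT, a 4D Swin Transformer for fMRI whose local window attention is memory- and computation-efficient.
Show that end-to-end SwiFT improves the prediction of sex, age, and intelligence on large-scale datasets (HCP, ABCD, UKB).
Demonstrate the feasibility and benefit of contrastive self-supervised pre-training for downstream fMRI tasks.
Provide an interpretability analysis that identifies the brain regions contributing to predictions.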

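Proposed Method

Extend the Swin Transformer to 4D so that it operates over the temporal and the three spatial dimensions of fMRI volumes.
Use 4D window self-attention (4D W-MSA) and 4D shifted-window attention (4D SW-MSA) for efficient local interactions.
Apply patch partitioning and patch merging across the three spatial dimensions while leaving the temporal dimension unchanged.
Use absolute 4D positional embeddings, added after each stage, to encode spatial and temporal coordinates.
Enable end-to-end learning through a final global attention stage in which all tokens interact.
Adopt two contrastive self-supervised pre-training objectives (an instance contrastive loss and a local-local temporal contrastive loss) to improve downstream performance.
Train downstream tasks in a weight-efficient way, using a fixed 4D Swin Transformer backbone and a final MLP head.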
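
The 4D window partitioning and shifted-window steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid size, window size, and the helper names `window_id` and `partition` are hypothetical, and the shift is modeled as a cyclic roll of token coordinates, as in the original Swin Transformer.

```python
from collections import defaultdict
from itertools import product


def window_id(coord, grid, window, shift=(0, 0, 0, 0)):
    """Return the window index of a 4D token coordinate.

    All arguments are (x, y, z, t) tuples. A non-zero shift cyclically
    rolls the coordinates before the regular partition (an assumption
    borrowed from the original Swin Transformer, not from SwiFT's code).
    """
    return tuple(
        ((c - s) % g) // w for c, s, g, w in zip(coord, shift, grid, window)
    )


def partition(grid, window, shift=(0, 0, 0, 0)):
    """Group every token coordinate of the grid by the window it falls into."""
    if any(g % w for g, w in zip(grid, window)):
        raise ValueError("each grid dimension must be divisible by its window size")
    windows = defaultdict(list)
    for coord in product(*(range(g) for g in grid)):
        windows[window_id(coord, grid, window, shift)].append(coord)
    return dict(windows)


# Hypothetical sizes, chosen only to keep the example small.
grid = (4, 4, 4, 4)                     # tokens along x, y, z, t
window = (2, 2, 2, 2)                   # window extent along each axis
shift = tuple(w // 2 for w in window)   # half-window shift

regular = partition(grid, window)          # W-MSA style grouping
shifted = partition(grid, window, shift)   # SW-MSA style grouping
```

Attention would then be computed only among tokens that share a window. Alternating the regular and shifted groupings lets neighboring tokens that were split by one partition meet in the next. The attention masking that a full implementation needs for wrapped-around tokens is omitted here.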

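Experimental Results

Research Questions

RQ1: Can an end-to-end 4D Swin Transformer effectively learn spatiotemporal brain dynamics directly from raw fMRI data?
RQ2: Does SwiFT outperform ROI-based and two-step Transformer/CNN baselines on sex classification and age/intelligence prediction across large-scale datasets?
RQ3: Does contrastive self-supervised pre-training improve SwiFT on downstream fMRI prediction tasks?
RQ4: According to interpretability attributions, which brain regions contribute most to sex classification?
RQ5: How does SwiFT compare with existing 4D fMRI models such as TFF in efficiency (parameter count, FLOPs, throughput)?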
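
RQ5 turns on why windowed attention is cheaper than global attention. As a rough illustration, the sketch below counts query-key pairs for the two schemes on a hypothetical token grid. The sizes are made up for the example and are not the paper's configuration, and the count ignores projections, attention heads, and MLP layers.

```python
from math import prod


def global_attention_pairs(grid):
    """Query-key pairs when every token attends to every other token."""
    n_tokens = prod(grid)
    return n_tokens * n_tokens


def window_attention_pairs(grid, window):
    """Query-key pairs when attention is restricted to non-overlapping windows."""
    if any(g % w for g, w in zip(grid, window)):
        raise ValueError("each grid dimension must be divisible by its window size")
    tokens_per_window = prod(window)
    num_windows = prod(g // w for g, w in zip(grid, window))
    return num_windows * tokens_per_window ** 2


# Hypothetical token grid (x, y, z, t) and window size, for illustration only.
grid = (16, 16, 16, 20)
window = (4, 4, 4, 4)

global_pairs = global_attention_pairs(grid)
window_pairs = window_attention_pairs(grid, window)
ratio = global_pairs // window_pairs
```

With n tokens and M tokens per window, the global count is n² and the windowed count is n·M, so the saving factor is n/M, which equals the number of windows.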

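Key Findings

SwiFT consistently outperforms recent baselines on sex classification and age/intelligence prediction across the HCP, ABCD, and UKB datasets.
Self-supervised pre-training with the instance contrastive loss and the local-local temporal contrastive loss can improve downstream performance, with gains that vary by dataset and task.
Explanations based on Integrated Gradients identify brain regions consistent with the literature on known sex differences (e.g., mPFC, PCC, anterior cingulate cortex), as well as regions that vary across age groups.
SwiFT is more parameter- and compute-efficient than the global-attention Transformer baseline (TFF) while achieving better predictive performance.
The model supports end-to-end learning from raw 4D fMRI data, reducing the need for ROI-based feature extraction and two-step learning pipelines.
Longer input time series can improve performance on some tasks (such as intelligence in certain cohorts), but the effect depends on the task and dataset.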
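
The contrastive pre-training finding above can be illustrated with a generic InfoNCE-style loss. This is a sketch under assumptions, not the paper's exact formulation: the cosine similarity, the temperature value, and the function names are hypothetical choices, and the embeddings are toy Python lists rather than model outputs. How positive pairs are formed for each of the two objectives is not specified in this review, so the pairing is left to the caller.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] should match positives[i].

    Every positives[j] with j != i acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: two views each of two hypothetical subjects.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(view_a, view_b)        # matching pairs line up
swapped = info_nce(view_a, view_b[::-1])  # pairs deliberately mismatched
```

The loss is small when each anchor is closest to its own positive and grows when the pairing is broken, which is the signal a contrastive objective uses to shape the encoder.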

[Figure, panel (b): Successive 4D Swin Transformer Blocks]


This review was generated by AI and reviewed by a human editor.