QUICK REVIEW

[Paper Review] SwiFT: Swin 4D fMRI Transformer

P. Kim, Junbeom Kwon|arXiv (Cornell University)|Jul 12, 2023

Functional Brain Connectivity Studies|Citations: 10

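One-Line Summary

SwiFT introduces a 4D Swin Transformer that learns end-to-end spatiotemporal representations directly from raw 4D fMRI data, enabling efficient prediction of sex, age, and cognitive intelligence on large-scale datasets. It also demonstrates the benefit of self-supervised pre-training and provides interpretable insights.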

ABSTRACT

Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory- and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process 4-dimensional spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI.

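Motivation and Objectives

Motivate end-to-end learning directly from high-dimensional 4D fMRI, so that brain dynamics are captured better without ROI-based preprocessing.
Develop SwiFT, a 4D Swin Transformer with memory- and computation-efficient local window attention for fMRI.
Show that end-to-end SwiFT improves prediction of sex, age, and intelligence on large datasets (HCP, ABCD, UKB).
Demonstrate the feasibility and benefit of contrastive self-supervised pre-training for downstream fMRI tasks.
Provide an interpretability analysis that identifies the brain regions contributing to predictions.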

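Proposed Method

Extend the Swin Transformer to 4D so that it can process fMRI volumes across time and the three spatial dimensions.
Use 4D windowed self-attention (4D W-MSA) and 4D shifted window attention (4D SW-MSA) for efficient local interactions.
Implement patch partitioning and patch merging across the three spatial dimensions while leaving the temporal dimension unchanged.
Adopt absolute 4D positional embeddings after each stage to encode spatial and temporal coordinates.
Add a final global attention stage in which all tokens interact, enabling end-to-end learning.
Adopt two contrastive self-supervised pre-training objectives, an instance contrastive loss and a local-local temporal contrastive loss, to improve downstream performance.
Train in a parameter-efficient manner with a frozen 4D Swin Transformer backbone and a final MLP head.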
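
The 4D window partitioning and shifting described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `window_partition_4d` and `shift_4d`, the `(H, W, D, T, C)` axis order, and all sizes are assumptions chosen for the example. The sketch also omits the attention mask that shifted-window attention normally applies to tokens wrapped around by the cyclic shift.

```python
import numpy as np

def window_partition_4d(x, window):
    """Split a (H, W, D, T, C) token grid into non-overlapping 4D windows.

    Returns an array of shape (num_windows, tokens_per_window, C).
    """
    H, W, D, T, C = x.shape
    wh, ww, wd, wt = window
    assert H % wh == 0 and W % ww == 0 and D % wd == 0 and T % wt == 0
    x = x.reshape(H // wh, wh, W // ww, ww, D // wd, wd, T // wt, wt, C)
    # Move the four window-index axes to the front, within-window axes after them.
    x = x.transpose(0, 2, 4, 6, 1, 3, 5, 7, 8)
    return x.reshape(-1, wh * ww * wd * wt, C)

def shift_4d(x, window):
    """Cyclically shift the grid by half a window along each of the four axes."""
    shifts = tuple(-(w // 2) for w in window)
    return np.roll(x, shift=shifts, axis=(0, 1, 2, 3))

# Hypothetical sizes, chosen only for illustration.
tokens = np.arange(8 * 8 * 8 * 4 * 2, dtype=np.float32).reshape(8, 8, 8, 4, 2)
window = (4, 4, 4, 2)

windows = window_partition_4d(tokens, window)                           # regular windows
shifted_windows = window_partition_4d(shift_4d(tokens, window), window)  # shifted windows
print(windows.shape, shifted_windows.shape)
```

Self-attention would then be computed independently inside each window (the second axis), so the cost depends on the window size rather than on the full number of tokens in the 4D grid.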

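Experimental Results

Research Questions

RQ1: Can an end-to-end 4D Swin Transformer effectively learn spatiotemporal brain dynamics directly from raw fMRI data?
RQ2: Does SwiFT outperform ROI-based and two-stage Transformer/CNN baselines on sex classification and age/intelligence prediction across large-scale datasets?
RQ3: Can contrastive self-supervised pre-training improve SwiFT on downstream fMRI prediction tasks?
RQ4: Which brain regions contribute most to sex classification according to interpretable attribution analysis?
RQ5: Is SwiFT more efficient than existing 4D fMRI models (e.g., TFF) in terms of parameter count, FLOPs, and throughput?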
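
As background for the efficiency question (RQ5), a rough count of query-key pairs shows why window attention scales better than global attention. This is a back-of-the-envelope sketch, not a measurement from the paper: the grid and window sizes are hypothetical, and the count ignores heads, channels, and the MLP layers that also contribute to FLOPs.

```python
from math import prod

def global_attention_pairs(grid):
    """Query-key pairs when every token attends to every token."""
    n = prod(grid)
    return n * n

def window_attention_pairs(grid, window):
    """Query-key pairs when attention is restricted to non-overlapping windows."""
    assert all(g % w == 0 for g, w in zip(grid, window))
    num_windows = prod(g // w for g, w in zip(grid, window))
    tokens_per_window = prod(window)
    return num_windows * tokens_per_window ** 2

grid = (16, 16, 16, 20)   # hypothetical token grid (H, W, D, T)
window = (4, 4, 4, 4)     # hypothetical 4D window size

global_pairs = global_attention_pairs(grid)
window_pairs = window_attention_pairs(grid, window)
print(global_pairs, window_pairs, global_pairs // window_pairs)
```

With non-overlapping windows the reduction factor equals the number of windows, so the saving grows as the 4D grid gets larger relative to the window.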

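Key Findings

SwiFT consistently outperformed recent baselines on both sex classification and age/intelligence prediction across the HCP, ABCD, and UKB datasets.
Self-supervised pre-training with the instance and local-local temporal contrastive losses can improve downstream performance, with the size of the effect varying by dataset and task.
Integrated Gradients-based interpretation identified brain regions consistent with the known literature on sex differences, such as the mPFC, PCC, and anterior cingulate gyrus, and highlighted different regions for different age groups.
SwiFT is more efficient in parameter count and computation than the global-attention Transformer baseline (TFF) while also achieving better predictive performance.
The model supports end-to-end learning from raw 4D fMRI data, reducing the need for ROI-based feature extraction and two-stage training pipelines.
Longer input sequences can improve performance on some tasks (e.g., intelligence in particular cohorts), but the effect depends on the task and dataset.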
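
To make the contrastive pre-training finding more concrete, here is a generic InfoNCE-style loss in NumPy. It is a sketch of the general technique only and is not the paper's exact instance or local-local temporal formulation: the function name `info_nce`, the temperature value, and the random embeddings are assumptions. Row `i` of the two inputs is treated as a positive pair, and every other row in the batch as a negative.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss for a batch of paired embeddings.

    z_a[i] and z_b[i] form a positive pair; z_b[j] with j != i are negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # cross-entropy on the diagonal

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))                          # hypothetical embeddings
aligned = info_nce(anchor, anchor + 0.01 * rng.normal(size=(8, 16)))
unrelated = info_nce(anchor, rng.normal(size=(8, 16)))
print(aligned, unrelated)
```

The loss is low when paired embeddings are close and other pairs are far apart, which is the property a contrastive pre-training objective rewards before the model is fine-tuned on a downstream task.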

Figure, panel (b): Successive 4D Swin Transformer Blocks

This review was generated by AI and checked by a human editor.