QUICK REVIEW

[Paper Review] ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration

Junyu Chen, Yufan He|arXiv (Cornell University)|Apr 13, 2021

Advanced Neural Network Applications|References: 20|Citations: 139

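One-line summary

ViT-V-Net introduces a hybrid ConvNet-Transformer architecture for unsupervised volumetric medical image registration and achieves a higher Dice score than several top-performing methods on a brain MRI dataset.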

ABSTRACT

In the last decade, convolutional neural networks (ConvNets) have dominated and achieved state-of-the-art performances in a variety of medical imaging applications. However, the performances of ConvNets are still limited by lacking the understanding of long-range spatial relations in an image. The recently proposed Vision Transformer (ViT) for image classification uses a purely self-attention-based model that learns long-range spatial relations to focus on the relevant parts of an image. Nevertheless, ViT emphasizes the low-resolution features because of the consecutive downsamplings, result in a lack of detailed localization information, making it unsuitable for image registration. Recently, several ViT-based image segmentation methods have been combined with ConvNets to improve the recovery of detailed localization information. Inspired by them, we present ViT-V-Net, which bridges ViT and ConvNet to provide volumetric medical image registration. The experimental results presented here demonstrate that the proposed architecture achieves superior performance to several top-performing registration methods.

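Motivation and objectives

Motivate deformable image registration (DIR) and address the limitations of convolutional networks in modeling long-range spatial relations.
Propose a hybrid ViT-ConvNet architecture that enables long-range feature learning for 3D image registration.
Show that ViT-V-Net improves registration accuracy (Dice) while preserving localization through long skip connections.
Compare against state-of-the-art registration methods on a brain MRI dataset and provide implementation details.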

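Proposed method

Encode high-level 3D features of the fixed and moving images with ConvNet blocks and pooling, which reduces the resolution.
Split the high-level features into N patches and apply a Vision Transformer to learn long-range relations.
Embed the patches with a linear projection and add learnable position embeddings to retain positional information.
Pass the Transformer output through a V-Net-style decoder, with long skip connections preserving localization information.
Predict a dense displacement field u, warp the moving image with a spatial transformer, and optimize a loss that combines MSE similarity with a diffusion regularizer.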
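The patch tokenization and the loss described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the array layouts `(C, D, H, W)` and `(3, D, H, W)`, and the weight `lam` are all hypothetical, and the diffusion regularizer is written in one common form (mean squared forward differences of u). The warped image is assumed to come from the spatial transformer, which is not implemented here.

```python
import numpy as np


def split_into_patches(feat, p):
    """Split a feature map of shape (C, D, H, W) into N flattened cubic patches of side p."""
    c, d, h, w = feat.shape
    assert d % p == 0 and h % p == 0 and w % p == 0, "spatial dims must be divisible by p"
    x = feat.reshape(c, d // p, p, h // p, p, w // p, p)
    # Move the patch-grid axes to the front: (D/p, H/p, W/p, C, p, p, p).
    x = x.transpose(1, 3, 5, 0, 2, 4, 6)
    return x.reshape(-1, c * p ** 3)  # shape (N, C * p^3)


def embed_patches(patches, weight, pos_embed):
    """Linear projection plus position embeddings (both would be learned in a real model)."""
    return patches @ weight + pos_embed  # shape (N, dim)


def mse_similarity(warped, fixed):
    """Mean squared intensity difference between the warped moving image and the fixed image."""
    return float(np.mean((warped - fixed) ** 2))


def diffusion_regularizer(u):
    """Mean squared forward difference of a displacement field u of shape (3, D, H, W)."""
    grads = [np.diff(u, axis=axis) for axis in (1, 2, 3)]
    return float(sum(np.mean(g ** 2) for g in grads) / 3.0)


def registration_loss(warped, fixed, u, lam):
    """Similarity term plus lam times the smoothness penalty; lam is a hypothetical weight."""
    return mse_similarity(warped, fixed) + lam * diffusion_regularizer(u)
```

A constant displacement field gives a regularizer of zero, so the penalty only discourages spatially varying (non-smooth) deformations, while the MSE term drives the warped image toward the fixed one.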

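Experimental results

Research questions

RQ1: Compared with purely ConvNet-based registration methods, can a hybrid ConvNet-Transformer architecture improve unsupervised 3D image registration?
RQ2: Does Vision Transformer-based encoding strengthen the long-range spatial relations that matter for accurate volumetric alignment?
RQ3: Can the ViT-V-Net architecture achieve higher Dice scores than leading DIR methods on brain MRI data?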

Key findings

Method	Affine	NiftyReg	SyN	VoxelMorph-1	VoxelMorph-2	ViT-V-Net
Dice	0.569 ± 0.171	0.713 ± 0.134	0.688 ± 0.140	0.707 ± 0.137	0.711 ± 0.135	0.726 ± 0.130

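ViT-V-Net achieves a higher Dice score than several top registration methods in the validation setting.
Dice reported in the main comparison table: ViT-V-Net 0.726 ± 0.130 vs. Affine 0.569 ± 0.171, NiftyReg 0.713 ± 0.134, SyN 0.688 ± 0.140, VoxelMorph-1 0.707 ± 0.137, and VoxelMorph-2 0.711 ± 0.135.
ViT-V-Net trained with long skip connections retains localization information, showing lower training loss and higher validation Dice.
Paired t-tests indicate that ViT-V-Net is significantly better than several competitors (p-values are given in the paper).
Runtime on a GPU is reported for the method, which highlights its feasibility for practical use.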
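For readers less familiar with the evaluation, the following is a minimal sketch of the two quantities mentioned above: the Dice overlap between two binary masks and the test statistic of a paired t-test over per-case scores. The function names and the flat 0/1 mask representation are assumptions made for illustration, and the code does not reproduce the paper's evaluation pipeline. Turning the statistic into a p-value requires the t distribution, which is left out here (SciPy's `scipy.stats.ttest_rel` is one option).

```python
import math


def dice(a, b):
    """Dice overlap 2|A∩B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks are treated as a perfect match (a convention, not from the paper).
    return 2.0 * inter / size if size else 1.0


def paired_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length score lists."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of the differences
    return mean / math.sqrt(var / n), n - 1
```

In this setting `xs` and `ys` would hold the Dice scores of two methods on the same cases, so the test compares the methods case by case rather than comparing two independent groups.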

This review was generated by AI and checked by a human editor.