QUICK REVIEW

[Paper Review] Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

Hu Cao, Yueyue Wang | arXiv (Cornell University) | May 12, 2021

Advanced Neural Network Applications | References: 31 | Citations: 899

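One-line Summary

Swin-Unet proposes a pure Transformer-based U-shaped encoder-decoder with skip connections for 2D medical image segmentation. It achieves state-of-the-art results on Synapse and performs strongly on ACDC without using any convolutions.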

ABSTRACT

In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. Especially, the deep neural networks based on U-shaped architecture and skip-connections have been widely applied in a variety of medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global and long-range semantic information interaction well due to the locality of the convolution operation. In this paper, we propose Swin-Unet, which is an Unet-like pure Transformer for medical image segmentation. The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture with skip-connections for local-global semantic feature learning. Specifically, we use hierarchical Swin Transformer with shifted windows as the encoder to extract context features. And a symmetric Swin Transformer-based decoder with patch expanding layer is designed to perform the up-sampling operation to restore the spatial resolution of the feature maps. Under the direct down-sampling and up-sampling of the inputs and outputs by 4x, experiments on multi-organ and cardiac segmentation tasks demonstrate that the pure Transformer-based U-shaped Encoder-Decoder network outperforms those methods with full-convolution or the combination of transformer and convolution. The codes and trained models will be publicly available at https://github.com/HuCaoFighting/Swin-Unet.

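Motivation and Objectives

Motivated by the difficulty CNNs have in capturing global, long-range interactions in medical image segmentation, since convolution is a local operation.
Propose Swin-Unet, a pure Transformer-based, Unet-like architecture that models context from local to global.
Enable multi-scale feature learning through skip connections in a symmetric Transformer U-Net.
Introduce patch expanding for up-sampling without convolution.
Demonstrate robustness and generalization on multi-organ CT and cardiac MRI segmentation datasets.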

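Proposed Method

Split each 2D medical image into non-overlapping 4x4 patches and embed them as token features.
Learn multi-scale representations with a hierarchical Swin Transformer encoder that uses patch merging for down-sampling.
Up-sample with a symmetric Swin Transformer-based decoder built from patch expanding layers.
Fuse the encoder's multi-scale features with the decoder features through skip connections.
Train from ImageNet-pretrained weights with standard SGD optimization; evaluate on the Synapse and ACDC datasets.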
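The encoder-decoder pipeline above can be traced at the level of tensor shapes. The following is a minimal sketch, not the authors' implementation: it assumes a 224×224, 3-channel input and the Swin-Tiny embedding width C = 96, omits the Swin Transformer blocks and all learned linear layers, and uses hypothetical helper names (`patch_partition`, `patch_merging`, `patch_expanding`, `rearrange_expand`, `trace_swin_unet`). The channel ordering used by `rearrange_expand` is also an assumption.

```python
from typing import List, Tuple

# Hypothetical helpers: shapes are (height, width, channels); learned layers are omitted.
Shape = Tuple[int, int, int]


def patch_partition(h: int, w: int, in_ch: int, patch: int = 4) -> Shape:
    """Split into non-overlapping patch x patch tiles; each token flattens patch*patch*in_ch values."""
    assert h % patch == 0 and w % patch == 0, "input must be divisible by the patch size"
    return (h // patch, w // patch, patch * patch * in_ch)


def patch_merging(s: Shape) -> Shape:
    """Concatenate each 2x2 neighbourhood (4C channels); a linear layer then maps 4C -> 2C."""
    h, w, c = s
    return (h // 2, w // 2, 2 * c)


def patch_expanding(s: Shape, scale: int = 2) -> Shape:
    """A linear layer widens channels, then a rearrangement trades channels for resolution."""
    h, w, c = s
    widened = 2 * c if scale == 2 else scale * scale * c  # C -> 2C, or C -> 16C for the final 4x step
    return (h * scale, w * scale, widened // (scale * scale))


def rearrange_expand(feat: List[List[list]], scale: int = 2) -> List[List[list]]:
    """Move each token's scale*scale channel groups into a scale x scale spatial block.

    The channel order (row offset, column offset, channel) is an assumption of this sketch.
    """
    c = len(feat[0][0]) // (scale * scale)
    out = [[None] * (len(feat[0]) * scale) for _ in range(len(feat) * scale)]
    for y, row in enumerate(feat):
        for x, vec in enumerate(row):
            for dy in range(scale):
                for dx in range(scale):
                    k = (dy * scale + dx) * c
                    out[y * scale + dy][x * scale + dx] = vec[k:k + c]
    return out


def trace_swin_unet(h: int = 224, w: int = 224, in_ch: int = 3, c: int = 96) -> List[Shape]:
    """Return the bottleneck shape followed by the shape after each decoder up-sampling step."""
    s = patch_partition(h, w, in_ch)   # (H/4, W/4, 16 * in_ch)
    s = (s[0], s[1], c)                # linear embedding -> C channels
    skips = []
    for _ in range(3):                 # three encoder stages, each ending in patch merging
        skips.append(s)
        s = patch_merging(s)
    shapes = [s]                       # bottleneck: (H/32, W/32, 8C)
    for skip in reversed(skips):
        s = patch_expanding(s, scale=2)
        assert s == skip, "skip connection needs matching encoder/decoder shapes"
        concat = (s[0], s[1], 2 * s[2])             # concatenate with the encoder features
        s = (concat[0], concat[1], concat[2] // 2)  # linear layer restores the channel count
        shapes.append(s)
    # Final 4x expansion back to input resolution; a linear head (omitted) would map C to classes.
    shapes.append(patch_expanding(s, scale=4))
    return shapes
```

For a 224×224 input this traces (7, 7, 768) at the bottleneck, then (14, 14, 384), (28, 28, 192), (56, 56, 96), and finally (224, 224, 96). With `rearrange_expand`, a single token `[1, 2, 3, 4]` becomes the 2×2 block `[[[1], [2]], [[3], [4]]]`: because patch expanding only reshapes the output of a linear layer, it up-samples without any convolution.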

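Experimental Results

Research Questions

RQ1: Can a pure Transformer-based U-Net (Swin-Unet) achieve competitive segmentation performance without any CNN components?
RQ2: How do patch merging (down-sampling) and patch-expanding up-sampling affect segmentation accuracy and boundary accuracy?
RQ3: How do skip connections, input size, and model scale affect segmentation performance across organs and datasets?
RQ4: Does Swin-Unet generalize well across imaging modalities (CT and MRI) and tasks (multi-organ and cardiac segmentation)?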

Key Findings

Synapse multi-organ CT segmentation (organ columns are per-organ DSC, %)
Method	Avg. DSC (%) ↑	Avg. HD (mm) ↓	Aorta	Gallbladder	Kidney(L)	Kidney(R)	Liver	Pancreas	Spleen	Stomach
V-Net	68.81	-	75.34	51.87	77.10	80.75	87.84	40.05	80.56	56.98
DARR	69.77	-	74.74	53.77	72.31	73.24	94.08	54.18	89.90	45.96
R50 U-Net	74.68	36.87	87.74	63.66	80.60	78.19	93.74	56.90	85.87	74.16
U-Net	76.85	39.70	89.07	69.72	77.77	68.60	93.43	53.98	86.67	75.58
R50 Att-UNet	75.57	36.97	55.92	63.91	79.20	72.71	93.56	49.37	87.19	74.95
Att-UNet	77.77	36.02	89.55	68.88	77.98	71.11	93.57	58.04	87.30	75.75
R50 ViT	71.29	32.87	73.73	55.13	75.80	72.20	91.51	45.99	81.99	73.95
TransUnet	77.48	31.69	87.23	63.13	81.87	77.02	94.08	55.86	85.08	75.62
Swin-Unet	79.13	21.55	85.47	66.53	83.28	79.61	94.29	56.58	90.66	76.60
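The average DSC column can be cross-checked against the per-organ columns. This sketch assumes the reported average is the unweighted mean of the eight organ DSCs; `ROWS` and `check_average` are hypothetical names, and the values are copied from the table above.

```python
from statistics import mean
from typing import List, Tuple

# Per-organ DSC (%) copied from the Synapse table, in column order:
# Aorta, Gallbladder, Kidney(L), Kidney(R), Liver, Pancreas, Spleen, Stomach.
ROWS = {
    "Swin-Unet": ([85.47, 66.53, 83.28, 79.61, 94.29, 56.58, 90.66, 76.60], 79.13),
    "TransUnet": ([87.23, 63.13, 81.87, 77.02, 94.08, 55.86, 85.08, 75.62], 77.48),
}


def check_average(per_organ: List[float], reported: float, tol: float = 0.01) -> Tuple[float, bool]:
    """Recompute the unweighted mean and test it against the reported average within tol."""
    avg = mean(per_organ)
    return avg, abs(avg - reported) <= tol


for name, (organs, reported) in ROWS.items():
    avg, ok = check_average(organs, reported)
    print(f"{name}: recomputed {avg:.4f}, reported {reported}, match={ok}")
```

Both rows reconcile within 0.01 (about 79.1275 and 77.486). Applied to the R50 U-Net row, the same check gives about 77.61 rather than the reported 74.68, so not every baseline row in the table is consistent with this assumption.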

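On Synapse, Swin-Unet achieves the best average DSC (79.13%) and the lowest average HD (21.55 mm) among the evaluated methods.
Swin-Unet also predicts organ boundaries more accurately: its HD of 21.55 mm is well below that of the next-best method, TransUnet (31.69 mm).
On ACDC, Swin-Unet achieves an average DSC of 90.00, with per-structure DSCs of RV 88.55, Myo 85.62, and LV 95.83, outperforming several baselines.
Ablation studies show that patch-expanding up-sampling outperforms both bilinear interpolation and transposed convolution.
Increasing the input size from 224 to 384 improves per-organ DSC at a higher computational cost; scaling the model beyond Swin-Tiny yields only limited gains.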
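The two metrics in the table measure different things: DSC scores region overlap, while HD is the largest distance from a point on one segmentation to the nearest point on the other, which makes it sensitive to boundary errors. Below is a minimal pure-Python sketch on binary masks. It assumes Euclidean distances in pixels (not millimetres) and the plain symmetric Hausdorff distance; the paper's evaluation may use a variant such as a percentile HD, which is not implemented here. `dice` and `hausdorff` are hypothetical helper names.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[int, int]
Mask = List[List[int]]


def foreground(mask: Mask) -> Set[Point]:
    """Coordinates (row, col) of all non-zero pixels."""
    return {(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v}


def dice(pred: Mask, gt: Mask) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|)."""
    a, b = foreground(pred), foreground(gt)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(pred: Mask, gt: Mask) -> float:
    """Symmetric Hausdorff distance between the two foreground point sets."""
    a, b = foreground(pred), foreground(gt)
    if not a or not b:
        return math.inf  # assumption: treated as unbounded when either mask is empty

    def directed(p: Set[Point], q: Set[Point]) -> float:
        # Worst case over p of the distance to the closest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)

    return max(directed(a, b), directed(b, a))


gt = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]  # ground truth shifted one column to the right
print(dice(pred, gt), hausdorff(pred, gt))  # 0.5 1.0
```

A one-pixel shift halves the DSC here but moves HD by only one pixel; conversely, a few stray pixels far from the organ barely change DSC yet can inflate HD sharply, which is why a lower HD is read as better boundary prediction.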

This review was generated by AI and checked by a human editor.