QUICK REVIEW

[Paper Review] LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

Guoping Xu, Xingrong Wu|arXiv (Cornell University)|Jul 19, 2021

Advanced Neural Network Applications|References 39|Cited by 54

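Brief Summary

LeViT-UNet pairs a LeViT transformer-based encoder with a U-Net-style decoder and integrates multi-scale features from both the transformer and CNN blocks to deliver fast, accurate 2D medical image segmentation. It shows competitive accuracy and improved boundary prediction on Synapse, and generalizes well to ACDC.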

ABSTRACT

Medical image segmentation plays an essential role in developing computer-assisted diagnosis and therapy systems, yet still faces many challenges. In the past few years, the popular encoder-decoder architectures based on CNNs (e.g., U-Net) have been successfully applied in the task of medical image segmentation. However, due to the locality of convolution operations, they demonstrate limitations in learning global context and long-range spatial relations. Recently, several researchers try to introduce transformers to both the encoder and decoder components with promising results, but the efficiency requires further improvement due to the high computational complexity of transformers. In this paper, we propose LeViT-UNet, which integrates a LeViT Transformer module into the U-Net architecture, for fast and accurate medical image segmentation. Specifically, we use LeViT as the encoder of the LeViT-UNet, which better trades off the accuracy and efficiency of the Transformer block. Moreover, multi-scale feature maps from transformer blocks and convolutional blocks of LeViT are passed into the decoder via skip-connection, which can effectively reuse the spatial information of the feature maps. Our experiments indicate that the proposed LeViT-UNet achieves better performance comparing to various competing methods on several challenging medical image segmentation benchmarks including Synapse and ACDC. Code and models will be publicly available at https://github.com/apple1986/LeViT_UNet.

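Motivation and Objectives

Motivate improving medical image segmentation by combining the global context captured by transformers with the local features of CNNs.
Propose a lightweight LeViT-based encoder integrated with a U-Net-style decoder.
Develop a multi-scale feature fusion strategy that draws on both transformer and convolutional features.
Evaluate accuracy and efficiency on multiple medical segmentation benchmarks.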

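Proposed Method

Use LeViT as the encoder to extract global context at a reduced computational cost (FLOPs).
Concatenate the multi-scale features from the convolutional blocks and the transformer blocks at the final stage of the encoder.
Retain a CNN-based decoder with skip connections to recover spatial resolution.
Initialize the LeViT backbone with weights pretrained on ImageNet-1k.
Compare three variants, LeViT-UNet-128s, -192, and -384, to study the effect of channel count on performance.
Run ablations to understand the effect of the transformer blocks, the skip connections, and pretraining.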
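The data flow described above can be traced as shape bookkeeping. The following is a minimal sketch, not the authors' implementation: the helper names `down` and `trace` are hypothetical, and every channel count and stride below is an illustrative assumption rather than the configuration of any LeViT-UNet variant. It only shows how transformer outputs could be fused with the last convolutional feature map and how a U-Net-style decoder then concatenates the shallower convolutional features via skip connections.

```python
def down(size, stride):
    """Spatial size after a stage with the given stride (ceil division)."""
    return -(-size // stride)


def trace(input_size, conv_stages, transformer_stages, decoder_channels):
    """Trace (channels, size) through an encoder-decoder with skip connections.

    Each stage is a (channels, stride) pair. All values are assumptions
    chosen for illustration only.
    """
    # Convolutional part of the encoder.
    size, conv_feats = input_size, []
    for ch, stride in conv_stages:
        size = down(size, stride)
        conv_feats.append((ch, size))

    # Transformer part, continuing from the last convolutional resolution.
    trans_feats = []
    for ch, stride in transformer_stages:
        size = down(size, stride)
        trans_feats.append((ch, size))

    # Fusion at the end of the encoder: assume transformer outputs are
    # upsampled to the last conv resolution and concatenated along channels.
    last_ch, last_size = conv_feats[-1]
    fused_ch = last_ch + sum(ch for ch, _ in trans_feats)
    steps = [("fuse", fused_ch, last_size)]

    # Decoder: upsample, concatenate the skip feature, then reduce channels.
    ch = fused_ch
    for (skip_ch, skip_size), out_ch in zip(reversed(conv_feats[:-1]),
                                            decoder_channels):
        steps.append(("concat", ch + skip_ch, skip_size))
        steps.append(("conv", out_ch, skip_size))
        ch = out_ch
    return conv_feats, trans_feats, steps


conv_feats, trans_feats, steps = trace(
    input_size=224,
    conv_stages=[(16, 2), (32, 2), (64, 2), (128, 2)],       # assumed
    transformer_stages=[(128, 1), (256, 2), (384, 2)],       # assumed
    decoder_channels=[64, 32, 16],                           # assumed
)
for op, channels, size in steps:
    print(f"{op:>6}: {channels:4d} channels at {size}x{size}")
```

With these assumed settings the fused map has 896 channels at 14x14, and each decoder step doubles the resolution while the skip concatenation reintroduces the spatial detail that the deeper stages lost.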

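Experimental Results

Research Questions

RQ1: Can a LeViT-based encoder improve segmentation accuracy within the U-Net framework while keeping near-real-time efficiency?
RQ2: Does multi-scale fusion of transformer and CNN features strengthen both global context and local detail?
RQ3: How do the number of transformer channels and the number of skip connections affect segmentation performance and boundary accuracy?
RQ4: How does LeViT-UNet perform on standard medical datasets (Synapse, ACDC) compared with state-of-the-art CNN-based and transformer-based methods?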

Key Findings

Methods	DSC (%) ↑	HD (mm) ↓	Aorta	Gallbladder	Kidney(L)	Kidney(R)	Liver	Pancreas	Spleen	Stomach	# params(M)	FLOPs(G)	FPS
V-Net	68.81	-	75.34	51.87	77.10	80.75	87.84	40.05	80.56	56.98	-	-	-
DARR	69.77	-	74.74	53.77	72.31	73.24	94.08	54.18	89.90	45.96	-	-	-
U-Net	76.85	39.70	89.07	69.72	77.77	68.60	93.43	53.98	86.67	75.58	-	-	-
R50 U-Net	74.68	36.87	87.74	63.66	80.60	78.19	93.74	56.90	85.87	74.16	-	-	-
R50 Att-UNet	75.57	36.97	55.92	63.91	79.20	72.71	93.56	49.37	87.19	74.95	-	-	-
R50-Deeplabv3+	75.73	26.93	86.18	60.42	81.18	75.27	92.86	51.06	88.69	70.19	-	-	-
R50 ViT	71.29	32.87	73.73	55.13	75.80	72.20	91.51	45.99	81.99	73.95	-	-	-
TransUnet	77.48	31.69	87.23	63.13	81.87	77.02	94.08	55.86	85.08	75.62	105.28	24.64	50
SwinUnet	79.13	21.55	85.47	66.53	83.28	79.61	94.29	56.58	90.66	76.60	-	-	-
LeViT-UNet-128s	73.69	23.92	86.45	66.13	79.32	73.56	91.85	49.25	79.29	63.70	15.91	17.55	114
LeViT-UNet-192	74.67	18.86	85.69	57.37	79.08	75.90	92.05	53.53	83.11	70.61	19.90	18.92	95
LeViT-UNet-384	78.53	16.84	87.33	62.23	84.61	80.25	93.11	59.07	88.86	72.76	52.17	25.55	85
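For readers unfamiliar with the two headline columns, the sketch below shows how a Dice similarity coefficient (DSC, higher is better) and a Hausdorff distance (HD, lower is better) can be computed on small binary masks. It is an illustration under assumptions, not the paper's evaluation code: the function names are hypothetical, distances are in pixel units rather than mm, and the plain maximum-based Hausdorff distance is used because this review does not state which HD variant the table reports.

```python
import math


def dice(pred, target):
    """Dice similarity coefficient between two binary masks (lists of 0/1 rows)."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def foreground(mask):
    """Coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff(pred, target):
    """Symmetric Hausdorff distance (pixel units) between foreground point sets."""
    a, b = foreground(pred), foreground(target)
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf

    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(math.dist(x, y) for y in ys) for x in xs)

    return max(directed(a, b), directed(b, a))


target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [  # same 2x2 block shifted one pixel to the right
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(dice(pred, target))       # 0.5
print(hausdorff(pred, target))  # 1.0
```

The example shows why the two metrics are reported together: a one-pixel shift halves the overlap score on a small object, while the boundary error stays at a single pixel.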

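LeViT-UNet-384 achieves a DSC of 78.53% and an HD of 16.84 mm on Synapse, surpassing several SOTA methods in boundary accuracy.
On Synapse, the LeViT-UNet variants achieve competitive per-organ DSC, and LeViT-UNet-384 has the best HD among the reported methods (16.84 mm).
On ACDC, LeViT-UNet-384 achieves a DSC of 90.32 on the RV and 93.76 on the LV, showing strong cardiac segmentation performance.
Increasing the number of transformer channels and introducing transformer blocks yields consistent DSC and HD improvements over the non-transformer baseline.
Adding skip connections generally improves performance, with notable gains on small organs such as the aorta and gallbladder.
Pretraining helps the larger transformer backbone (e.g., LeViT-UNet-384), while its effect on the smaller variants is mixed.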
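The accuracy and efficiency trade-off behind these findings can be read off the Synapse table. The sketch below copies the four rows that report parameters, FLOPs, and FPS and expresses each relative to TransUnet. The helper name `relative_to` and the choice of TransUnet as the baseline are assumptions made for illustration; the numbers themselves come from the table.

```python
# (params in M, FLOPs in G, FPS, DSC in %, HD in mm), copied from the
# Synapse table rows that report efficiency figures.
ROWS = {
    "TransUnet":       (105.28, 24.64, 50, 77.48, 31.69),
    "LeViT-UNet-128s": (15.91, 17.55, 114, 73.69, 23.92),
    "LeViT-UNet-192":  (19.90, 18.92, 95, 74.67, 18.86),
    "LeViT-UNet-384":  (52.17, 25.55, 85, 78.53, 16.84),
}


def relative_to(baseline, rows):
    """Express every row relative to the chosen baseline row."""
    b_params, b_flops, b_fps, b_dsc, b_hd = rows[baseline]
    summary = {}
    for name, (params, flops, fps, dsc, hd) in rows.items():
        summary[name] = {
            "fps_speedup": fps / b_fps,        # > 1 means faster
            "param_ratio": params / b_params,  # < 1 means smaller
            "flops_ratio": flops / b_flops,    # < 1 means cheaper
            "dsc_delta": dsc - b_dsc,          # > 0 means more overlap
            "hd_delta": hd - b_hd,             # < 0 means tighter boundaries
        }
    return summary


summary = relative_to("TransUnet", ROWS)
for name, s in summary.items():
    print(f"{name:16s} speedup x{s['fps_speedup']:.2f}  "
          f"params x{s['param_ratio']:.2f}  "
          f"DSC {s['dsc_delta']:+.2f}  HD {s['hd_delta']:+.2f}")
```

Read this way, LeViT-UNet-384 runs at 1.7 times the frame rate of TransUnet with roughly half the parameters, while the smaller variants trade some DSC for further speed.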


This review was generated by AI and checked by a human editor.