QUICK REVIEW

[Paper Review] DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Ailiang Lin, Bingzhi Chen|arXiv (Cornell University)|Jun 12, 2021

Advanced Neural Network Applications | Cited by 41

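One-sentence summary

DS-TransUNet places a dual-scale Swin Transformer encoder and a Transformer Interactive Fusion (TIF) module inside a U-shaped architecture to capture long-range and multi-scale context for medical image segmentation, and reports state-of-the-art results on several datasets, including polyp segmentation, ISIC 2018, GLAS, and the 2018 Data Science Bowl.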

ABSTRACT

Automatic medical image segmentation has made great progress benefit from the development of deep learning. However, most existing methods are based on convolutional neural networks (CNNs), which fail to build long-range dependencies and global context connections due to the limitation of receptive field in convolution operation. Inspired by the success of Transformer in modeling the long-range contextual information, some researchers have expended considerable efforts in designing the robust variants of Transformer-based U-Net. Moreover, the patch division used in vision transformers usually ignores the pixel-level intrinsic structural features inside each patch. To alleviate these problems, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first attempt to concurrently incorporate the advantages of hierarchical Swin Transformer into both encoder and decoder of the standard U-shaped architecture to enhance the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales. As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism. Furthermore, we also introduce the Swin Transformer block into decoder to further explore the long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet, and show that our approach significantly outperforms the state-of-the-art methods.

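Motivation and objectives

Improve medical image segmentation by building Transformer-based long-range context modeling into both the encoder and the decoder of U-Net.
Propose a dual-scale Swin Transformer encoder that extracts coarse and fine-grained feature representations.
Develop Transformer Interactive Fusion (TIF), which fuses multi-scale features globally.
Add Swin Transformer blocks to the decoder so that up-sampling can exploit long-range dependencies.
Demonstrate robustness across four medical segmentation tasks and their datasets.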

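Proposed method

Use a dual-branch Swin Transformer encoder that operates on both large and small patches to obtain coarse and fine-grained features.
Introduce Transformer Interactive Fusion (TIF), which fuses the multi-scale encoder features through self-attention.
Integrate a Swin Transformer block into each decoder stage to recover spatial resolution while retaining global context.
Use multi-scale training and deep supervision, with loss terms on intermediate outputs, to improve convergence.
Train and evaluate on the polyp segmentation, ISIC 2018, GLAS, and 2018 Data Science Bowl datasets.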
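
The dual-scale tokenization and the attention-based fusion listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The patch sizes (4 and 8), the embedding width, the random projection weights, and the helper names `patchify`, `self_attention`, and `fuse` are all assumptions made for illustration. The fusion step is a simplified stand-in for TIF: one branch's tokens are joined with a single average-pooled token from the other branch and passed through plain single-head self-attention, without the residual connections, normalization, or MLP of a full Transformer layer.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patch tokens."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    grid = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch * patch * c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v

def fuse(primary, other, wq, wk, wv):
    """Let `primary` tokens attend to a pooled summary of the `other` scale."""
    summary = other.mean(axis=0, keepdims=True)   # (1, D) global average token
    joint = np.concatenate([primary, summary], axis=0)
    return self_attention(joint, wq, wk, wv)[:-1]  # drop the summary token

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # toy input, not a real medical image
dim = 32                          # assumed embedding width

# Two branches: small patches give many fine tokens, large patches few coarse ones.
fine_raw, coarse_raw = patchify(image, 4), patchify(image, 8)
fine = fine_raw @ (rng.normal(size=(fine_raw.shape[1], dim)) / np.sqrt(fine_raw.shape[1]))
coarse = coarse_raw @ (rng.normal(size=(coarse_raw.shape[1], dim)) / np.sqrt(coarse_raw.shape[1]))

wq, wk, wv = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
fused_fine = fuse(fine, coarse, wq, wk, wv)
fused_coarse = fuse(coarse, fine, wq, wk, wv)
print(fine.shape, coarse.shape, fused_fine.shape, fused_coarse.shape)
```

Each fused output keeps the token count of its own branch, so a decoder could still reshape it back to that branch's spatial grid. Pooling the other scale into one token keeps the attention cost close to that of a single branch.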

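Experimental results

Research questions

RQ1: Can a dual-scale Swin Transformer encoder improve multi-scale feature learning in medical image segmentation?
RQ2: Can a Transformer-based fusion module (TIF) effectively integrate coarse and fine-grained features across scales?
RQ3: Does adding Swin Transformer blocks to the decoder improve up-sampling by exploiting long-range dependencies?
RQ4: How well does DS-TransUNet perform on diverse medical segmentation tasks compared with state-of-the-art methods?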

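Key findings

DS-TransUNet variants outperform previous SOTA methods on polyp segmentation across several datasets.
On the Kvasir polyp dataset, DS-TransUNet-L achieves mDice 0.913, mIoU 0.859, recall 0.936, and precision 0.916.
On ClinicDB, DS-TransUNet-L achieves F1 0.9422, mIoU 0.8939, recall 0.9500, and precision 0.9369.
On unseen datasets, DS-TransUNet generalizes well and outperforms competing methods by a clear margin.
The approach yields consistent improvements over TransFuse and other baselines across several segmentation tasks (polyp, ISIC 2018, GLAS, DS Bowl).
Qualitative results show sharper boundary delineation and robustness on difficult polyps.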
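
For readers unfamiliar with the reported metrics, the following sketch shows how Dice, IoU, recall, and precision are commonly computed from a pair of binary masks. The function name `segmentation_metrics`, the smoothing term `eps`, and the toy masks are assumptions for illustration; the paper's exact evaluation protocol, such as thresholding and per-image averaging, may differ. For binary masks, F1 and Dice are the same quantity, and mDice and mIoU usually denote the mean of these scores over all test images.

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    """Overlap metrics for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # predicted and truly foreground
    fp = np.logical_and(pred, ~target).sum()   # predicted foreground, truly background
    fn = np.logical_and(~pred, target).sum()   # missed foreground
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
    }

# Toy masks (hypothetical): 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
scores = segmentation_metrics(pred, target)
print({name: round(float(value), 4) for name, value in scores.items()})
```

On the toy masks, Dice (4/6) is higher than IoU (2/4). This holds whenever the overlap is partial, which is why the mDice values above exceed the corresponding mIoU values.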


This review was generated by AI and checked by a human editor.