QUICK REVIEW

[Paper Review] Mixed Transformer U-Net For Medical Image Segmentation

Hongyi Wang, Shiao Xie|arXiv (Cornell University)|Nov 8, 2021

Advanced Neural Network Applications|Cited by 25

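In a Nutshell

This paper introduces the Mixed Transformer Module (MTM). It combines Local-Global Gaussian-Weighted Self-Attention (LGG-SA) with External Attention (EA) to model both intra-sample and inter-sample correlations, improving medical image segmentation without pre-training.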

ABSTRACT

Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Therefore, Vision Transformers have emerged as alternative segmentation structures recently, for their innate ability of capturing long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations of the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra- affinities learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections between data samples through External Attention (EA). By using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We test our method on two different public datasets, and the experimental results show that the proposed method achieves better performance over other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.

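Motivation and Objectives

Address the limited ability of the CNN-based U-Net to model long-range dependencies in medical image analysis.
Propose a Transformer module that efficiently handles both intra-sample and inter-sample affinities.
Inject structural priors through a convolutional stem, reducing the need for pre-training.
Demonstrate state-of-the-art segmentation performance on public datasets without pre-training.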

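Proposed Method

Introduce the Mixed Transformer Module (MTM), which combines Local-Global Gaussian-Weighted Self-Attention (LGG-SA) with External Attention (EA).
LGG-SA uses a learnable Gaussian mask and applies local self-attention for fine-grained context, followed by global attention for coarse-grained context.
Axial Attention reduces the cost of the global attention computation, and a learnable Gaussian matrix emphasizes nearby tokens.
To reduce cost, MTM applies local and global SA only in the deeper, lower-resolution layers; the upper layers use standard convolutions.
External Attention models inter-sample correlations through memory units shared across the entire dataset.
A convolutional stem in the shallow layers injects structural priors, and MT-UNet is trained from scratch without pre-training.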
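
The two attention ideas listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (their PyTorch code is in the repository linked in the abstract). The function names, the scalar `sigma`, and the plain-list tensors are assumptions made for illustration. The Gaussian weighting is written here as an additive distance penalty on the attention logits along a single axis, which may differ from the paper's exact formulation. The external-attention function follows the double normalization (softmax over tokens, then L1 over memory slots) described in the original External Attention work, and whether MT-UNet uses exactly this normalization is also an assumption.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gaussian_weighted_attention(q, k, v, sigma):
    """Single-axis self-attention with a Gaussian distance bias.

    q, k, v: lists of n token vectors along one axis (row or column).
    sigma:   width of the Gaussian; learnable in a real model (assumption:
             a single scalar here).
    Token i attends to token j with logit q_i.k_j / sqrt(d) - (i-j)^2 / (2 sigma^2),
    so nearby tokens receive more weight.
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        logits = [
            dot(q[i], k[j]) * scale - (i - j) ** 2 / (2.0 * sigma ** 2)
            for j in range(n)
        ]
        w = softmax(logits)
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out


def external_attention(feats, mem_k, mem_v):
    """External attention against memories shared by every sample.

    feats:        n token vectors of one sample (n x d).
    mem_k, mem_v: s memory slots (s x d); learnable and dataset-wide in a
                  real model, plain lists here (assumption).
    """
    n, s = len(feats), len(mem_k)
    logits = [[dot(f, mk) for mk in mem_k] for f in feats]  # n x s
    # Step 1: softmax over the token dimension, one memory slot at a time.
    cols = [softmax([logits[i][j] for i in range(n)]) for j in range(s)]
    attn = [[cols[j][i] for j in range(s)] for i in range(n)]
    # Step 2: L1-normalize each token's weights over the memory slots.
    attn = [[a / sum(row) for a in row] for row in attn]
    return [
        [sum(attn[i][j] * mem_v[j][c] for j in range(s)) for c in range(len(mem_v[0]))]
        for i in range(n)
    ]
```

With a very small `sigma`, each token attends almost only to itself, so the output approaches `v`; with a very large `sigma`, the bias vanishes and the function behaves like ordinary scaled dot-product attention. Because `mem_k` and `mem_v` do not depend on the input sample, the same memory is reused for every image, which is how external attention can capture correlations across the dataset.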

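Experimental Results

Research Questions

RQ1: Can MTM effectively learn both inter-sample and intra-sample affinities for medical image segmentation?
RQ2: Does LGG-SA offer a favorable balance between local detail and global context at a lower computational cost than standard self-attention?
RQ3: Does External Attention improve the use of inter-sample information without excessive computational cost?
RQ4: How does MT-UNet perform on public medical segmentation datasets compared with state-of-the-art Transformer-based and CNN-based methods?
RQ5: Can high accuracy be achieved on medical imaging tasks without pre-training?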

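Key Findings

MT-UNet achieved state-of-the-art performance on two public datasets, outperforming several Vision Transformer and CNN-based methods.
In the ablation study, removing local SA, global SA, or EA each degraded performance, and Gaussian masking proved beneficial.
With all components (LGG-SA and EA), MTM obtains DSC = 90.43% and HD95 = 2.23 mm on the ACDC dataset.
On Synapse, MT-UNet achieves a DSC of 78.59% and an HD95 of 26.59 mm; every MTM variant performs worse than the full MT-UNet.
MT-UNet improves segmentation performance while keeping a computational complexity of O(n√n), lower than that of full self-attention.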
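
The DSC values reported above are Dice similarity coefficients, defined as 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch of that formula for one binary class follows. The function name, the flattened 0/1 input format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks.

    pred, target: equal-length sequences of 0/1 labels (one class).
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For a multi-organ dataset, a score like this would typically be computed per class and then averaged. HD95 is a separate boundary-distance metric and is not covered by this sketch.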

This review was generated by AI and checked by a human editor.