QUICK REVIEW

[Paper Review] MISSFormer: An Effective Medical Image Segmentation Transformer

Xiaohong Huang, Zhifang Deng|arXiv (Cornell University)|Sep 15, 2021

Advanced Neural Network Applications | References: 40 | Citations: 170

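One-line summary

MISSFormer is a hierarchical U-shaped transformer built from the Enhanced Transformer Block and the Enhanced Transformer Context Bridge; trained from scratch, it achieves state-of-the-art medical image segmentation results on the Synapse and ACDC datasets.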

ABSTRACT

The CNN-based methods have achieved impressive results in medical image segmentation, but they failed to capture the long-range dependencies due to the inherent locality of the convolution operation. Transformer-based methods are recently popular in vision tasks because of their capacity for long-range dependencies and promising performance. However, it lacks in modeling local context. In this paper, taking medical image segmentation as an example, we present MISSFormer, an effective and powerful Medical Image Segmentation tranSFormer. MISSFormer is a hierarchical encoder-decoder network with two appealing designs: 1) A feed-forward network is redesigned with the proposed Enhanced Transformer Block, which enhances the long-range dependencies and supplements the local context, making the feature more discriminative. 2) We proposed Enhanced Transformer Context Bridge, different from previous methods of modeling only global information, the proposed context bridge with the enhanced transformer block extracts the long-range dependencies and local context of multi-scale features generated by our hierarchical transformer encoder. Driven by these two designs, the MISSFormer shows a solid capacity to capture more discriminative dependencies and context in medical image segmentation. The experiments on multi-organ and cardiac segmentation tasks demonstrate the superiority, effectiveness and robustness of our MISSFormer, the experimental results of MISSFormer trained from scratch even outperform state-of-the-art methods pre-trained on ImageNet. The core designs can be generalized to other visual segmentation tasks. The code has been released on Github: https://github.com/ZhifangDeng/MISSFormer

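Motivation and objectives

Address the limited ability of CNNs to model long-range dependencies in medical image segmentation.
Propose a position-free hierarchical U-shaped transformer for accurate segmentation.
Design the Enhanced Transformer Block and the Enhanced Transformer Context Bridge to capture both local and global context.
Train from scratch on medical datasets with standard data augmentation, an SGD optimizer, and a polynomial learning-rate policy.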
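
The polynomial learning-rate policy mentioned above can be sketched as follows. This is a minimal illustration of the commonly used form `base_lr * (1 - step / max_steps) ** power`; the function name `poly_lr`, the exponent `0.9`, the base rate `0.05`, and the step count are assumptions chosen for the example, not settings confirmed by this review.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power.

    The default power of 0.9 is an assumed, commonly used value.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - step / max_steps) ** power


# Hypothetical settings, for illustration only.
BASE_LR = 0.05
MAX_STEPS = 1000

# Learning rate sampled at a few points of the run.
schedule = [poly_lr(BASE_LR, s, MAX_STEPS) for s in (0, 250, 500, 750, 1000)]

if __name__ == "__main__":
    for s, lr in zip((0, 250, 500, 750, 1000), schedule):
        print(f"step {s:4d}: lr = {lr:.5f}")
```

The rate starts at `base_lr`, decays smoothly, and reaches zero at the final step; with `power=1.0` the schedule becomes a straight line.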

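Proposed method

Enhanced Mix-FFN: a redesigned feed-forward network that makes feature representations more discriminative and improves the integration of local and global context.
Enhanced Transformer Block: combines LayerNorm, Efficient Self-Attention, and the Enhanced Mix-FFN to model long-range and local information while keeping computational cost down.
Enhanced Transformer Context Bridge: flattens the multi-level tokens, concatenates them, and passes them through Enhanced Transformer Blocks to fuse multi-scale features.
A hierarchical encoder-decoder with overlapping 4x4 patch embedding, patch merging and patch expanding, and skip connections, forming a U-shaped architecture.
Training from scratch on medical datasets with the SGD optimizer, a polynomial learning-rate policy, and standard data augmentation.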
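
The flatten-and-concatenate step of the context bridge can be illustrated with simple shape bookkeeping. The sketch below is not the authors' implementation: the 224x224 input size, the stage strides (4, 8, 16, 32), the helper names, and the split-back step are all assumptions, and per-stage channel handling and the transformer blocks themselves are omitted.

```python
from typing import List, Sequence, Tuple


def stage_token_counts(image_size: int, strides: Sequence[int]) -> List[int]:
    """Tokens per encoder stage for a square input (one token per spatial cell)."""
    counts = []
    for stride in strides:
        if image_size % stride:
            raise ValueError(f"image_size {image_size} not divisible by stride {stride}")
        side = image_size // stride
        counts.append(side * side)
    return counts


def concat_and_split(levels: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[str]]]:
    """Concatenate flattened per-level tokens, then split them back by length."""
    lengths = [len(level) for level in levels]
    merged = [token for level in levels for token in level]
    # A real bridge would run its transformer blocks over `merged` here.
    restored, start = [], 0
    for n in lengths:
        restored.append(merged[start:start + n])
        start += n
    return merged, restored


# Assumed configuration: 224x224 input, four stages with strides 4, 8, 16, 32.
counts = stage_token_counts(224, (4, 8, 16, 32))
total_tokens = sum(counts)

if __name__ == "__main__":
    print("tokens per stage:", counts)
    print("sequence length seen by the bridge:", total_tokens)
```

Under these assumptions the bridge attends over one sequence that holds tokens from every scale, which is what lets a coarse-stage token interact directly with fine-stage tokens.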

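Experimental results

Research questions

RQ1: Can MISSFormer, trained from scratch, outperform state-of-the-art medical image segmentation methods on the Synapse and ACDC datasets?
RQ2: Do the Enhanced Transformer Block and Context Bridge improve discriminability and context modeling over earlier Transformer/MLP-based methods?
RQ3: How does multi-scale feature fusion through the Enhanced Transformer Context Bridge affect segmentation accuracy and the delineation of edges and boundaries?
RQ4: How do the different skip connections and recursive steps in the Enhanced Mix-FFN affect convergence and performance?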

Key findings

Architecture	DSC ↑	HD ↓	Aorta	Gallbladder	Kidney(L)	Kidney(R)	Liver	Pancreas	Spleen	Stomach
V-Net	68.81	-	75.34	51.87	77.10	80.75	87.84	40.05	80.56	56.98
DARR	69.77	-	74.74	53.77	72.31	73.24	94.08	54.18	89.90	45.96
R50 U-Net	74.68	36.87	87.74	63.66	80.60	78.19	93.74	56.90	85.87	74.16
U-Net	76.85	39.70	89.07	69.72	77.77	68.60	93.43	53.98	86.67	75.58
R50 Att-Unet	75.57	36.97	55.92	63.91	79.20	72.71	93.56	49.37	87.19	74.95
Att-UNet	77.77	36.02	89.55	68.88	77.98	71.11	93.57	58.04	87.30	75.75
R50 ViT	71.29	32.87	73.73	55.13	75.80	72.20	91.51	45.99	81.99	73.95
TransUnet	77.48	31.69	87.23	63.13	81.87	77.02	94.08	55.86	85.08	75.62
Swin-Unet	79.13	21.55	85.47	66.53	83.28	79.61	94.29	56.58	90.66	76.60
MISSFormer_S	80.74	19.65	85.31	66.47	83.37	81.65	94.52	63.49	91.51	79.63
MISSFormer	81.96	18.20	86.99	68.65	85.21	82.00	94.41	65.67	91.92	80.81
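
The DSC column in the table is consistent with the plain average of the eight per-organ scores. The short check below reproduces that arithmetic for three rows copied from the table; the names `ORGAN_DSC`, `REPORTED_DSC`, and `mean_dsc` are introduced here for illustration, and reading the column as an unweighted mean is an inference from the numbers rather than something the review states.

```python
from typing import Dict, List

# Per-organ DSC values copied from the table, in column order:
# Aorta, Gallbladder, Kidney(L), Kidney(R), Liver, Pancreas, Spleen, Stomach.
ORGAN_DSC: Dict[str, List[float]] = {
    "Swin-Unet": [85.47, 66.53, 83.28, 79.61, 94.29, 56.58, 90.66, 76.60],
    "MISSFormer_S": [85.31, 66.47, 83.37, 81.65, 94.52, 63.49, 91.51, 79.63],
    "MISSFormer": [86.99, 68.65, 85.21, 82.00, 94.41, 65.67, 91.92, 80.81],
}

# Overall DSC values as reported in the same rows.
REPORTED_DSC: Dict[str, float] = {
    "Swin-Unet": 79.13,
    "MISSFormer_S": 80.74,
    "MISSFormer": 81.96,
}


def mean_dsc(values: List[float]) -> float:
    """Unweighted mean of the per-organ scores."""
    return sum(values) / len(values)


# Absolute gap between the recomputed mean and the reported value.
gaps = {name: abs(mean_dsc(v) - REPORTED_DSC[name]) for name, v in ORGAN_DSC.items()}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: mean={mean_dsc(ORGAN_DSC[name]):.4f} "
              f"reported={REPORTED_DSC[name]:.2f} gap={gap:.4f}")
```

Each gap stays within rounding distance of the two-decimal reported value, so the headline DSC weights small organs such as the pancreas and gallbladder the same as the liver.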

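MISSFormer achieves state-of-the-art performance on the Synapse and ACDC datasets, often outperforming methods pre-trained on ImageNet. In the multi-organ table above it reaches 81.96 DSC and 18.20 HD, against 79.13 and 21.55 for Swin-Unet.
The simple Enhanced Mix-FFN and its recursive skip connections improve training stability and segmentation accuracy compared with baseline SegFormer-style variants.
In the ablations, MISSFormer with the Enhanced Transformer Context Bridge and multi-scale fusion shows a marked gain in Dice-Sørensen Coefficient (DSC) and better edge and boundary delineation.
MISSFormer_S (without the multi-scale bridge) scores below MISSFormer (80.74 vs. 81.96 DSC), which suggests that integrating multi-scale information helps.
MISSFormer gives robust edge predictions and strong performance on hard cases; its per-organ results on Synapse are competitive or better, and it is also robust overall on ACDC.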
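
For readers unfamiliar with the metric, the Dice-Sørensen Coefficient between a predicted mask and a reference mask is 2|A∩B| / (|A| + |B|). The sketch below computes it for flat binary masks in plain Python; the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for the example, not the evaluation code behind the table.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice-Sørensen Coefficient for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy masks, for illustration only.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]
score = dice_coefficient(pred_mask, true_mask)

if __name__ == "__main__":
    print(f"DSC = {score:.4f} ({100 * score:.2f} on the percentage scale used in the table)")
```

In multi-organ evaluation the same computation is typically repeated per class and the per-class scores are then averaged.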

This review was generated by AI and checked by a human editor.