QUICK REVIEW

[Paper Review] UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer

Haonan Wang, Peng Cao|arXiv (Cornell University)|Sep 9, 2021

Advanced Neural Network Applications | Cited by 44

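One-Sentence Summary

UCTransNet replaces the skip connections of the plain U-Net with a Channel Transformer (CTrans) that fuses multi-scale encoder features (CCT) and aligns them with the decoder features (CCA), improving medical image segmentation across datasets.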

ABSTRACT

Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. It is still challenging for U-Net with a simple skip connection scheme to model the global multi-scale context: 1) Not each skip connection setting is effective due to the issue of incompatible feature sets of encoder and decoder stage, even some skip connection negatively influence the segmentation performance; 2) The original U-Net is worse than the one without any skip connection on some datasets. Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with attention mechanism. Specifically, the CTrans module is an alternate of the U-Net skip connections, which consists of a sub-module to conduct the multi-scale Channel Cross fusion with Transformer (named CCT) and a sub-module Channel-wise Cross-Attention (named CCA) to guide the fused multi-scale channel-wise information to effectively connect to the decoder features for eliminating the ambiguity. Hence, the proposed connection consisting of the CCT and CCA is able to replace the original skip connection to solve the semantic gaps for an accurate automatic medical image segmentation. The experimental results suggest that our UCTransNet produces more precise segmentation performance and achieves consistent improvements over the state-of-the-art for semantic segmentation across different datasets and conventional architectures involving transformer or U-shaped framework. Code: https://github.com/McGregorWwww/UCTransNet.

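Motivation and Objectives

Show that the simple skip connections in U-Net are not universally beneficial and can even degrade performance on some datasets.
Propose a channel-wise, Transformer-based skip mechanism consisting of CCT and CCA that better fuses multi-scale encoder features with decoder features.
Show that channel-wise fusion reduces the semantic and resolution gaps and improves segmentation accuracy across datasets.
Evaluate UCTransNet on the GlaS, MoNuSeg, and Synapse datasets and compare it with strong U-Net and Transformer-based baselines.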

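Proposed Method

Replace the standard skip connections with a Channel Transformer (CTrans) made up of CCT (Channel-wise Cross Fusion Transformer), which fuses the multi-scale encoder features, and CCA (Channel-wise Cross Attention), which fuses the Transformer outputs with the decoder features.
CCT splits the four skip layers into patches and tokenizes them, performs multi-head channel-wise cross-attention with concatenated keys/values, and applies an MLP with residual connections across L layers to fuse the multi-scale encoder features.
CCA pools O_i and D_i to compute a channel attention map and recalibrates O_i before it is concatenated with the upsampled decoder features.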
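
The two steps above can be sketched in NumPy to make the tensor shapes concrete. This is a minimal, single-head illustration under stated assumptions, not the authors' implementation: every scale is assumed to be tokenized to the same number of tokens d, normalization layers, the MLP, residual connections, and multiple heads are omitted, and the CCA gate is assumed to be a sigmoid over two linear maps of the pooled features. All function and weight names (channel_cross_attention, cca_recalibrate, w_q, w_o, ...) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(t_i, t_all, w_q, w_k, w_v):
    """Single-head channel-wise cross-attention (simplified sketch).

    t_i:   (d, C_i)   tokens of one encoder scale
    t_all: (d, C_all) tokens of all scales, concatenated along channels
    """
    q = t_i @ w_q      # (d, C_i)   queries from one scale
    k = t_all @ w_k    # (d, C_all) keys from the concatenation
    v = t_all @ w_v    # (d, C_all) values from the concatenation
    # Attention is taken over channels, not patches: (C_i, C_all).
    scores = q.T @ k / np.sqrt(t_all.shape[1])
    attn = softmax(scores, axis=-1)
    fused = (attn @ v.T).T  # back to (d, C_i)
    return fused, attn

def cca_recalibrate(o_i, d_i, w_o, w_d):
    """Channel gate for o_i (C, H, W) guided by decoder feature d_i (C, H, W).

    Assumption: gate = sigmoid(w_o @ GAP(o_i) + w_d @ GAP(d_i)).
    """
    g_o = o_i.mean(axis=(1, 2))  # global average pooling -> (C,)
    g_d = d_i.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-(w_o @ g_o + w_d @ g_d)))
    o_hat = o_i * gate[:, None, None]  # re-weight each channel of o_i
    return np.concatenate([o_hat, d_i], axis=0), gate

# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
d, channels = 16, [4, 8, 16, 32]
tokens = [rng.standard_normal((d, c)) for c in channels]
t_all = np.concatenate(tokens, axis=1)  # (16, 60)
c_i, c_all = channels[0], t_all.shape[1]
fused, attn = channel_cross_attention(
    tokens[0], t_all,
    0.1 * rng.standard_normal((c_i, c_i)),
    0.1 * rng.standard_normal((c_all, c_all)),
    0.1 * rng.standard_normal((c_all, c_all)),
)
o_i = rng.standard_normal((c_i, 8, 8))   # stand-in for a reshaped CCT output
d_up = rng.standard_normal((c_i, 8, 8))  # stand-in for an upsampled decoder feature
merged, gate = cca_recalibrate(
    o_i, d_up,
    0.1 * rng.standard_normal((c_i, c_i)),
    0.1 * rng.standard_normal((c_i, c_i)),
)
print(fused.shape, attn.shape, merged.shape)
```

The attention map has one row per channel of the queried scale and one column per channel of the concatenated scales, which is what distinguishes this channel-wise design from the usual patch-wise self-attention.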

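Experimental Results

Research Questions

RQ1: Can channel-wise, Transformer-based skip connections outperform conventional skip connections in medical image segmentation?
RQ2: How does multi-scale channel-wise fusion (CCT) interact with decoder-aware fusion (CCA) to bridge the semantic and resolution gaps between encoder and decoder?
RQ3: Do UCTransNet and its CTrans module deliver consistent improvements over state-of-the-art baselines across multiple medical image datasets and architectures?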

Key Findings

Method	GlaS Dice (%)	GlaS IoU (%)	MoNuSeg Dice (%)	MoNuSeg IoU (%)	Synapse Dice (%)	Synapse HD (mm)
U-Net	85.45	74.78	76.45	62.86	-	-
UNet++	87.56	79.13	77.01	63.04	-	-
AttUNet	88.80	80.69	76.67	63.47	-	-
MRUNet	88.73	80.89	78.22	64.83	-	-
TransUNet	88.40	80.40	78.53	65.05	-	-
Swin-Unet	89.58	82.06	77.69	63.77	-	-
Ours (UCTransNet w/o CCA)	78.99	30.29	78.23	26.75	-	-
Ours (UCTransNet)	90.18	82.96	79.08	65.50	-	-
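
For readers unfamiliar with the Dice and IoU columns, the following is a minimal pure-Python sketch of how the two overlap scores are commonly computed for one pair of binary masks. The function name dice_and_iou and the convention of returning 1.0 for two empty masks are assumptions for illustration; the review does not state how the paper aggregates scores over images or runs.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flat sequences of 0/1 labels of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_target = sum(1 for t in target if t)
    union = n_pred + n_target - inter
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * inter / (n_pred + n_target)
    iou = inter / union
    return dice, iou

pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
print(dice, iou)  # 0.75 0.6
```

For a single mask pair the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never below IoU. Averages over many images need not satisfy the identity exactly, but it still gives a quick plausibility check on any reported pair: an IoU of 0.30, for instance, corresponds to a per-image Dice of about 0.46.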

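UCTransNet achieves the best scores in the table on both GlaS (90.18 Dice, 82.96 IoU vs. 89.58 / 82.06 for Swin-Unet) and MoNuSeg (79.08 Dice, 65.50 IoU vs. 78.53 / 65.05 for TransUNet); improvements in Dice and Hausdorff distance are also reported on Synapse.
In the ablation study, Baseline+CCT+CCA consistently outperforms Baseline, Baseline+CCT, and Baseline+CCA across datasets.
Increasing the number of skip-scale inputs to CCT improves performance, showing the value of multi-scale feature fusion.
Cross-attention visualizations show which encoder levels contribute most to segmentation, consistent with the skip-connection analysis.
Pre-training UCTransNet further improves convergence speed and final performance on MoNuSeg and Synapse.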
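
The Hausdorff distance mentioned for Synapse measures the worst-case boundary disagreement rather than overlap. A brute-force sketch for two small point sets is given below; the function names are hypothetical, coordinates are in pixel units (a voxel spacing would be needed to express the result in mm), and benchmarks often report a percentile variant such as HD95. Which variant the paper uses is not stated in this review.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

pred_boundary = [(0, 0), (0, 1), (1, 1)]
true_boundary = [(0, 0), (0, 1), (4, 1)]
hd = hausdorff(pred_boundary, true_boundary)
print(hd)  # 3.0
```

Here the point (4, 1) on the reference boundary lies 3 pixels from the nearest predicted point, and that single outlier determines the result; lower values are better.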

This review was generated by AI and checked by a human editor.