QUICK REVIEW

[Paper Review] Transformer-Unet: Raw Image Processing with Unet

Youyang Sha, Yonghong Zhang|arXiv (Cornell University)|Sep 17, 2021

Advanced Neural Network Applications | References: 29 | Citations: 25

One-Line Summary

This paper introduces TUnet, a network that pairs Transformer modules with a Unet-style decoder. On the CT82 pancreas data it achieves better segmentation than Unet, Attention Unet, and TransUnet.

ABSTRACT

Medical image segmentation has drawn massive attention as it is important in biomedical image analysis. Good segmentation results can assist doctors with their judgement and further improve patients' experience. Among many available pipelines in medical image analysis, Unet is one of the most popular neural networks as it keeps raw features by adding concatenation between encoder and decoder, which makes it still widely used in industry. In the meantime, as a popular model which dominates natural language processing tasks, the transformer is now being introduced to computer vision tasks and has seen promising results in object detection, image classification and semantic segmentation tasks. Therefore, the combination of transformer and Unet is supposed to be more efficient than both methods working individually. In this article, we propose Transformer-Unet by adding transformer modules on raw images instead of feature maps in Unet and test our network on the CT82 dataset for pancreas segmentation. We form an end-to-end network and gain segmentation results better than many previous Unet based algorithms in our experiment. We demonstrate our network and show our experimental results in this paper.

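Motivation and Objectives

Incorporate Transformer components into the Unet framework to improve biomedical image segmentation.
Combine the Transformer, which models global relationships, with Unet, which extracts local features, to raise segmentation accuracy on high-resolution CT slices.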

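Proposed Method

Represent the raw image as a sequence of patches and apply a ViT-like Transformer to that sequence.
Embed the patches with a 1x1 convolution, add learnable position embeddings, and apply multiple self-attention and MLP layers with LayerNorm.
Feed the Transformer outputs into the decoder of a nearly symmetric Unet encoder-decoder structure by concatenating features at multiple stages.
Reshape the Transformer output to match the Unet decoder input, concatenate, and finally apply bilinear upsampling back to the original resolution.
Train the network end to end for pixel-wise segmentation with BCE loss.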
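The patch-sequence, reshape, and loss steps listed above can be illustrated with a small standard-library sketch. It is not the authors' implementation: the function names `to_patch_sequence`, `from_patch_sequence`, and `bce_loss` are hypothetical, the image is a toy 4x4 grid rather than a CT slice, and the learned parts (embedding, self-attention, MLP, decoder) are omitted entirely.

```python
import math

def to_patch_sequence(image, patch):
    """Split a 2D image (list of rows) into a row-major sequence of flattened patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    seq = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            seq.append([image[top + dy][left + dx]
                        for dy in range(patch) for dx in range(patch)])
    return seq

def from_patch_sequence(seq, h, w, patch):
    """Inverse of to_patch_sequence: place each token back onto the 2D grid."""
    image = [[0.0] * w for _ in range(h)]
    cols = w // patch  # number of patches per row of the grid
    for idx, token in enumerate(seq):
        top, left = (idx // cols) * patch, (idx % cols) * patch
        for k, value in enumerate(token):
            image[top + k // patch][left + k % patch] = value
    return image

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

# Toy 4x4 "slice" split into 2x2 patches (sizes chosen only for illustration).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = to_patch_sequence(image, patch=2)
restored = from_patch_sequence(tokens, 4, 4, patch=2)
print(len(tokens), "tokens of length", len(tokens[0]))
print("round trip ok:", restored == image)
print("BCE at p=0.5:", bce_loss([0.5, 0.5], [1, 0]))
```

In a real model the tokens would pass through the Transformer before being reshaped, so the round trip here only demonstrates that the sequence layout keeps enough spatial information to rebuild a 2D map for the decoder.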

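Experimental Results

Research Questions

RQ1: Does processing raw CT slices with a Transformer improve segmentation compared with Transformer approaches that operate on feature maps?
RQ2: Does integrating a Transformer directly on the raw image and combining it with a Unet decoder outperform Unet, Attention Unet, and TransUnet on pancreas segmentation?
RQ3: How do patch size and the depth of the Unet backbone affect TUnet's performance and efficiency?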

Key Findings

Network	mIOU	Dice score	Pixel accuracy	Precision	Recall
Unet	0.8113	0.7689	0.9981	0.8249	0.7200
Attn-Unet	0.8172	0.7777	0.9982	0.8346	0.7280
TransUnet	0.7882	0.7330	0.9979	0.8379	0.6515
TUnet	0.8301	0.7966	0.9983	0.8278	0.7676

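TUnet achieves the highest mIOU and Dice score among the models evaluated on CT82 pancreas segmentation: mIOU 0.8301, Dice score 0.7966.
TUnet outperforms Unet (mIOU 0.8113, Dice 0.7689), Attention Unet (mIOU 0.8172, Dice 0.7777), and TransUnet (mIOU 0.7882, Dice 0.7330).
TUnet also delivers strong pixel accuracy (0.9983) and recall (0.7676).
TUnet's model size and inference time are slightly larger than those of Unet and Attention Unet but remain practical (parameters about 548.6 MB; inference about 0.041 s).
The best results were observed with 16x16 patches; larger patches reduce both performance and efficiency, and deeper Unet backbones benefit more from Transformer integration.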
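As a sanity check on the table, the Dice score of a binary mask equals the harmonic mean of precision and recall, Dice = 2PR / (P + R), when all three are computed from the same pooled pixel counts. The sketch below recomputes Dice from the reported precision and recall. The pooled-count assumption is not stated in this review, and the helper name `dice_from_precision_recall` is hypothetical.

```python
def dice_from_precision_recall(precision, recall):
    """Dice (F1) for a binary mask, given precision and recall on the same counts."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported Dice) copied from the results table.
reported = {
    "Unet": (0.8249, 0.7200, 0.7689),
    "Attn-Unet": (0.8346, 0.7280, 0.7777),
    "TransUnet": (0.8379, 0.6515, 0.7330),
    "TUnet": (0.8278, 0.7676, 0.7966),
}

recomputed = {name: dice_from_precision_recall(p, r)
              for name, (p, r, _) in reported.items()}

for name, (_, _, dice) in reported.items():
    print(f"{name}: reported {dice:.4f}, recomputed {recomputed[name]:.4f}")
```

Under this assumption the recomputed values agree with the reported Dice scores to four decimal places. That is consistent with TUnet's Dice gain coming mainly from higher recall, since its precision is slightly below that of Attention Unet and TransUnet.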


This review was generated by AI and checked by a human editor.