QUICK REVIEW

[Paper Review] Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach using a Dilated Transformer and U-Net Architecture

Davoud Saadati, Omid Nejati Manzari|arXiv (Cornell University)|Apr 22, 2023

Advanced Neural Network Applications | Cited by 17

One-line summary

Dilated-UNet combines Dilated Transformer blocks with a U-Net-style encoder-decoder to achieve fast, accurate 2D medical image segmentation. It outperforms several state-of-the-art models on the Synapse and ISIC datasets.

ABSTRACT

Medical image segmentation is crucial for the development of computer-aided diagnostic and therapeutic systems, but still faces numerous difficulties. In recent years, the commonly used encoder-decoder architecture based on CNNs has been applied effectively in medical image segmentation, but has limitations in terms of learning global context and spatial relationships. Some researchers have attempted to incorporate transformers into both the decoder and encoder components, with promising results, but this approach still requires further improvement due to its high computational complexity. This paper introduces Dilated-UNet, which combines a Dilated Transformer block with the U-Net architecture for accurate and fast medical image segmentation. Image patches are transformed into tokens and fed into the U-shaped encoder-decoder architecture, with skip-connections for local-global semantic feature learning. The encoder uses a hierarchical Dilated Transformer with a combination of Neighborhood Attention and Dilated Neighborhood Attention Transformer to extract local and sparse global attention. The results of our experiments show that Dilated-UNet outperforms other models on several challenging medical image segmentation datasets, such as ISIC and Synapse.

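Motivation and objectives

Improve medical image segmentation by supplying the global-context learning that CNN-based encoders lack.
Propose a Dilated Transformer-based encoder-decoder architecture that preserves local detail through Neighborhood Attention (NA) and captures global context through Dilated Neighborhood Attention (DiNA).
Design a U-Net-like structure with skip connections and a patch expanding decoder that restores resolution without conventional convolutions.
Demonstrate robustness and generalization on the ISIC skin lesion dataset and the Synapse multi-organ segmentation dataset.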

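Proposed method

Introduce the Dilated-UNet architecture as a U-shaped design built from Dilated Transformer blocks, with four parts: encoder, bottleneck, decoder, and skip connections.
Use DiNA (dilated neighborhood attention) and NA (neighborhood attention) in consecutive Transformer blocks to capture sparse global and local relationships.
Patch merging downsamples the number of tokens while doubling the feature channels, so the encoder builds a hierarchical representation as resolution decreases. The patch expanding decoder upsamples with linear layers to preserve information.
Adopt a skip-connection module that concatenates encoder and decoder features and aligns their dimensions with a linear layer to keep them consistent.
Adopt a two-block bottleneck that leaves resolution and channels unchanged, to mitigate the training convergence problems common in deep Transformers.
Evaluate with 224x224 and 384x384 inputs to show the trade-off between accuracy and computation, and run ablations on skip connections and input size.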
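
The difference between NA and DiNA described above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-head, 1D NumPy toy, and the function names, the masking of out-of-range neighbors at the borders, and the sizes in the usage lines are all assumptions made for illustration. With dilation 1 each token attends to its immediate neighbors (local detail); with a larger dilation the same number of neighbors is spread over a wider span (sparse global context).

```python
import numpy as np

def neighborhood_indices(length, kernel_size, dilation=1):
    """Return (indices, valid), each of shape (length, kernel_size).

    Row i lists the key positions query i attends to. Out-of-range
    positions are clipped and flagged invalid (an assumed border rule).
    """
    offsets = (np.arange(kernel_size) - kernel_size // 2) * dilation
    idx = np.arange(length)[:, None] + offsets[None, :]
    valid = (idx >= 0) & (idx < length)
    return np.clip(idx, 0, length - 1), valid

def neighborhood_attention_1d(q, k, v, kernel_size=3, dilation=1):
    """Single-head attention restricted to a (dilated) local window.

    q, k, v: arrays of shape (length, dim).
    """
    length, dim = q.shape
    idx, valid = neighborhood_indices(length, kernel_size, dilation)
    k_nb = k[idx]  # (length, kernel_size, dim)
    v_nb = v[idx]  # (length, kernel_size, dim)
    scores = np.einsum("ld,lkd->lk", q, k_nb) / np.sqrt(dim)
    scores = np.where(valid, scores, -np.inf)  # mask out-of-range keys
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return np.einsum("lk,lkd->ld", weights, v_nb)

# Hypothetical sizes: 16 tokens with 8 channels each.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
local_out = neighborhood_attention_1d(x, x, x, kernel_size=3, dilation=1)   # NA
sparse_out = neighborhood_attention_1d(x, x, x, kernel_size=3, dilation=4)  # DiNA
```

Both calls cost the same per token, since each query still looks at only `kernel_size` keys; only the reach of the window changes.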

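Experimental results

Research questions

RQ1: How does integrating Dilated Transformer blocks with the NA and DiNA mechanisms affect local detail and global context in medical image segmentation?
RQ2: How do multiple skip connections affect segmentation accuracy for organs of different sizes?
RQ3: Can a patch expanding decoder effectively replace traditional upsampling and convolution to restore spatial resolution in a Transformer-based U-Net?
RQ4: How do input resolution and model scale affect performance and efficiency on the ISIC and Synapse datasets?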
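
To make the patch expanding decoder in RQ3 concrete, the following is a hedged sketch of the shape bookkeeping for patch merging and patch expanding. It assumes the Swin-style layout (group 2x2 tokens, then apply a linear layer), which this review does not confirm for Dilated-UNet. The function names, the random placeholder weights, and the 56x56x96 feature map are hypothetical.

```python
import numpy as np

def patch_merging(x, weight):
    """Downsample tokens 2x per axis and double the channels.

    x: (H, W, C) with even H and W; weight: (4C, 2C).
    Returns an array of shape (H/2, W/2, 2C).
    """
    h, w, c = x.shape
    # Gather each 2x2 block of tokens into one token with 4C channels.
    x = x.reshape(h // 2, 2, w // 2, 2, c).transpose(0, 2, 1, 3, 4)
    x = x.reshape(h // 2, w // 2, 4 * c)
    return x @ weight  # linear projection 4C -> 2C

def patch_expanding(x, weight):
    """Upsample tokens 2x per axis with a linear layer, no convolution.

    x: (H, W, C) with even C; weight: (C, 2C).
    Returns an array of shape (2H, 2W, C/2).
    """
    h, w, c = x.shape
    x = x @ weight  # linear projection C -> 2C
    # Split the 2C channels into a 2x2 block of tokens with C/2 channels each.
    x = x.reshape(h, w, 2, 2, c // 2).transpose(0, 2, 1, 3, 4)
    return x.reshape(2 * h, 2 * w, c // 2)

# Hypothetical feature map; the weights are random stand-ins for learned ones.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((56, 56, 96))
merged = patch_merging(tokens, rng.standard_normal((4 * 96, 2 * 96)))
restored = patch_expanding(merged, rng.standard_normal((2 * 96, 4 * 96)))
print(merged.shape, restored.shape)  # (28, 28, 192) (56, 56, 96)
```

The expanding step only rearranges channels into space after a linear projection, which is what lets the decoder restore resolution without transposed convolutions or interpolation.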

Key findings

Method	DSC ↑	HD ↓	Aorta	Gallbladder	Kidney (L)	Kidney (R)	Liver	Pancreas	Spleen	Stomach
Dilated-UNet	82.43	17.46	89.16	72.30	86.08	81.40	94.98	65.12	91.94	81.19

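On Synapse multi-organ segmentation, Dilated-UNet achieves a mean DSC of 82.43 and an HD of 17.46, outperforming several CNN-based, Transformer-based, and hybrid models.
On ISIC2018 skin lesion segmentation, Dilated-UNet achieves a DSC of 0.9147 (the table also lists 0.9129, described as an HD-based metric), outperforming SOTA methods such as Swin-Unet and TransNorm.
Increasing the number of skip connections from 0 to 3 substantially improves per-organ DSC and reduces HD, with three skip connections giving the best overall results.
The larger input size (384x384) improves DSC and HD over 224x224 but raises computational cost, so 224x224 is recommended in practice for efficiency.
The model-size ablation shows that Dilated-UNet-tiny/small perform strongly with far fewer parameters than several competing methods.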
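
For readers unfamiliar with the two metrics reported above, here is a minimal sketch of the Dice similarity coefficient (DSC) and the symmetric Hausdorff distance (HD) on binary masks. It is not the paper's evaluation code. The function names and toy masks are illustrative, the empty-mask convention is an assumption, and segmentation benchmarks often report a 95th-percentile HD variant instead of the plain maximum computed here; this review does not say which variant the paper uses.

```python
import numpy as np

def dice_score(pred, target):
    """DSC = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * np.logical_and(pred, target).sum() / denom

def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between foreground pixels (brute force)."""
    a = np.argwhere(np.asarray(pred, dtype=bool))
    b = np.argwhere(np.asarray(target, dtype=bool))
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground pixel")
    # Pairwise Euclidean distances between every pixel in a and every pixel in b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy masks: two 4x4 squares offset by one row.
pred = np.zeros((8, 8), dtype=bool)
target = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:6] = True
target[3:7, 2:6] = True

print(dice_score(pred, target))          # 0.75
print(hausdorff_distance(pred, target))  # 1.0
```

Higher DSC and lower HD are better, matching the arrows in the table header. The Synapse figures are on a 0 to 100 scale, so a DSC of 82.43 corresponds to 0.8243 in the 0 to 1 form returned here.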

This review was generated by AI and checked by a human editor.