QUICK REVIEW

[Paper Review] UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

Abdelrahman Shaker, Muhammad Maaz|arXiv (Cornell University)|Dec 8, 2022

Advanced Neural Network Applications | Cited by 52

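ONE-LINE SUMMARY

UNETR++ introduces an Efficient Paired-Attention (EPA) block that jointly models spatial and channel features in a hierarchical 3D segmentation network, reaching state-of-the-art accuracy with substantially fewer parameters and FLOPs.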

ABSTRACT

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture long-range dependencies. However, the self-attention operation has quadratic complexity which proves to be a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient having linear complexity with respect to the input sequence length. To enable communication between spatial and channel-focused branches, we share the weights of query and key mapping functions that provide a complementary benefit (paired attention), while also reducing the overall network parameters. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state-of-the-art with a Dice Score of 87.2%, while being significantly efficient with a reduction of over 71% in terms of both parameters and FLOPs, compared to the best method in the literature. Code: https://github.com/Amshaker/unetr_plus_plus.

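MOTIVATION AND OBJECTIVES

Address the trade-off between segmentation accuracy and model efficiency in 3D medical imaging.
Propose a unified hybrid architecture, built on UNETR, that is efficient in both parameters and compute.
Introduce an Efficient Paired-Attention (EPA) block that captures rich spatial and channel dependencies.
Evaluate UNETR++ on multiple benchmarks to demonstrate gains in both accuracy and efficiency.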

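PROPOSED METHOD

Introduce a hierarchical encoder-decoder architecture based on UNETR, with four encoder stages and four decoder stages.
Develop an Efficient Paired-Attention (EPA) block with two parallel attention modules, one spatial and one channel-wise, that share Q/K weights but keep separate V paths.
Run spatial attention in a lower-dimensional space so that its cost is linear in the number of input tokens.
Share Q/K weights between the spatial and channel branches to reduce parameters and encourage complementary feature learning.
Fuse the EPA outputs with 1x1x1 and 3x3x3 convolutions before the final voxel-wise prediction.
Train with a combination of soft Dice loss and cross-entropy loss to optimize segmentation quality.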
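The paired-attention idea in the list above can be sketched in a few lines. This is a minimal, single-head illustration in plain Python, not the authors' implementation. It assumes the volume has already been flattened to n tokens of C channels, that the "lower-dimensional space" is reached by projecting keys and values from n tokens down to p tokens, and that the two branch outputs are fused by a plain sum in place of the convolutional fusion. The function name `paired_attention`, the projection matrix `proj`, and the random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m, scale):
    """Row-wise softmax of m / scale, shifted by the row maximum for stability."""
    out = []
    for row in m:
        scaled = [v / scale for v in row]
        peak = max(scaled)
        exps = [math.exp(v - peak) for v in scaled]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def paired_attention(x, w_q, w_k, w_v_spatial, w_v_channel, proj):
    """x: n x C tokens; w_*: C x C weights; proj: p x n token projection (p << n)."""
    scale = math.sqrt(len(x[0]))
    # Shared query/key mappings: both branches reuse q and k.
    q = matmul(x, w_q)                  # n x C
    k = matmul(x, w_k)                  # n x C
    # Each branch keeps its own value mapping.
    v_spatial = matmul(x, w_v_spatial)  # n x C
    v_channel = matmul(x, w_v_channel)  # n x C

    # Spatial branch: keys and values are projected from n tokens to p tokens,
    # so the attention map is n x p instead of n x n.
    k_small = matmul(proj, k)           # p x C
    v_small = matmul(proj, v_spatial)   # p x C
    spatial_map = softmax_rows(matmul(q, transpose(k_small)), scale)  # n x p
    x_spatial = matmul(spatial_map, v_small)                          # n x C

    # Channel branch: attention between channels, a C x C map.
    channel_map = softmax_rows(matmul(transpose(q), k), scale)        # C x C
    x_channel = matmul(v_channel, channel_map)                        # n x C

    # Assumption: fuse by summation, standing in for the convolutional fusion.
    fused = [[s + c for s, c in zip(rs, rc)]
             for rs, rc in zip(x_spatial, x_channel)]
    return fused, spatial_map, channel_map


rng = random.Random(0)
n_tokens, channels, p_tokens = 8, 4, 2
x = rand_matrix(n_tokens, channels, rng)
weights = [rand_matrix(channels, channels, rng) for _ in range(4)]
proj = rand_matrix(p_tokens, n_tokens, rng)
fused, spatial_map, channel_map = paired_attention(x, *weights, proj)
print(len(fused), len(fused[0]))              # output keeps the n x C shape
print(len(spatial_map), len(spatial_map[0]))  # spatial map is n x p
```

With p fixed, the spatial map grows as n x p, which is where the linear cost in the token count comes from. Sharing `w_q` and `w_k` across branches means four weight matrices serve two attention modules instead of six.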

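EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the Efficient Paired-Attention (EPA) block reduce computation while maintaining or improving segmentation accuracy?
RQ2: Does the hierarchical UNETR++ architecture, with EPA in both the encoder and the decoder, outperform state-of-the-art 3D medical segmentation methods across diverse benchmarks?
RQ3: How does UNETR++ perform in terms of segmentation accuracy (DSC) and efficiency (parameters, FLOPs) across multiple datasets (Synapse, BTCV, ACDC, BRaTs, Decathlon-Lung)?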

Key Findings

On Synapse, UNETR++ achieves a Dice score of 87.22% with 42.96M parameters and 47.98G FLOPs, substantially fewer than the UNETR baseline.
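Integrating EPA into the encoder alone yields 85.17% DSC; adding EPA to the decoder as well raises this to 87.22% DSC, with about 54% fewer parameters and about 37% fewer FLOPs than the baseline.
UNETR++ outperforms nnFormer on Synapse while using over 70% fewer parameters and FLOPs, indicating a favorable balance of accuracy and efficiency.
On BTCV, UNETR++ reaches a mean DSC of 83.28% at 31.0 GFLOPs, comparing favorably with nnUNet, which reaches 83.16% mean DSC at 358 GFLOPs.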
On ACDC, UNETR++ reaches a mean DSC of 92.83% (versus 92.06% for nnFormer and 86.61% for UNETR) while being more efficient.
On the BRaTs and Decathlon-Lung datasets, UNETR++ offers a favorable trade-off between segmentation performance and efficiency compared with recent transformer-based methods.
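As a quick check on how the efficiency comparisons above are read, the relative saving can be computed directly from the reported figures. The sketch below uses only the BTCV numbers quoted in this review (31.0 vs. 358 GFLOPs, 83.28% vs. 83.16% mean DSC); the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(baseline, value):
    """Return how much smaller `value` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


# BTCV figures as quoted in this review.
nnunet_gflops, unetrpp_gflops = 358.0, 31.0
nnunet_dsc, unetrpp_dsc = 83.16, 83.28

flops_saving = percent_reduction(nnunet_gflops, unetrpp_gflops)
dsc_gain = unetrpp_dsc - nnunet_dsc

print(f"FLOPs reduction vs nnUNet: {flops_saving:.1f}%")  # 91.3%
print(f"Mean DSC difference: {dsc_gain:+.2f} points")     # +0.12 points
```

On these figures the accuracy gap is small, so the BTCV result is mainly an efficiency argument: roughly the same mean DSC at under a tenth of the compute.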


This review was generated by AI and checked by a human editor.