QUICK REVIEW

[Paper Review] xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

Tianrun Chen, Chaotao Ding|arXiv (Cornell University)|Jul 1, 2024

Brain Tumor Detection and Classification | Cited by 5

One-Sentence Summary

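The paper proposes xLSTM-UNet, a UNet-like architecture that uses xLSTM/ViL as its backbone and outperforms CNN-, Transformer-, and Mamba-based segmentation models on 2D and 3D medical image segmentation tasks.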

ABSTRACT

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been pivotal in biomedical image segmentation, yet their ability to model long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report we first propose xLSTM-UNet, a UNet-structured deep neural network that leverages Vision-LSTM (ViL), an xLSTM-based vision model, as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated superior performance compared to Transformers and State Space Models (SSMs) such as Mamba in Natural Language Processing (NLP) and image classification (as demonstrated by its Vision-LSTM, or ViL, implementation). With xLSTM-UNet, we extend this success to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency modeling of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments, which show that it consistently surpasses leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdomen MRI, instruments in endoscopic images, and cells in microscopic images. Through these comprehensive experiments, this technical report highlights the potential of xLSTM-based architectures for advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at http://tianrun-chen.github.io/xLSTM-UNet/
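The abstract credits xLSTM with capturing long-range dependencies. As a rough illustration of that mechanism, and not the authors' ViL code, the sketch below runs a single-head mLSTM recurrence (the matrix-memory cell from the xLSTM paper, which ViL stacks) over a token sequence in NumPy. Assumptions: the projection matrices are arbitrary inputs rather than learned weights, there are no convolutions, multi-head splits, or up/down projections, and `mlstm_scan` is a hypothetical name. The exponential input gate is stabilized with a running log-scale `m`, which keeps the gates bounded without changing the output.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mlstm_scan(x, Wq, Wk, Wv, w_i, w_f, Wo):
    """Simplified single-head mLSTM over a token sequence (illustrative only).

    x: (T, d_in) tokens; Wq, Wk, Wv, Wo: (d_in, d); w_i, w_f: (d_in,).
    Returns hidden states of shape (T, d). The weights are assumed inputs,
    not values from the paper.
    """
    T, d = x.shape[0], Wq.shape[1]
    C = np.zeros((d, d))  # matrix memory, stored scaled by exp(-m)
    n = np.zeros(d)       # normalizer vector, same scaling
    m = 0.0               # running log-scale used to stabilize exp gates
    h = np.zeros((T, d))
    for t in range(T):
        q = x[t] @ Wq
        k = (x[t] @ Wk) / np.sqrt(d)
        v = x[t] @ Wv
        log_i = x[t] @ w_i                          # input gate i = exp(log_i)
        log_f = -np.logaddexp(0.0, -(x[t] @ w_f))   # log(sigmoid(.)), computed stably
        o = sigmoid(x[t] @ Wo)                      # output gate
        m_new = max(log_f + m, log_i)
        i_s = np.exp(log_i - m_new)                 # stabilized gates, both <= 1
        f_s = np.exp(log_f + m - m_new)
        C = f_s * C + i_s * np.outer(v, k)          # C_t = f C_{t-1} + i v k^T
        n = f_s * n + i_s * k                       # n_t = f n_{t-1} + i k
        # max(|n^T q|, 1) in unscaled units equals max(|n^T q|, exp(-m)) here
        denom = max(abs(n @ q), np.exp(-m_new))
        h[t] = o * (C @ q) / denom
        m = m_new
    return h
```

Because the memory `C` is updated once per token, this recurrent form costs time linear in the sequence length, and every output `h[t]` can depend on all earlier tokens through `C`.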

Research Motivation and Objectives

Motivate the use of xLSTM/ViL to model long-range dependencies in medical image segmentation.
Design a UNet-like architecture (xLSTM-UNet) that injects ViL/xLSTM blocks into encoder layers.
Demonstrate improved segmentation performance over CNN-, Transformer-, and Mamba-based baselines across diverse 2D/3D datasets.
Provide implementation and dataset details to facilitate reproducibility.

Proposed Method

Adopts a UNet-like encoder–decoder structure.
Incorporates xLSTM blocks into multiple encoder layers with residual blocks and instance normalization.
Flattens and normalizes intermediate features before feeding them to the ViL/xLSTM blocks, and concatenates the output into the decoder path.
Explores two variants: ours_bot (xLSTM in bottleneck only) and ours_enc (xLSTM in all encoder blocks).
Trains end-to-end with a Dice plus cross-entropy loss, using the AdamW optimizer on high-end GPUs.
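The flatten-normalize-mix-reshape step in the list above can be sketched in NumPy as follows. This is a minimal sketch, not the released code: the batch dimension and the learned LayerNorm scale/shift are omitted, `mixer` stands for any function mapping an (L, C) token sequence to (L, C) (for example an mLSTM scan), the residual add is an assumption, and `vil_stage` / `xlstm_stage_ids` are hypothetical names. The `reverse` flag mirrors ViL's practice of alternating the scan direction between blocks, and treating the deepest encoder stage as the bottleneck for `ours_bot` is also an assumption.

```python
import numpy as np


def layer_norm(tokens, eps=1e-5):
    """Normalize each token over its channel dimension (no learned affine)."""
    mu = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    return (tokens - mu) / np.sqrt(var + eps)


def vil_stage(feat, mixer, reverse=False):
    """Apply a sequence mixer to a 2D (C, H, W) or 3D (C, D, H, W) feature map.

    Hypothetical wrapper: flatten the spatial dims into a token sequence,
    LayerNorm each token, run `mixer` over the sequence, reshape back, and
    add a residual connection (assumed).
    """
    c, *spatial = feat.shape
    tokens = feat.reshape(c, -1).T        # (L, C), L = product of spatial dims
    tokens = layer_norm(tokens)
    if reverse:                           # scan from the last token to the first
        tokens = tokens[::-1]
    mixed = mixer(tokens)
    if reverse:                           # restore raster order before reshaping
        mixed = mixed[::-1]
    out = mixed.T.reshape(c, *spatial)
    return feat + out


def xlstm_stage_ids(variant, n_stages):
    """Encoder stages (0 = shallowest) that get a ViL block in each variant."""
    if variant == "bot":                  # ours_bot: bottleneck only
        return [n_stages - 1]
    if variant == "enc":                  # ours_enc: every encoder stage
        return list(range(n_stages))
    raise ValueError(f"unknown variant: {variant}")
```

The same wrapper serves 2D and 3D inputs because only the spatial dimensions are flattened; a 3D volume simply yields a longer token sequence of length D·H·W.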

Experimental Results

Research Questions

RQ1: Can xLSTM-UNet surpass CNN-, Transformer-, and Mamba-based segmentation networks in 2D and 3D medical image segmentation?
RQ2: Do encoder-wide xLSTM insertions (ours_enc) provide more benefit than bottleneck-only usage (ours_bot)?
RQ3: Is xLSTM-UNet robust across diverse modalities, including abdomen MRI, endoscopy, microscopy, and brain MRI?
RQ4: How does xLSTM-UNet scale between 2D and 3D segmentation tasks?

Key Findings

xLSTM-UNet achieves state-of-the-art results on Abdomen MRI 2D, Endoscopy, and Microscopy datasets, with ours_enc reaching DSC 0.7747 and NSD 0.8374 on Abdomen MRI 2D.
Both xLSTM variants (ours_bot and ours_enc) outperform U-Mamba variants and other baselines across 2D tasks.
On the Endoscopy dataset, both xLSTM-UNet variants achieve the best DSC and NSD scores (0.6843 and 0.7001, respectively).
On the Microscopy dataset, xLSTM-UNet variants attain F1 scores of 0.6036 (ours_enc) and 0.5818 (ours_bot), surpassing prior SOTA.
In 3D BraTS2023, xLSTM-UNet variants achieve the highest average Dice (91.80) compared to other methods.
On Abdomen MRI 3D, xLSTM-UNet_bot achieves DSC 0.8483 and NSD 0.9153, outperforming baselines.
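The DSC figures above and the Dice term of the training loss both rest on the Dice overlap DSC = 2|P ∩ G| / (|P| + |G|) between a prediction P and ground truth G. A minimal NumPy sketch follows, assuming a single image, integer label maps, and an unweighted sum of the mean soft Dice loss and the mean cross-entropy; the function names, the smoothing constant, and the empty-mask convention are assumptions, since the summary does not give the paper's exact weighting or per-class averaging. NSD (normalized surface Dice) requires boundary distance computations and is not covered here.

```python
import numpy as np


def dice_score(pred, target):
    """Hard Dice similarity coefficient (DSC) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:          # both masks empty: treat as perfect agreement (a convention)
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / total


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dice_ce_loss(logits, labels, smooth=1e-5):
    """Mean soft Dice loss plus mean cross-entropy for a single image.

    logits: (K, ...) per-class scores for a 2D or 3D image.
    labels: (...) integer class map with values in [0, K).
    """
    k = logits.shape[0]
    probs = softmax(logits, axis=0)
    onehot = np.stack([labels == c for c in range(k)]).astype(float)
    spatial = tuple(range(1, logits.ndim))
    inter = (probs * onehot).sum(axis=spatial)
    denom = probs.sum(axis=spatial) + onehot.sum(axis=spatial)
    soft_dice = (2.0 * inter + smooth) / (denom + smooth)   # one value per class
    ce = -(onehot * np.log(probs + 1e-12)).sum(axis=0).mean()
    return (1.0 - soft_dice.mean()) + ce
```

The soft Dice term uses predicted probabilities instead of thresholded masks so that it stays differentiable during training, while `dice_score` mirrors how a hard DSC is computed at evaluation time.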


This review was generated by AI and reviewed by human editors.
