QUICK REVIEW

[Paper Review] LFMamba: Light Field Image Super-Resolution with State Space Model

Wang Xian, Yao Lu|arXiv (Cornell University)|Jun 18, 2024
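Advanced Image Fusion Techniques | Cited by 5
One-Sentence Summary

LFMamba introduces a state space model (SSM)-based network that processes informative 2D slices of 4D light fields, using efficient S6 blocks equipped with SS2D, and achieves competitive LFSR performance with fewer parameters and linear complexity.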

ABSTRACT

Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.
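
To make the linear-time claim in the abstract concrete, here is a minimal NumPy sketch of a selective state-space recurrence over a 1D sequence. It is a simplified illustration, not the authors' or Mamba's implementation: the function name `selective_scan`, the tensor shapes, the diagonal state matrix `A`, the softplus step size, and the simplified discretization of the input matrix are all assumptions made for this example.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM over a sequence x of shape (L, D).

    A:        (D, N) diagonal state parameters (negative for stability).
    W_delta:  (D, D), W_B: (D, N), W_C: (D, N) are hypothetical projections
              that make the step size, input matrix and output matrix depend
              on the current token (the "selective" part).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                 # hidden state, one row per channel
    y = np.empty((L, D))
    for t in range(L):                   # single pass: cost grows linearly with L
        delta = np.log1p(np.exp(x[t] @ W_delta))    # softplus step size, (D,)
        B_t = x[t] @ W_B                             # token-dependent input matrix, (N,)
        C_t = x[t] @ W_C                             # token-dependent output matrix, (N,)
        A_bar = np.exp(delta[:, None] * A)           # discretized state transition, (D, N)
        B_bar = delta[:, None] * B_t[None, :]        # simplified discretized input, (D, N)
        h = A_bar * h + B_bar * x[t][:, None]        # h_t = A_bar * h_{t-1} + B_bar * x_t
        y[t] = h @ C_t                               # y_t = C_t h_t, (D,)
    return y

# Toy usage with random parameters (all sizes are illustrative).
rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))
W_delta = 0.1 * rng.standard_normal((D, D))
W_B = 0.1 * rng.standard_normal((D, N))
W_C = 0.1 * rng.standard_normal((D, N))
y = selective_scan(x, A, W_delta, W_B, W_C)
```

Each token is visited once and only a fixed-size state is carried forward, which is where the linear cost in sequence length comes from; self-attention, by contrast, compares every pair of tokens.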

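Motivation and Goals

  • Improve light field image super-resolution beyond conventional CNNs and Transformers by exploiting long-range dependencies.
  • Propose a scalable SSM-based framework (LFMamba) that operates on informative 2D LF slices (SAI, MacPI, EPI-H, EPI-V).
  • Design a compact basic SSM block with an efficient SS2D to learn spatial, angular, and structure information.
  • Demonstrate performance that exceeds or is competitive with state-of-the-art LFSR methods on standard benchmarks.
  • Provide ablation studies that validate the contribution and generalization ability of the proposed method.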

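Proposed Method

  • Model LF data by extracting four informative 2D slices from each LF (SAI, MacPI, EPI-H, EPI-V) and flattening each slice into a 1D sequence for SSM processing.
  • Introduce a basic SSM block with an efficient SS2D (ESS2D) to reduce parameters while maintaining performance.
  • Build LFMamba from four modules: initial feature extraction, spatial-angular feature learning (SAFL), LF structure feature learning (LSFL), and high-resolution LF reconstruction (HLFR).
  • Apply alternating spatial and angular SSM blocks within SAFL, and alternating horizontal/vertical EPI SSM blocks within LSFL, to capture spatial, angular, and structure information.
  • Fuse multi-level features through concatenation and a 1x1 convolution, followed by pixel shuffle for HR LF reconstruction.
  • Adopt a two-stage SSM block with LayerNorm, S6, Conv, and channel attention, including learnable residual scales.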
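
The slice extraction in the first step can be sketched with NumPy indexing. This is an illustrative sketch under stated assumptions, not the authors' code: the array layout `(U, V, H, W)` (two angular axes, then two spatial axes), the helper name `lf_slices`, and the choice of which axes are fixed for EPI-H versus EPI-V are conventions assumed here and may differ from the paper's.

```python
import numpy as np

def lf_slices(lf, u, v, h, w):
    """Extract four 2D slices from a 4D light field of shape (U, V, H, W)."""
    U, V, H, W = lf.shape
    sai = lf[u, v]                                           # sub-aperture image, (H, W)
    # Macro-pixel image: each spatial position holds its U x V angular samples.
    macpi = lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)   # (H*U, W*V)
    epi_h = lf[u, :, h, :]                                   # horizontal EPI, (V, W)
    epi_v = lf[:, v, :, w]                                   # vertical EPI, (U, H)
    return {"SAI": sai, "MacPI": macpi, "EPI-H": epi_h, "EPI-V": epi_v}

# Toy 5x5 light field with 32x32 views (sizes are illustrative).
lf = np.arange(5 * 5 * 32 * 32, dtype=np.float32).reshape(5, 5, 32, 32)
slices = lf_slices(lf, u=2, v=2, h=10, w=7)

# Flatten each 2D slice into a 1D sequence, as an SSM expects.
sequences = {name: s.reshape(-1) for name, s in slices.items()}
```

A real network would apply this to multi-channel feature maps and to all slice positions in a batch rather than one slice at a time; the sketch only shows how the four views of the same 4D data differ.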

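Experimental Results

Research Questions

  • RQ1: Can a state space model with a selective mechanism (S6) outperform CNN- and Transformer-based methods on light field super-resolution (LFSR)?
  • RQ2: Does applying SSMs to informative 2D LF slices (SAI, MacPI, EPI-H, EPI-V) effectively capture the spatial contextual, angular, and structure information needed for LFSR?
  • RQ3: Is the efficient ESS2D-based S6 block parameter-efficient while maintaining accuracy?
  • RQ4: How does LFMamba perform against state-of-the-art methods on standard LFSR benchmarks at scales x2 and x4?
  • RQ5: How well does LFMamba generalize to the LF angular SR task?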

Main Findings

| Method | Scale | #Param. (M) | FLOPs (G) | EPFL PSNR/SSIM | HCInew PSNR/SSIM | HCIold PSNR/SSIM | INRIA PSNR/SSIM | STFgantry PSNR/SSIM | Average PSNR/SSIM |
|---|---|---|---|---|---|---|---|---|---|
| Bicubic | x2 | - | - | 29.50/.9350 | 31.69/.9335 | 37.46/.9776 | 31.10/.9563 | 30.82/.9473 | 31.11/.9542 |
| RCAN | x2 | 15.33 | 89.75 | 33.16/.9635 | 34.98/.9602 | 41.05/.9875 | 35.01/.9769 | 36.33/.9825 | 36.11/.9742 |
| resLF | x2 | 7.98 | 37.06 | 32.75/.9672 | 36.07/.9715 | 42.61/.9922 | 34.57/.9784 | 36.89/.9873 | 36.58/.9793 |
| LFSSR | x2 | 0.88 | 25.70 | 33.69/.9748 | 36.86/.9753 | 43.75/.9939 | 35.27/.9834 | 38.07/.9902 | 37.73/.9835 |
| LF-InterNet | x2 | 5.04 | 47.46 | 34.14/.9761 | 37.28/.9769 | 44.45/.9945 | 35.80/.9846 | 38.72/.9916 | 38.08/.9847 |
| LF-ATO | x2 | 1.22 | 597.66 | 34.27/.9757 | 37.24/.9767 | 44.20/.9942 | 36.15/.9842 | 39.64/.9929 | 38.15/.9843 |
| MEG-Net | x2 | 1.69 | 48.40 | 34.34/.9773 | 37.42/.9777 | 44.08/.9942 | 36.09/.9849 | 38.77/.9915 | 38.14/.9851 |
| LF-DFNet | x2 | 3.94 | 57.22 | 34.44/.9766 | 37.44/.9786 | 44.23/.9943 | 36.36/.9841 | 39.61/.9935 | 38.41/.9854 |
| IINet | x2 | 4.84 | 56.16 | 34.68/.9773 | 37.74/.9790 | 44.84/.9948 | 36.57/.9853 | 39.86/.9936 | 38.74/.9857 |
| LF-SAV | x2 | 1.22 | 34.65 | 34.62/.9772 | 37.43/.9776 | 44.22/.9942 | 36.36/.9849 | 38.69/.9914 | 38.26/.9851 |
| DistgSSR | x2 | 3.53 | 64.11 | 34.81/.9787 | 37.96/.9796 | 44.94/.9949 | 36.59/.9859 | 40.40/.9942 | 38.94/.9866 |
| HLFSR | x2 | 13.72 | 167.40 | 35.31/.9800 | 38.32/.9807 | 44.98/.9950 | 37.06/.9867 | 40.85/.9947 | 39.30/.9874 |
| DPT | x2 | 3.73 | 65.34 | 34.48/.9758 | 37.35/.9771 | 44.31/.9943 | 36.40/.9843 | 39.52/.9926 | 38.40/.9848 |
| LFT | x2 | 1.11 | 56.16 | 34.80/.9781 | 37.84/.9791 | 44.52/.9945 | 36.59/.9855 | 40.51/.9941 | 38.85/.9863 |
| EPIT | x2 | 1.42 | 69.71 | 34.83/.9775 | 38.23/.9810 | 45.08/.9949 | 36.67/.9853 | 42.17/.9957 | 39.40/.9877 |
| LF-DET | x2 | 1.59 | 48.50 | 35.26/.9797 | 38.31/.9807 | 44.99/.9950 | 36.95/.9864 | 41.76/.9955 | 39.45/.9875 |
| LFMamba | x2 | 2.15 | 62.95 | 35.75/.9824 | 38.36/.9810 | 44.98/.9950 | 37.07/.9876 | 40.95/.9948 | 39.42/.9882 |
| LFMamba† | x2 | 2.15 | 62.95 | 35.84/.9832 | 38.59/.9816 | 45.20/.9952 | 37.19/.9880 | 41.15/.9950 | 39.59/.9886 |
| Bicubic | x4 | - | - | 25.14/.8311 | 27.61/.8507 | 32.42/.9335 | 26.82/.8860 | 25.93/.8431 | 27.58/.8701 |
| RCAN | x4 | 15.43 | 91.25 | 27.88/.8863 | 29.63/.8886 | 35.20/.9548 | 29.76/.9276 | 28.90/.9131 | 30.27/.9141 |
| resLF | x4 | 8.64 | 39.70 | 28.27/.9035 | 30.73/.9107 | 36.71/.9682 | 30.34/.9412 | 30.19/.9372 | 31.25/.9322 |
| LFSSR | x4 | 1.77 | 128.44 | 28.27/.9118 | 30.72/.9145 | 36.70/.9696 | 30.31/.9467 | 30.15/.9426 | 31.23/.9370 |
| LF-InterNet | x4 | 5.48 | 50.10 | 28.67/.9162 | 30.98/.9161 | 37.11/.9716 | 30.64/.9491 | 30.53/.9409 | 31.58/.9388 |
| LF-ATO | x4 | 1.36 | 686.99 | 28.52/.9115 | 30.88/.9135 | 37.00/.9699 | 30.71/.9484 | 30.61/.9430 | 31.54/.9373 |
| MEG-Net | x4 | 1.77 | 102.20 | 28.74/.9160 | 31.10/.9177 | 37.28/.9716 | 30.66/.9490 | 30.77/.9453 | 31.71/.9399 |
| LF-DFNet | x4 | 3.99 | 57.31 | 28.77/.9165 | 31.23/.9196 | 37.32/.9718 | 30.83/.9503 | 31.15/.9494 | 31.86/.9415 |
| IINet | x4 | 4.88 | 57.42 | 29.11/.9188 | 31.36/.9208 | 37.62/.9734 | 31.08/.9515 | 31.21/.9502 | 32.08/.9429 |
| LF-SAV | x4 | 1.54 | 115.80 | 29.37/.9223 | 31.45/.9217 | 37.50/.9721 | 31.27/.9531 | 31.36/.9505 | 32.19/.9439 |
| DistgSSR | x4 | 3.58 | 65.41 | 28.99/.9195 | 31.38/.9217 | 37.56/.9732 | 30.99/.9519 | 31.65/.9535 | 32.11/.9440 |
| HLFSR | x4 | 13.87 | 182.52 | 29.20/.9222 | 31.57/.9238 | 37.78/.9742 | 31.24/.9534 | 31.64/.9537 | 32.29/.9455 |
| DPT | x4 | 3.78 | 66.55 | 28.93/.9170 | 31.19/.9188 | 37.39/.9721 | 30.96/.9503 | 31.14/.9488 | 31.92/.9414 |
| LFT | x4 | 1.16 | 57.60 | 29.25/.9210 | 31.46/.9218 | 37.63/.9735 | 31.20/.9524 | 31.86/.9548 | 32.28/.9447 |
| EPIT | x4 | 1.47 | 74.96 | 29.34/.9197 | 31.51/.9231 | 37.68/.9737 | 31.37/.9526 | 32.18/.9571 | 32.40/.9452 |
| LF-DET | x4 | 1.69 | 51.20 | 29.47/.9230 | 31.56/.9235 | 37.84/.9744 | 31.39/.9534 | 32.14/.9573 | 32.48/.9463 |
| LFMamba | x4 | 2.30 | 66.90 | 29.84/.9256 | 31.70/.9249 | 37.91/.9748 | 31.81/.9551 | 31.85/.9554 | 32.62/.9472 |
| LFMamba† | x4 | 2.30 | 66.90 | 29.95/.9275 | 31.86/.9265 | 38.08/.9755 | 31.90/.9563 | 32.04/.9568 | 32.77/.9485 |

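  • LFMamba achieves competitive PSNR/SSIM on five LF benchmarks for both x2 and x4 SR, and performs especially well on the complex real-world datasets EPFL and INRIA.
  • At x2 SR, LFMamba's average is close to that of the best prior method (LF-DET): PSNR is slightly lower (39.42 vs. 39.45 dB), while SSIM is slightly higher (.9882 vs. .9875).
  • At x4 SR, LFMamba improves on the average PSNR/SSIM of baselines such as LF-DET (32.62/.9472 vs. 32.48/.9463).
  • LFMamba keeps a moderate model size (≈2.15M parameters) and reasonable FLOPs (~62.95G at x2) relative to competing methods.
  • The geometric ensemble variant (LFMamba†) gives similar, slightly higher quantitative results (average 39.59/.9886 at x2 and 32.77/.9485 at x4), suggesting the method is robust to the ensembling strategy.
  • Ablation studies confirm the effectiveness of the ESS2D-enhanced S6 block and the SAFL/LSFL design in exploiting spatial, angular, and structure information.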
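
The comparison with LF-DET in the findings can be checked directly from the per-dataset PSNR values in the table. The short script below is only an arithmetic check on those reported numbers; the dictionary layout and the helper name `mean_psnr` are assumptions made for this example.

```python
# Per-dataset PSNR (dB) copied from the table, in the order
# EPFL, HCInew, HCIold, INRIA, STFgantry.
psnr = {
    ("LF-DET", 2): [35.26, 38.31, 44.99, 36.95, 41.76],
    ("LFMamba", 2): [35.75, 38.36, 44.98, 37.07, 40.95],
    ("LF-DET", 4): [29.47, 31.56, 37.84, 31.39, 32.14],
    ("LFMamba", 4): [29.84, 31.70, 37.91, 31.81, 31.85],
}

def mean_psnr(method, scale):
    """Average PSNR over the five benchmarks, rounded to two decimals."""
    values = psnr[(method, scale)]
    return round(sum(values) / len(values), 2)

# Gap between LFMamba and LF-DET in average PSNR at each scale.
gap_x2 = round(mean_psnr("LFMamba", 2) - mean_psnr("LF-DET", 2), 2)
gap_x4 = round(mean_psnr("LFMamba", 4) - mean_psnr("LF-DET", 4), 2)

print(f"x2: LFMamba {mean_psnr('LFMamba', 2)} dB, gap vs. LF-DET {gap_x2:+.2f} dB")
print(f"x4: LFMamba {mean_psnr('LFMamba', 4)} dB, gap vs. LF-DET {gap_x4:+.2f} dB")
```

The recomputed means match the Average column, and the per-dataset rows show that the x2 gap comes mostly from STFgantry, where LF-DET and EPIT score higher, while LFMamba leads on EPFL and INRIA.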

This review was generated by AI and reviewed by human editors.