QUICK REVIEW

[Paper Review] LFMamba: Light Field Image Super-Resolution with State Space Model

Wang Xian, Yao Lu|arXiv (Cornell University)|Jun 18, 2024

Advanced Image Fusion Techniques | Cited by 5

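One-Sentence Summary

LFMamba is a state space model (SSM)-based network that processes informative 2D slices of 4D light fields with an efficient S6 block equipped with SS2D, achieving competitive LFSR performance with a reduced parameter count and linear complexity.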

ABSTRACT

Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.
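
To make the "linear-time complexity" claim in the abstract concrete, the following is a minimal NumPy sketch of a selective scan for a single channel. It is an illustration under stated assumptions, not the authors' implementation: the function name `selective_scan`, the diagonal state matrix `A`, and the simplified discretization of `B` are choices made here for brevity. In Mamba-style models, `delta`, `B`, and `C` are computed from the input by learned projections; here they are simply passed in as arrays.

```python
import numpy as np

def selective_scan(x, delta, A, B, C):
    """Sequential selective scan for one channel (illustrative sketch).

    x:     (L,)   input sequence
    delta: (L,)   positive, input-dependent step sizes
    A:     (N,)   diagonal of the state matrix (assumed diagonal here)
    B, C:  (L, N) input-dependent input and output projections
    """
    L, N = B.shape
    h = np.zeros(N)                     # hidden state carried along the sequence
    y = np.empty(L)
    for t in range(L):                  # single pass: cost grows as O(L * N)
        A_bar = np.exp(delta[t] * A)    # zero-order-hold discretization of A
        B_bar = delta[t] * B[t]         # simplified discretization of B (assumption)
        h = A_bar * h + B_bar * x[t]    # state update
        y[t] = C[t] @ h                 # readout
    return y

# Hypothetical usage with random inputs.
rng = np.random.default_rng(0)
L, N = 16, 4
y = selective_scan(
    rng.standard_normal(L),
    rng.uniform(0.01, 0.1, L),
    -np.arange(1.0, N + 1),             # negative entries keep the state stable
    rng.standard_normal((L, N)),
    rng.standard_normal((L, N)),
)
print(y.shape)  # (16,)
```

Because each step touches only the current token and a fixed-size state, the cost grows linearly with sequence length, in contrast to the pairwise token interactions of self-attention.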

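Research Motivation and Objectives

Improve light field image super-resolution beyond conventional CNNs and Transformers by exploiting long-range dependencies.
Propose a scalable SSM-based framework (LFMamba) that operates on informative 2D LF slices: sub-aperture images (SAI), macro-pixel images (MacPI), and horizontal and vertical epipolar plane images (EPI-H, EPI-V).
Design a compact basic SSM block with an efficient SS2D to learn spatial, angular, and structural information.
Demonstrate performance that surpasses or is competitive with state-of-the-art LFSR methods on standard benchmarks.
Provide ablation studies that validate the contribution and generalization ability of the proposed method.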

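Proposed Method

Model LF data by extracting four informative 2D slices from each LF (SAI, MacPI, EPI-H, EPI-V) and flattening each slice into a 1D sequence for SSM processing.
Introduce a basic SSM block with an efficient SS2D (ESS2D) to reduce parameters while maintaining performance.
Build LFMamba from four modules: initial feature extraction, spatial-angular feature learning (SAFL), LF structure feature learning (LSFL), and high-resolution LF reconstruction (HLFR).
Apply alternating spatial and angular SSM blocks within SAFL, and alternating horizontal and vertical EPI SSM blocks within LSFL, to capture spatial, angular, and structural information.
Fuse multi-level features through concatenation and a 1x1 convolution, then apply pixel shuffle for HR LF reconstruction.
Use a two-stage SSM block with LayerNorm, S6, Conv, and channel attention, including learnable residual scales.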
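
The four 2D slice types named in the first step can be sketched with plain array reshaping. This is a minimal illustration, assuming a single-channel light field stored as an array of shape `(U, V, H, W)` (angular rows, angular columns, spatial rows, spatial columns); the axis order, the function names `lf_slices` and `to_sequence`, and the row-major flattening order are assumptions made here, not details confirmed by the paper.

```python
import numpy as np

def lf_slices(lf):
    """Rearrange a (U, V, H, W) light field into four families of 2D slices."""
    U, V, H, W = lf.shape
    # SAI: one H x W view per angular position (u, v).
    sai = lf.reshape(U * V, H, W)
    # MacPI: one image in which each spatial location holds a U x V macro-pixel.
    macpi = lf.transpose(2, 0, 3, 1).reshape(H * U, W * V)
    # EPI-H: fix (u, h) and slice over (v, w), giving V x W images.
    epi_h = lf.transpose(0, 2, 1, 3).reshape(U * H, V, W)
    # EPI-V: fix (v, w) and slice over (u, h), giving U x H images.
    epi_v = lf.transpose(1, 3, 0, 2).reshape(V * W, U, H)
    return sai, macpi, epi_h, epi_v

def to_sequence(slice_2d):
    """Flatten one 2D slice into a 1D sequence in row-major order."""
    return slice_2d.reshape(-1)

# Hypothetical usage: a toy 5x5-view light field with 32x32 pixels per view.
lf = np.random.default_rng(0).standard_normal((5, 5, 32, 32))
sai, macpi, epi_h, epi_v = lf_slices(lf)
print(sai.shape, macpi.shape, epi_h.shape, epi_v.shape)
# (25, 32, 32) (160, 160) (160, 5, 32) (160, 5, 32)
print(to_sequence(sai[0]).shape)  # (1024,)
```

Each slice family exposes a different kind of correlation: SAIs carry spatial context, macro-pixels carry angular information, and EPIs carry the line structure related to scene depth.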

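Experimental Results

Research Questions

RQ1: Can state space models with a selection mechanism (S6) outperform CNN- and Transformer-based methods on light field super-resolution (LFSR)?
RQ2: Does applying SSMs to informative 2D LF slices (SAI, MacPI, EPI-H, EPI-V) effectively capture the spatial context, angular, and structural information needed for LFSR?
RQ3: Is the efficient ESS2D-based S6 block parameter-efficient while maintaining accuracy?
RQ4: How does LFMamba compare with state-of-the-art methods on standard LFSR benchmarks at scales x2 and x4?
RQ5: How well does LFMamba generalize to the LF angular SR task?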

Key Findings

Method	Scale	#Param.(M)	FLOPs(G)	EPFL PSNR/SSIM	HCInew PSNR/SSIM	HCIold PSNR/SSIM	INRIA PSNR/SSIM	STFgantry PSNR/SSIM	Average PSNR/SSIM
Bicubic	2	-	-	29.50/.9350	31.69/.9335	37.46/.9776	31.10/.9563	30.82/.9473	31.11/.9542
RCAN	2	15.3	389.75	33.16/.9635	34.98/.9602	41.05/.9875	35.01/.9769	36.33/.9825	36.11/.9742
resLF	2	7.98	37.06	32.75/.9672	36.07/.9715	42.61/.9922	34.57/.9784	36.89/.9873	36.58/.9793
LFSSR	2	0.88	25.70	33.69/.9748	36.86/.9753	43.75/.9939	35.27/.9834	38.07/.9902	37.73/.9835
LF-InterNet	2	5.04	47.46	34.14/.9761	37.28/.9769	44.45/.9945	35.80/.9846	38.72/.9916	38.08/.9847
LF-ATO	2	1.22	597.66	34.27/.9757	37.24/.9767	44.20/.9942	36.15/.9842	39.64/.9929	38.15/.9843
MEG-Net	2	1.69	48.40	34.34/.9773	37.42/.9777	44.08/.9942	36.09/.9849	38.77/.9915	38.14/.9851
LF-DFNet	2	3.94	57.22	34.44/.9766	37.44/.9786	44.23/.9943	36.36/.9841	39.61/.9935	38.41/.9854
IINet	2	4.84	56.16	34.68/.9773	37.74/.9790	44.84/.9948	36.57/.9853	39.86/.9936	38.74/.9857
LF-SAV	2	1.22	34.65	34.62/.9772	37.43/.9776	44.22/.9942	36.36/.9849	38.69/.9914	38.26/.9851
DistgSSR	2	3.53	64.11	34.81/.9787	37.96/.9796	44.94/.9949	36.59/.9859	40.40/.9942	38.94/.9866
HLFSR	2	13.72	167.40	35.31/.9800	38.32/.9807	44.98/.9950	37.06/.9867	40.85/.9947	39.30/.9874
DPT	2	3.73	65.34	34.48/.9758	37.35/.9771	44.31/.9943	36.40/.9843	39.52/.9926	38.40/.9848
LFT	2	1.11	56.16	34.80/.9781	37.84/.9791	44.52/.9945	36.59/.9855	40.51/.9941	38.85/.9863
EPIT	2	1.42	69.71	34.83/.9775	38.23/.9810	45.08/.9949	36.67/.9853	42.17/.9957	39.40/.9877
LF-DET	2	1.59	48.50	35.26/.9797	38.31/.9807	44.99/.9950	36.95/.9864	41.76/.9955	39.45/.9875
LFMamba	2	2.15	62.95	35.75/.9824	38.36/.9810	44.98/.9950	37.07/.9876	40.95/.9948	39.42/.9882
LFMamba†	2	2.15	62.95	35.84/.9832	38.59/.9816	45.20/.9952	37.19/.9880	41.15/.9950	39.59/.9886
Bicubic	4	-	-	25.14/.8311	27.61/.8507	32.42/.9335	26.82/.8860	25.93/.8431	27.58/.8701
RCAN	4	15.4	391.25	27.88/.8863	29.63/.8886	35.20/.9548	29.76/.9276	28.90/.9131	30.27/.9141
resLF	4	8.64	39.70	28.27/.9035	30.73/.9107	36.71/.9682	30.34/.9412	30.19/.9372	31.25/.9322
LFSSR	4	1.77	128.44	28.27/.9118	30.72/.9145	36.70/.9696	30.31/.9467	30.15/.9426	31.23/.9370
LF-InterNet	4	5.48	50.10	28.67/.9162	30.98/.9161	37.11/.9716	30.64/.9491	30.53/.9409	31.58/.9388
LF-ATO	4	1.36	686.99	28.52/.9115	30.88/.9135	37.00/.9699	30.71/.9484	30.61/.9430	31.54/.9373
MEG-Net	4	1.77	102.20	28.74/.9160	31.10/.9177	37.28/.9716	30.66/.9490	30.77/.9453	31.71/.9399
LF-DFNet	4	3.99	57.31	28.77/.9165	31.23/.9196	37.32/.9718	30.83/.9503	31.15/.9494	31.86/.9415
IINet	4	4.88	57.42	29.11/.9188	31.36/.9208	37.62/.9734	31.08/.9515	31.21/.9502	32.08/.9429
LF-SAV	4	1.54	115.80	29.37/.9223	31.45/.9217	37.50/.9721	31.27/.9531	31.36/.9505	32.19/.9439
DistgSSR	4	3.58	65.41	28.99/.9195	31.38/.9217	37.56/.9732	30.99/.9519	31.65/.9535	32.11/.9440
HLFSR	4	13.87	182.52	29.20/.9222	31.57/.9238	37.78/.9742	31.24/.9534	31.64/.9537	32.29/.9455
DPT	4	3.78	66.55	28.93/.9170	31.19/.9188	37.39/.9721	30.96/.9503	31.14/.9488	31.92/.9414
LFT	4	1.16	57.60	29.25/.9210	31.46/.9218	37.63/.9735	31.20/.9524	31.86/.9548	32.28/.9447
EPIT	4	1.47	74.96	29.34/.9197	31.51/.9231	37.68/.9737	31.37/.9526	32.18/.9571	32.40/.9452
LF-DET	4	1.69	51.20	29.47/.9230	31.56/.9235	37.84/.9744	31.39/.9534	32.14/.9573	32.48/.9463
LFMamba	4	2.30	66.90	29.84/.9256	31.70/.9249	37.91/.9748	31.81/.9551	31.85/.9554	32.62/.9472
LFMamba†	4	2.30	66.90	29.95/.9275	31.86/.9265	38.08/.9755	31.90/.9563	32.04/.9568	32.77/.9485

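LFMamba achieves competitive PSNR/SSIM on five LF benchmarks for both x2 and x4 SR, and performs particularly well on the complex real-world datasets EPFL and INRIA.
For x2 SR, LFMamba is close to the best baseline (LF-DET) on average: its PSNR is slightly lower (39.42 dB vs. 39.45 dB) while its SSIM is slightly higher (.9882 vs. .9875).
For x4 SR, LFMamba improves on the strongest baselines on average; for example, it exceeds LF-DET by 0.14 dB in PSNR and .0009 in SSIM (32.62/.9472 vs. 32.48/.9463).
LFMamba keeps a moderate model size (about 2.15M parameters at x2) and reasonable FLOPs (62.95G at x2) relative to competing methods.
The geometric ensemble variant (LFMamba†) yields similar, slightly higher quantitative results (average 39.59/.9886 at x2 and 32.77/.9485 at x4), suggesting that the method is robust to the ensemble strategy.
Ablation studies confirm the effectiveness of the ESS2D-enhanced S6 block and the SAFL/LSFL design in exploiting spatial, angular, and structural information.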
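
The average-PSNR comparisons above can be reproduced from the per-dataset values in the table. The short script below is only an arithmetic check; the dictionary layout and the helper names `average` and `gap` are introduced here for illustration, and the numbers are copied from the LF-DET and LFMamba rows.

```python
# Per-dataset PSNR (dB) from the table, in the order
# EPFL, HCInew, HCIold, INRIA, STFgantry.
PSNR = {
    ("LF-DET", 2):  [35.26, 38.31, 44.99, 36.95, 41.76],
    ("LFMamba", 2): [35.75, 38.36, 44.98, 37.07, 40.95],
    ("LF-DET", 4):  [29.47, 31.56, 37.84, 31.39, 32.14],
    ("LFMamba", 4): [29.84, 31.70, 37.91, 31.81, 31.85],
}

def average(values):
    """Unweighted mean over the five datasets."""
    return sum(values) / len(values)

def gap(scale):
    """Average PSNR of LFMamba minus that of LF-DET at the given scale."""
    return average(PSNR[("LFMamba", scale)]) - average(PSNR[("LF-DET", scale)])

print(round(gap(2), 2))  # -0.03
print(round(gap(4), 2))  # 0.14
```

At x2 the deficit comes almost entirely from STFgantry (40.95 dB vs. 41.76 dB), while LFMamba leads on EPFL and INRIA at both scales.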

This review was generated by AI and reviewed by human editors.