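[Paper Interpretation] Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement
RetinexMamba introduces a Retinex-inspired illumination estimator and an Illumination Fusion State Space Model built on an SS2D/Mamba backbone to enhance low-light images. It replaces IG-MSA with Fused-Attention to improve interpretability and efficiency, and reports state-of-the-art results on the LOL datasets.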
In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preservation under low light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms, but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to enhance processing speed. This architecture features innovative illumination estimators and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-Head Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing deep learning approaches based on Retinex theory in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.
Research Motivation and Objectives
- Identify and address the limitations of traditional Retinex and Retinexformer approaches in low-light enhancement.
- Propose a RetinexMamba architecture combining an Illumination Estimator with a Damage Restorer based on Illumination Fusion State Space Models.
- Improve interpretability and processing speed by using Fused-Attention and SS2D backbones.
- Demonstrate superior quantitative and qualitative performance on LOL datasets.
- Analyze ablations to justify architectural choices and components.
Proposed Method
- Introduce an Illumination Estimator (IE) that fuses the image with an illumination prior to generate a lit-up image and an illumination feature map.
- Develop the Illumination Fusion State Space Model (IFSSM) as the core of the Damage Restorer, comprising Illumination Fusion Attention (IFA), 2D Selective Scan (SS2D), layer normalization (LN), a feed-forward network (FFN), and convolutional layers.
- Replace IG-MSA with a cross-attention-based Fused-Attention to improve interpretability and focus attention on low-light regions.
- Utilize 2D Selective Scan (SS2D) to achieve linear computational complexity while modeling long-range dependencies.
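- Adopt a Retinex-based perturbation framework that models the low-light image as I = (R + R̃) ⊙ (L + L̃), where R̃ and L̃ are perturbations of the reflectance R and the illumination L and ⊙ denotes element-wise multiplication, and derives the lit-up image as I_lu = I ⊙ L̄, where L̄ is the light-up map.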
- Train and evaluate on the LOL v1/v2 datasets, with MAE loss and cosine annealing, comparing PSNR/SSIM/RMSE metrics.
- Provide ablation studies (FixedHS, NoFB, NoSS2D, IG-MSA) to justify design choices.
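The perturbation model in the list above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes an ideal light-up map satisfying L̄ ⊙ L = 1 (the convention of the Retinexformer formulation that this model builds on), and the array shapes, noise levels, and names such as `light_up`, `R_pert`, and `L_pert` are hypothetical. Under that assumption, I_lu expands to R plus a corruption term C, which is the kind of residual a restoration stage would need to suppress.

```python
import numpy as np


def light_up(image, light_up_map):
    """Element-wise light-up: I_lu = I * L_bar."""
    return image * light_up_map


rng = np.random.default_rng(0)
h, w = 4, 4

# Hypothetical ground-truth components (values chosen only for illustration).
R = rng.uniform(0.2, 1.0, size=(h, w, 3))    # clean reflectance
L = rng.uniform(0.05, 0.3, size=(h, w, 1))   # dim illumination, broadcast over RGB

# Hypothetical perturbations of reflectance and illumination.
R_pert = rng.normal(0.0, 0.01, size=R.shape)
L_pert = rng.normal(0.0, 0.005, size=L.shape)

# Low-light observation: I = (R + R_pert) * (L + L_pert).
I = (R + R_pert) * (L + L_pert)

# Assumption: an ideal light-up map with L_bar * L = 1.
L_bar = 1.0 / L
I_lu = light_up(I, L_bar)

# Expanding I * L_bar gives R plus a corruption term C.
C = R * (L_pert * L_bar) + R_pert * ((L + L_pert) * L_bar)

# The lit-up image equals the clean reflectance plus the corruption term.
assert np.allclose(I_lu, R + C)
```

In a learned model, L̄ would be predicted by the estimator rather than computed from a known L, so the identity above would hold only approximately.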
Experimental Results
Research Questions
- RQ1: How can a Retinex-inspired architecture be combined with state-space modeling to enhance low-light images efficiently?
- RQ2: Can replacing IG-MSA with Fused-Attention and using SS2D improve interpretability and performance on the LOL datasets?
- RQ3: What is the impact of illumination prior fusion and SS2D depth on restoration quality and artifact suppression?
Key Findings
| Methods | LOL-v1 PSNR | LOL-v1 SSIM | LOL-v1 RMSE | LOL-v2-real PSNR | LOL-v2-real SSIM | LOL-v2-real RMSE |
|---|---|---|---|---|---|---|
| LIME [14] | 16.362 | 0.624 | 21.07 | 16.342 | 0.653 | 22.54 |
| MBLLEN [27] | 17.938 | 0.699 | 18.78 | 15.950 | 0.701 | 30.22 |
| Retinex-Net [45] | 17.188 | 0.589 | 22.59 | 16.410 | 0.640 | 20.21 |
| KinD [56] | 20.347 | 0.813 | 14.30 | 18.070 | 0.781 | 18.04 |
| KinD++ [55] | 20.707 | 0.791 | 14.34 | 16.800 | 0.741 | 15.64 |
| MIRNet [51] | 24.140 | 0.842 | 12.03 | 20.357 | 0.782 | 14.21 |
| URetinex-Net [46] | 21.450 | 0.795 | 13.55 | 21.554 | 0.801 | 14.23 |
| Retinexformer [2] | 23.932 | 0.831 | 8.35 | 21.230 | 0.838 | 9.92 |
| RetinexMamba | 24.025 | 0.827 | 8.17 | 22.453 | 0.844 | 9.38 |
- RetinexMamba achieves the highest PSNR in the table on LOL-v2-real (22.453) and reaches 24.025 on LOL-v1, where only MIRNet scores higher (24.140).
- On LOL-v1, RetinexMamba attains 0.827 SSIM and 8.17 RMSE, while on LOL-v2-real it achieves 0.844 SSIM and 9.38 RMSE.
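- RetinexMamba outperforms Retinexformer in PSNR on both datasets (24.025 vs 23.932 on LOL-v1, 22.453 vs 21.230 on LOL-v2-real) and has lower RMSE on both (8.17 vs 8.35, 9.38 vs 9.92), but its SSIM on LOL-v1 is slightly lower (0.827 vs 0.831), highlighting trade-offs between metrics.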
- Ablation studies show the full RetinexMamba (with SS2D and Fused-Attention) yields the best PSNR/SSIM across LOL-v1, LOL-v2-real, and LOL-v2-syn.
- Qualitative results indicate RetinexMamba better controls exposure, reduces color distortion, and minimizes noise compared to baseline methods.
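For readers who want to reproduce the kind of numbers reported in the table, the sketch below shows the standard definitions of RMSE and PSNR in NumPy. It is illustrative only: the intensity range (`max_val=255.0`), the color space, and the per-image averaging used in the paper are not stated in this summary, so values computed this way need not match the table, and the function names are hypothetical.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-square error over all pixels and channels."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(max_val / RMSE)."""
    err = rmse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(max_val / err))


# Toy example: a constant error of 25.5 on a 0-255 scale.
target = np.zeros((8, 8, 3))
pred = np.full((8, 8, 3), 25.5)
toy_rmse = rmse(pred, target)
toy_psnr = psnr(pred, target)
```

Because PSNR is a logarithmic function of the per-image error, averaging PSNR over a test set is not the same as converting an averaged RMSE, which is one reason the two columns in the table can rank methods differently.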
This interpretation was generated by AI and reviewed by human editors.