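[Paper Interpretation] DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion
This paper proposes DDRF, a supervised denoising diffusion model for pansharpening and hyperspectral fusion. It uses two conditional injection modules (style transfer modulation and wavelet modulation) to fuse PAN and LrMS inputs into an HRMS output, and it reports state-of-the-art results on multiple datasets.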
The denoising diffusion model, as a generative model, has recently received a lot of attention in the field of image generation thanks to its powerful generation capability. However, diffusion models have not yet been sufficiently studied in the field of image fusion. In this article, we introduce the diffusion model to the image fusion field, treating the image fusion task as image-to-image translation and designing two different conditional injection modulation modules (i.e., style transfer modulation and wavelet modulation) to inject coarse-grained style information and fine-grained high-frequency and low-frequency information into the diffusion UNet, thereby generating fused images. In addition, we discuss residual learning and the selection of training objectives for the diffusion model in the image fusion task. Extensive experimental results, based on quantitative and qualitative assessments compared with benchmarks, demonstrate state-of-the-art results and good generalization performance in image fusion tasks. Finally, it is hoped that our method can inspire other works and provide insight into this field so that the diffusion model can be better applied to image fusion tasks. Code shall be released for better reproducibility.
Research Motivation and Goals
- Motivate the use of diffusion models for remote sensing image fusion (pansharpening and hyperspectral fusion).
- Propose DDRF with two conditional injection modules to inject coarse-grained style and fine-grained frequency information during diffusion-based fusion.
- Explore training objectives and residual learning to improve fusion quality on small remote sensing datasets.
- Reduce sampling time with fast ODE-based diffusion sampling and linear-memory cross-attention for efficiency.
- Demonstrate state-of-the-art or competitive performance on standard remote sensing fusion benchmarks.
Proposed Method
- Treat image fusion as image-to-image translation within a conditional diffusion framework.
- Introduce two conditional injection modules: style transfer modulation (coarse-grained style) and wavelet modulation (high-/low-frequency details) to guide the diffusion UNet.
- Use residual learning by feeding HRMS minus MS as the diffusion model input to promote high-frequency detail recovery.
- Compare three training targets (epsilon, x0, and v) and find empirically that x0 works well on small datasets.
- Use DB1 (Haar) wavelet decomposition to extract low- and high-frequency components from PAN and LrMS, and apply linear-memory cross-attention to inject these details efficiently.
- Implement fast sampling by converting SDE sampling to a faster ODE-based sampler that predicts x0 directly at selected timesteps.
- Optimize a simple L1 loss between x0 and the model prediction, conditioned on the PAN and MS inputs, with a cosine schedule for alpha_t and EMA for stabilization.
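The training-side items in the list above (residual learning, the three targets, the cosine schedule, and the L1 loss) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the schedule uses the common cosine form with offset `s=0.008`, the v target uses the usual velocity definition, and the function names, array shapes, and the zero-valued "prediction" are all hypothetical stand-ins for a real network conditioned on PAN and MS.

```python
import numpy as np

def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..num_steps (assumed cosine form)."""
    t = np.linspace(0.0, 1.0, num_steps + 1)
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]

def q_sample(x0, a_bar, noise):
    """Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

def training_target(kind, x0, a_bar, noise):
    """Return the regression target for one of the three parameterizations."""
    if kind == "eps":
        return noise
    if kind == "x0":
        return x0
    if kind == "v":
        return np.sqrt(a_bar) * noise - np.sqrt(1.0 - a_bar) * x0
    raise ValueError(f"unknown target kind: {kind}")

def x0_from_v(x_t, v, a_bar):
    """Recover x0 from a v prediction and the noisy sample x_t."""
    return np.sqrt(a_bar) * x_t - np.sqrt(1.0 - a_bar) * v

def l1_loss(pred, target):
    """Mean absolute error between prediction and target."""
    return float(np.mean(np.abs(pred - target)))

# Hypothetical data: random arrays stand in for real imagery.
rng = np.random.default_rng(0)
hrms = rng.random((8, 16, 16))    # stand-in ground-truth HRMS (bands, H, W)
ms_up = rng.random((8, 16, 16))   # stand-in MS already upsampled to HRMS size
x0 = hrms - ms_up                 # residual learning: diffuse HRMS minus MS

a_bar = cosine_alpha_bar(1000)[500]
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, a_bar, noise)

pred = np.zeros_like(x0)          # placeholder for network(x_t, t, PAN, MS)
loss = l1_loss(pred, training_target("x0", x0, a_bar, noise))
fused = pred + ms_up              # add MS back to obtain the fused image
```

With the x0 target, the network output is itself the residual, so the fused image is simply the prediction plus the upsampled MS. With the epsilon or v targets, an extra conversion such as `x0_from_v` is needed before that step.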
Experimental Results
Research Questions
- RQ1: Can diffusion models be effectively adapted for remote sensing image fusion tasks like pansharpening and hyperspectral fusion?
- RQ2: Do conditional injection modules (style transfer and wavelet modulation) improve fusion quality by injecting coarse-grained style and high-/low-frequency details?
- RQ3: Is residual learning beneficial for guiding diffusion-based fusion on small remote sensing datasets?
- RQ4: Can fast diffusion sampling (ODE-based) and linear-memory attention maintain or improve performance while reducing computational cost?
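To make the high-/low-frequency split in RQ2 concrete, here is a minimal single-level 2D Haar (DB1) decomposition in plain NumPy. It is a sketch under assumptions, not the paper's wavelet modulation module: the function names are hypothetical, the transform is the orthonormal Haar variant, and the sign and naming conventions of the three detail bands may differ from those of wavelet libraries.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an array shaped (..., H, W), H and W even."""
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Invert haar_dwt2 exactly."""
    lh, hl, hh = highs
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

# Hypothetical single-band PAN patch; random values stand in for real data.
rng = np.random.default_rng(0)
pan = rng.random((1, 32, 32))
low, high = haar_dwt2(pan)
restored = haar_idwt2(low, high)
```

Each sub-band has half the spatial size of the input, so the low-frequency band and the three detail bands could be passed to a conditioning module separately. The round trip through `haar_idwt2` reproduces the input, which shows that the split loses no information.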
Main Findings
- DDRF achieves state-of-the-art results on pansharpening datasets (WorldView-3, GaoFen-2, QuickBird) in reduced-data experiments and is competitive on full-data tests.
- Two conditional injection modules (style transfer modulation and wavelet modulation) effectively inject coarse-grained spectral-spatial style and multi-frequency details into the diffusion process.
- Residual learning with HRMS minus MS input accelerates convergence and improves high-frequency detail recovery.
- Fast ODE-based sampling with a subset of timesteps significantly speeds up generation without sacrificing fusion quality.
- On the WV3 reduced dataset, DDRF attains a SAM of 2.77 and an ERGAS of 2.05 (lower is better for both), demonstrating strong quantitative performance.
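For readers unfamiliar with the two metrics quoted above, the sketch below computes them with their commonly used definitions: SAM as the mean spectral angle in degrees, and ERGAS as a band-normalized RMSE scaled by the resolution ratio. The function names, the `(bands, H, W)` layout, and the default ratio of 4 are assumptions for illustration. The paper's evaluation code may differ in details such as value range or border handling.

```python
import numpy as np

def sam_degrees(pred, ref, eps=1e-12):
    """Mean spectral angle (degrees) between per-pixel spectra; inputs are (C, H, W)."""
    dot = np.sum(pred * ref, axis=0)
    norm = np.linalg.norm(pred, axis=0) * np.linalg.norm(ref, axis=0)
    cos = np.clip(dot / np.maximum(norm, eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

def ergas(pred, ref, ratio=4):
    """ERGAS = 100 / ratio * sqrt(mean over bands of MSE_b / mean(ref_b)^2)."""
    mse = np.mean((pred - ref) ** 2, axis=(1, 2))   # per-band mean squared error
    mean_sq = np.mean(ref, axis=(1, 2)) ** 2        # per-band squared mean of reference
    return float(100.0 / ratio * np.sqrt(np.mean(mse / mean_sq)))

# Hypothetical example: a reference image and a slightly perturbed prediction.
rng = np.random.default_rng(0)
ref = rng.random((8, 16, 16)) + 0.5
pred = ref + 0.01 * rng.standard_normal(ref.shape)
sam_value = sam_degrees(pred, ref)
ergas_value = ergas(pred, ref, ratio=4)
```

SAM is insensitive to a per-pixel brightness scale and so measures spectral distortion, while ERGAS penalizes radiometric error in every band. This is why the two are usually reported together.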
This interpretation was generated by AI and reviewed by a human editor.