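[Paper Explained] UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution
UltraSR augments implicit image functions with periodic spatial encoding and deep coordinate fusion, achieving state-of-the-art arbitrary-scale super-resolution on DIV2K and other benchmarks while reducing structural distortion and improving high-frequency detail.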
The recent success of NeRF and other related implicit neural representation methods has opened a new path for continuous image representation, where pixel values no longer need to be looked up from stored discrete 2D arrays but can be inferred from neural network models on a continuous spatial domain. Although the recent work LIIF has demonstrated that such novel approaches can achieve good performance on the arbitrary-scale super-resolution task, their upscaled images frequently show structural distortion due to the inaccurate prediction of high-frequency textures. In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions in which we deeply integrate spatial coordinates and periodic encoding with the implicit neural representation. Through extensive experiments and ablation studies, we show that spatial encoding is a missing key toward the next-stage high-performing implicit image function. Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales compared to previous state-of-the-art methods. UltraSR also achieves superior performance on other standard benchmark datasets in which it outperforms prior works in almost all experiments.
Motivation and Objectives
- Motivate and analyze the role of spatial encoding in implicit-function-based 2D image representation for SR.
- Propose UltraSR with periodic spatial encoding and deep coordinate fusion to improve high-frequency detail recovery.
- Show that spatial encoding combined with residual links and coordinate fusion surpasses LIIF across multiple SR scales and datasets.
- Demonstrate through ablations that spatial encoding and network design choices are critical for SR fidelity.
Proposed Method
- Introduce periodic spatial encoding phi(delta x) using 48D sine/cosine features on coordinates.
- Adopt deep coordinate fusion by concatenating 2D coordinates with spatial encoding into all hidden layers of a residual MLP (ResMLP).
- Use residual links (ResMLP) to better propagate high-frequency details and suppress low-frequency leakage.
- Formulate implicit image function s = f_theta(v_r, delta x, phi(delta x)) with v_r from the LR feature map and delta x as normalized coordinate difference.
- Employ an encoder (EDSR or RDN) without upsampling layers, trained end-to-end to render HR pixels from LR regions.
- Train with bicubic-downsampled LR inputs, per-pixel rendering targets, and an L1 loss, using the Adam optimizer with staged learning-rate decay.
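The method bullets above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the frequency schedule (12 power-of-two bands per coordinate, giving 2 × 12 × 2 = 48 features), the hidden width, the number of residual blocks, the random initialization, and the helper names `spatial_encoding`, `make_decoder`, and `decode` are all assumptions chosen for illustration.

```python
import math
import random


def spatial_encoding(delta, num_bands=12):
    """Map a 2D offset (dx, dy) to periodic sin/cos features.

    Assumption: NeRF-style power-of-two frequencies. With 2 coordinates,
    12 bands, and a sin/cos pair per band, this yields 48 features.
    """
    feats = []
    for coord in delta:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * coord))
            feats.append(math.cos(freq * coord))
    return feats


def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(rng, n_in, n_out):
    """Random weights for illustration only (hypothetical initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)]
              for _ in range(n_out)]
    return weight, [0.0] * n_out


def make_decoder(rng, feat_dim, hidden=16, num_blocks=2, coord_dim=2 + 48):
    """Build a tiny residual MLP whose every hidden layer also sees the coordinates."""
    return {
        "inp": init_linear(rng, feat_dim + coord_dim, hidden),
        "blocks": [init_linear(rng, hidden + coord_dim, hidden)
                   for _ in range(num_blocks)],
        "out": init_linear(rng, hidden, 3),  # RGB prediction
    }


def decode(params, v_r, delta):
    """Evaluate s = f_theta(v_r, delta x, phi(delta x)) for one query pixel."""
    coords = list(delta) + spatial_encoding(delta)  # 2 raw + 48 encoded values
    h = relu(linear(list(v_r) + coords, *params["inp"]))
    for weight, bias in params["blocks"]:
        # Deep coordinate fusion: re-inject the coordinates at each hidden layer.
        update = relu(linear(h + coords, weight, bias))
        # Residual link: add the block output to its input.
        h = [a + b for a, b in zip(h, update)]
    return linear(h, *params["out"])


rng = random.Random(0)
decoder = make_decoder(rng, feat_dim=8)
rgb = decode(decoder, v_r=[0.1] * 8, delta=(0.25, -0.5))
print(len(rgb))
```

In a full model, `v_r` would come from the EDSR or RDN encoder's LR feature map and the weights would be trained against HR pixels with an L1 loss. Here the weights are random, so the call only checks that the shapes line up.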
Experimental Results
Research Questions
- RQ1: Does spatial encoding significantly improve the quality of implicit-function-based arbitrary-scale SR over prior methods like LIIF?
- RQ2: How do architectural choices (coordinate fusion and residual MLP) interact with spatial encoding to affect high-frequency detail reconstruction?
- RQ3: What is the quantitative impact of spatial encoding dimension and periodic basis on SR performance across scales and datasets?
- RQ4: Can UltraSR achieve state-of-the-art PSNR on DIV2K and standard SR benchmarks across multiple scales?
Key Findings
- UltraSR consistently surpasses LIIF and MetaSR across DIV2K scales for both EDSR and RDN encoders.
- Spatial encoding with coordinate fusion yields PSNR gains (up to around 0.05 dB at some scales) and reduces structural distortions at extreme scales.
- ResMLP with residual connections provides better high-frequency detail recovery than vanilla MLP in this implicit-function SR setting.
- Ablations show that spatial encoding alone is insufficient; combining it with coordinate fusion and residual links yields the best performance.
- Across five standard datasets (Set5, Set14, B100, Urban100, Manga109), UltraSR-RDN and UltraSR-EDSR outperform LIIF and RDN in most reported entries, especially on larger datasets and scales.
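For readers who want to put the PSNR figures above in context, the following is a minimal sketch of the standard PSNR definition over flattened pixel values in [0, 1]. The function names `psnr` and `mse_ratio_for_gain` are illustrative, and benchmark protocols usually add details not shown here (for example evaluating on a luminance channel or cropping borders), so treat this as the textbook formula rather than the paper's exact evaluation code.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


print(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]))
print(mse_ratio_for_gain(0.05))
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.05 dB corresponds to roughly a 1.1% reduction in MSE.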
This explainer was generated by AI and reviewed by a human editor.