QUICK REVIEW

[Paper Review] UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution

Xingqian Xu, Zhangyang Wang|arXiv (Cornell University)|Mar 23, 2021
Advanced Image Processing Techniques|52 references|39 citations
TL;DR

UltraSR augments implicit image functions with periodic spatial encoding and deep coordinate fusion, yielding state-of-the-art arbitrary-scale SR on DIV2K and other benchmarks by reducing structural distortions and enhancing high-frequency details.

ABSTRACT

The recent success of NeRF and other related implicit neural representation methods has opened a new path for continuous image representation, where pixel values no longer need to be looked up from stored discrete 2D arrays but can be inferred from neural network models on a continuous spatial domain. Although the recent work LIIF has demonstrated that such novel approaches can achieve good performance on the arbitrary-scale super-resolution task, their upscaled images frequently show structural distortion due to the inaccurate prediction of high-frequency textures. In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions in which we deeply integrated spatial coordinates and periodic encoding with the implicit neural representation. Through extensive experiments and ablation studies, we show that spatial encoding is a missing key toward the next-stage high-performing implicit image function. Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales compared to previous state-of-the-art methods. UltraSR also achieves superior performance on other standard benchmark datasets in which it outperforms prior works in almost all experiments.

Motivation & Objective

  • Motivate and analyze the role of spatial encoding in implicit-function-based 2D image representation for SR.
  • Propose UltraSR with periodic spatial encoding and deep coordinate fusion to improve high-frequency detail recovery.
  • Show that spatial encoding combined with residual links and coordinate fusion surpasses LIIF across multiple SR scales and datasets.
  • Demonstrate through ablations that spatial encoding and network design choices are critical for SR fidelity.

Proposed method

  • Introduce periodic spatial encoding phi(delta x) using 48D sine/cosine features on coordinates.
  • Adopt deep coordinate fusion by concatenating 2D coordinates with spatial encoding into all hidden layers of a residual MLP (ResMLP).
  • Use residual links (ResMLP) to better propagate high-frequency details and suppress low-frequency leakage.
  • Formulate implicit image function s = f_theta(v_r, delta x, phi(delta x)) with v_r from the LR feature map and delta x as normalized coordinate difference.
  • Employ an encoder (EDSR or RDN) without upsampling layers, trained end-to-end to render HR pixels from LR regions.
  • Train with bicubic-downsampled LR inputs, pixel-wise rendering targets, and an L1 loss, using the Adam optimizer and staged learning-rate decay.
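
The encoding and fusion steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. Several details are assumptions: the 48 dimensions are split as 2 axes x 12 frequency bands x (sin, cos), the frequencies follow a NeRF-style 2^k * pi schedule, and the hidden width, block count, activation, and the helper names `periodic_encoding`, `init_params`, and `implicit_function` are hypothetical. The weights are random, so the output only demonstrates shapes and data flow.

```python
import numpy as np

def periodic_encoding(delta_x, num_freqs=12):
    """Sin/cos encoding of 2D offsets: (N, 2) -> (N, 4 * num_freqs).

    Assumption: NeRF-style frequencies 2^k * pi; 12 bands give 48 dims.
    """
    delta_x = np.asarray(delta_x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (F,)
    angles = delta_x[:, :, None] * freqs                 # (N, 2, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 2, 2F)
    return enc.reshape(delta_x.shape[0], -1)             # (N, 4F)

def init_params(feat_dim, coord_dim, hidden=64, num_blocks=4, out_dim=3, seed=0):
    """Random weights for a small residual MLP (illustrative sizes only)."""
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

    return {
        "inp": dense(feat_dim + coord_dim, hidden),
        # Every block also receives the coordinates (deep coordinate fusion).
        "blocks": [dense(hidden + coord_dim, hidden) for _ in range(num_blocks)],
        "out": dense(hidden, out_dim),
    }

def implicit_function(v_r, delta_x, params):
    """s = f_theta(v_r, delta_x, phi(delta_x)) for a batch of N queries."""
    delta_x = np.asarray(delta_x, dtype=np.float64)
    # Raw 2D offset plus its periodic encoding: 2 + 48 = 50 dims by default.
    coord = np.concatenate([delta_x, periodic_encoding(delta_x)], axis=1)
    W, b = params["inp"]
    h = np.maximum(np.concatenate([v_r, coord], axis=1) @ W + b, 0.0)
    for W, b in params["blocks"]:
        # Re-inject coordinates at each hidden layer, then add the residual.
        h = h + np.maximum(np.concatenate([h, coord], axis=1) @ W + b, 0.0)
    W, b = params["out"]
    return h @ W + b                                     # (N, out_dim)
```

For example, with a batch of five 64-dimensional latent vectors `v_r` and five offsets `delta_x` in [-1, 1]^2, `implicit_function(v_r, delta_x, init_params(64, 50))` returns an array of shape (5, 3), one RGB prediction per query. Re-concatenating `coord` inside the loop is what distinguishes this sketch from feeding the coordinates only at the input layer.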

Experimental results

Research questions

  • RQ1: Does spatial encoding significantly improve the quality of implicit-function-based arbitrary-scale SR over prior methods like LIIF?
  • RQ2: How do architectural choices (coordinate fusion and residual MLP) interact with spatial encoding to affect high-frequency detail reconstruction?
  • RQ3: What is the quantitative impact of spatial encoding dimension and periodic basis on SR performance across scales and datasets?
  • RQ4: Can UltraSR achieve state-of-the-art PSNR on DIV2K and standard SR benchmarks across multiple scales?

Key findings

  • UltraSR consistently surpasses LIIF and MetaSR across DIV2K scales for both EDSR and RDN encoders.
  • Spatial encoding with coordinate fusion yields notable PSNR gains (up to around 0.05 dB at some scales) and reduces structural distortions at extreme scales.
  • ResMLP with residual connections provides better high-frequency detail recovery than vanilla MLP in this implicit-function SR setting.
  • Ablations show that spatial encoding alone is insufficient; combining it with coordinate fusion and residual links yields the best performance.
  • Across five standard datasets (Set5, Set14, B100, Urban100, Manga109), UltraSR-RDN and UltraSR-EDSR outperform LIIF and RDN in most reported entries, especially on larger datasets and scales.
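
Since the findings above are stated in PSNR, a short reference computation may help put the numbers in context. The helper below is a generic PSNR sketch under the assumption that images are float arrays scaled to [0, 1]. The function name `psnr` and the `max_value` parameter are illustrative, and the exact evaluation protocol behind the reported numbers (border cropping, colour channel) is not given in this review.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# A gain of g dB corresponds to multiplying the MSE by 10 ** (-g / 10).
mse_ratio_for_0p05_db = 10.0 ** (-0.05 / 10.0)
```

Because PSNR is logarithmic in the mean squared error, a 0.05 dB gain corresponds to an MSE reduction of roughly 1%, which is why differences of this size are usually read alongside visual comparisons.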

This review was created by AI and reviewed by human editors.