[Paper Summary] Efficient Long-Range Attention Network for Image Super-resolution
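ELAN introduces shift-convolution-based local feature extraction and group-wise multi-scale self-attention with shared attention to efficiently model long-range dependencies in image super-resolution, achieving state-of-the-art results at lower complexity than transformer-based SR models.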
Recently, transformer-based methods have demonstrated impressive results in various vision tasks, including image super-resolution (SR), by exploiting the self-attention (SA) for feature extraction. However, the computation of SA in most existing transformer-based models is very expensive, while some employed operations may be redundant for the SR task. This limits the range of SA computation and consequently the SR performance. In this work, we propose an efficient long-range attention network (ELAN) for image SR. Specifically, we first employ shift convolution (shift-conv) to effectively extract the image local structural information while maintaining the same level of complexity as 1x1 convolution, then propose a group-wise multi-scale self-attention (GMSA) module, which calculates SA on non-overlapped groups of features using different window sizes to exploit the long-range image dependency. A highly efficient long-range attention block (ELAB) is then built by simply cascading two shift-conv with a GMSA module, which is further accelerated by using a shared attention mechanism. Without bells and whistles, our ELAN follows a fairly simple design by sequentially cascading the ELABs. Extensive experiments demonstrate that ELAN obtains even better results against the transformer-based SR models but with significantly less complexity. The source code can be found at https://github.com/xindongzhang/ELAN.
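To make the shift-conv idea concrete, here is a minimal NumPy sketch. It assumes, as an illustration rather than the authors' exact code, that the input channels are split into five groups: four groups are shifted by one pixel (left, right, up, down) with zero padding and the remaining channels stay in place, after which a 1x1 convolution mixes the channels. The function names `shift_channels` and `shift_conv` are hypothetical.

```python
import numpy as np

def shift_channels(x):
    """Shift channel groups of x (shape C, H, W) by one pixel, zero-padding borders.

    Assumption (illustrative): channels are split into 5 groups of size C // 5;
    groups 0-3 are shifted left, right, up, down; the last group plus any
    remainder channels are left unchanged.
    """
    g = x.shape[0] // 5
    out = np.zeros_like(x)
    out[0*g:1*g, :, :-1] = x[0*g:1*g, :, 1:]   # content moves left
    out[1*g:2*g, :, 1:] = x[1*g:2*g, :, :-1]   # content moves right
    out[2*g:3*g, :-1, :] = x[2*g:3*g, 1:, :]   # content moves up
    out[3*g:4*g, 1:, :] = x[3*g:4*g, :-1, :]   # content moves down
    out[4*g:] = x[4*g:]                        # identity group
    return out

def shift_conv(x, w, b):
    """Hypothetical shift-conv: parameter-free shift, then a 1x1 convolution.

    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)
    """
    shifted = shift_channels(x)
    # A 1x1 convolution is a per-pixel matrix multiply over the channel axis.
    return np.einsum("oc,chw->ohw", w, shifted) + b[:, None, None]

# Usage: with an identity 1x1 kernel the output exposes the shifts directly.
x = np.arange(5 * 3 * 3, dtype=float).reshape(5, 3, 3)
y = shift_conv(x, np.eye(5), np.zeros(5))
```

Because the shift itself involves no multiplications, the parameter count and multiply-adds match a plain 1x1 convolution, yet after mixing, each output pixel can draw on a plus-shaped neighborhood of five pixels.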
Motivation and Goals
- Enable efficient long-range modeling for SR by avoiding the heavy computational cost of standard self-attention.
- Propose a simple, effective architecture that stacks blocks combining local feature extraction with long-range attention.
- Develop mechanisms to accelerate attention computation and reduce memory usage without sacrificing SR quality.
Proposed Method
- Use shift-convolution to enlarge the local receptive field at the same complexity as a 1x1 convolution.
- Introduce group-wise multi-scale self-attention (GMSA) that computes self-attention on multiple non-overlapping feature groups with different window sizes.
- Adopt an accelerated self-attention (ASA) that removes layer normalization and computes attention in a symmetric embedded Gaussian space (queries and keys share one embedding), reducing the number of operations.
- Apply a shared attention mechanism to reuse attention scores across adjacent SA modules for efficiency.
- Build each ELAB by cascading two shift-conv layers and a GMSA module; ELAN stacks these blocks between a shallow feature extractor and an HR reconstruction module.
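The GMSA, ASA, and shared-attention ideas above can be sketched together in NumPy. This is an illustrative sketch under stated assumptions, not the ELAN implementation: the names `gmsa`, `to_windows`, and `from_windows` are hypothetical, channels are split evenly across groups, H and W must be divisible by every window size, the symmetric embedded Gaussian is modeled by using one projection `w_theta` for both queries and keys, and shared attention is modeled by passing the previous block's attention maps back in.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def to_windows(x, m):
    """(H, W, C) -> (num_windows, m*m, C); assumes H and W divisible by m."""
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)

def from_windows(win, h, w, m):
    """Inverse of to_windows: (num_windows, m*m, C) -> (H, W, C)."""
    c = win.shape[-1]
    x = win.reshape(h // m, w // m, m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

def gmsa(x, w_theta, w_v, window_sizes, shared_attn=None):
    """Hypothetical group-wise multi-scale window self-attention.

    Group i of the channels attends within non-overlapping windows of size
    window_sizes[i]. Queries and keys use the same projection (symmetric).
    If shared_attn is given, its maps are reused and no new scores are computed.
    Returns the output (H, W, C) and the attention maps for reuse.
    """
    h, w, _ = x.shape
    groups = np.split(x, len(window_sizes), axis=-1)
    outs, attns = [], []
    for i, (xg, m) in enumerate(zip(groups, window_sizes)):
        win = to_windows(xg, m)                          # (N, m*m, Cg)
        if shared_attn is None:
            q = win @ w_theta[i]                         # theta(x) == phi(x)
            attn = softmax(q @ q.transpose(0, 2, 1))     # (N, m*m, m*m)
        else:
            attn = shared_attn[i]                        # reuse earlier scores
        out = attn @ (win @ w_v[i])                      # weighted sum of values
        outs.append(from_windows(out, h, w, m))
        attns.append(attn)
    return np.concatenate(outs, axis=-1), attns

# Usage with illustrative sizes (not the paper's configuration).
rng = np.random.default_rng(0)
H = W = 16
C = 12
sizes = (2, 4, 8)
cg = C // len(sizes)
x = rng.standard_normal((H, W, C))
w_theta = [0.1 * rng.standard_normal((cg, cg)) for _ in sizes]
w_v = [0.1 * rng.standard_normal((cg, cg)) for _ in sizes]
y1, attn = gmsa(x, w_theta, w_v, sizes)                     # computes attention
y2, _ = gmsa(y1, w_theta, w_v, sizes, shared_attn=attn)     # reuses it
```

In the second call the query/key projection and the softmax are skipped entirely, which is where the saving from shared attention comes from; only the value projection and the weighted sum are recomputed.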
Experimental Results
Research Questions
- RQ1: Can long-range self-attention be modeled efficiently for SR without the heavy cost of standard transformers?
- RQ2: Does combining shift-convolution-based local feature extraction with multi-scale, group-wise attention improve SR performance at lower computational cost?
- RQ3: Can attention scores be shared across layers to accelerate inference without significant loss in quality?
Key Findings
- ELAN achieves competitive or state-of-the-art SR performance with significantly lower latency and fewer parameters than comparable transformer-based methods.
- GMSA with multiple window sizes provides a larger receptive field for long-range dependencies while keeping computational cost under control.
- Shared attention reduces inference-time computations with minimal PSNR/SSIM loss.
- A shift-convolution-based local feature extractor enlarges the receptive field at the cost of a 1x1 convolution.
- Ablation studies show that the combined ELAB design (shift-conv, ASA, GMSA, and shared attention) runs roughly 4.5x faster than SwinIR-light with comparable quality.
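A back-of-the-envelope count illustrates why GMSA can cover large windows cheaply. The sketch below counts only the multiply-adds of the two attention matrix products (Q·K^T and A·V), ignoring projections, normalization, and softmax. The helper names `attn_macs` and `gmsa_macs` are hypothetical, and the 64x64 feature map, 60 channels, and window sizes (4, 8, 16) are illustrative values chosen here, not figures taken from the paper.

```python
def attn_macs(h, w, c, window=None):
    """Multiply-adds of Q.K^T plus A.V over an h x w map with c channels.

    window=None means global attention over all h*w tokens.
    """
    n = h * w
    if window is None:
        return 2 * n * n * c                  # two (n x n x c) products
    assert h % window == 0 and w % window == 0
    # n / window**2 windows, each costing 2 * (window**2)**2 * c
    return 2 * n * window**2 * c

def gmsa_macs(h, w, c, window_sizes):
    """Channels split evenly across groups; group i uses window_sizes[i]."""
    cg = c // len(window_sizes)
    return sum(attn_macs(h, w, cg, m) for m in window_sizes)

print(attn_macs(64, 64, 60))                 # 2013265920 (global attention)
print(attn_macs(64, 64, 60, 16))             # 125829120 (16x16 windows, all channels)
print(gmsa_macs(64, 64, 60, (4, 8, 16)))     # 55050240 (multi-scale groups)
```

Windowed attention grows linearly in h*w instead of quadratically, and, under these assumptions, splitting the channels across window sizes costs less than running the largest window on all channels while one group still attends over 16x16 windows.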
This summary was generated by AI and reviewed by human editors.