QUICK REVIEW

[Paper Review] Rethinking Alignment in Video Super-Resolution Transformers

Shuwei Shi, Jinjin Gu|arXiv (Cornell University)|Jul 18, 2022

Advanced Image Processing Techniques|Cited by 35

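One-sentence summary

The paper shows that VSR Transformers can make effective use of multi-frame information from unaligned videos, argues that alignment is not always beneficial, and introduces patch alignment to reach state-of-the-art results with efficient computation.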

ABSTRACT

The alignment of adjacent frames is considered an essential operation in video super-resolution (VSR). Advanced VSR models, including the latest VSR Transformers, are generally equipped with well-designed alignment modules. However, the progress of the self-attention mechanism may violate this common sense. In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations. Our experiments show that: (i) VSR Transformers can directly utilize multi-frame information from unaligned videos, and (ii) existing alignment methods are sometimes harmful to VSR Transformers. These observations indicate that we can further improve the performance of VSR Transformers simply by removing the alignment module and adopting a larger attention window. Nevertheless, such designs will dramatically increase the computational burden, and cannot deal with large motions. Therefore, we propose a new and efficient alignment method called patch alignment, which aligns image patches instead of pixels. VSR Transformers equipped with patch alignment could demonstrate state-of-the-art performance on multiple benchmarks. Our work provides valuable insights on how multi-frame information is used in VSR and how to select alignment methods for different networks/datasets. Codes and models will be released at https://github.com/XPixelGroup/RethinkVSRAlignment.

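Motivation and objectives

Question whether explicit alignment is necessary in VSR Transformers.
Evaluate how misalignment that falls within the Transformer's attention window affects performance.
Study how optical-flow quality and resampling affect the use of inter-frame information in VSR Transformers.
Propose patch alignment to handle larger motions efficiently, without a high computational cost.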

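Proposed method

Process 2n+1 frames with a sliding-window VSR Transformer built from multi-frame self-attention blocks (MFSAB).
Compare four alignment categories: flow-based image alignment, feature alignment, deformable-convolution-based alignment, and no alignment.
Systematically vary the window size to assess tolerance to misalignment.
Analyze the properties of the flow and its training dynamics, including the flow-smoothing effect.
Introduce patch alignment: crop image patches and then move each patch as a whole according to its mean motion, using nearest-neighbor resampling so that sub-pixel information within the patch is preserved.
Evaluate on the REDS and Vimeo-90K benchmarks using PSNR and SSIM.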
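
The patch alignment step described above can be illustrated with a small sketch. This is a simplified, dependency-free reading of the idea rather than the authors' implementation: the function names `patch_align` and `round_half_up`, the `(dy, dx)` flow convention, the patch size, and the border clamping are all assumptions made for illustration. Each patch takes the mean of its flow vectors, rounds it to whole pixels, and copies the matching block from the supporting frame unchanged, so no pixel value is ever interpolated.

```python
import math


def round_half_up(v):
    """Round to the nearest integer, with .5 rounded up."""
    return int(math.floor(v + 0.5))


def patch_align(support, flow, patch=4):
    """Move whole patches of `support` onto the reference grid.

    support: H x W nested list of pixel values (the neighbouring frame).
    flow:    H x W nested list of (dy, dx) pairs; flow[y][x] is assumed to
             point from reference pixel (y, x) to its match in `support`.
    """
    h, w = len(support), len(support[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, patch):
        for x0 in range(0, w, patch):
            # Patches at the bottom/right edge may be smaller than `patch`.
            ph, pw = min(patch, h - y0), min(patch, w - x0)
            n = ph * pw
            # One motion vector per patch: the mean flow, rounded to integers.
            mean_dy = sum(flow[y0 + i][x0 + j][0]
                          for i in range(ph) for j in range(pw)) / n
            mean_dx = sum(flow[y0 + i][x0 + j][1]
                          for i in range(ph) for j in range(pw)) / n
            # Clamp the source origin so the whole patch stays inside the frame
            # (an illustrative border policy, not taken from the paper).
            sy0 = min(max(y0 + round_half_up(mean_dy), 0), h - ph)
            sx0 = min(max(x0 + round_half_up(mean_dx), 0), w - pw)
            # Copy the patch rigidly; pixel values are never blended.
            for i in range(ph):
                out[y0 + i][x0:x0 + pw] = support[sy0 + i][sx0:sx0 + pw]
    return out


frame = [[y * 8 + x for x in range(8)] for y in range(8)]
flow = [[(1.2, 1.8)] * 8 for _ in range(8)]
aligned = patch_align(frame, flow, patch=4)
print(aligned[0][:4])  # [10, 11, 12, 13]
```

Because every patch is shifted by an integer offset, the relative positions of pixels inside a patch are untouched, which is the property the summary refers to as preserving sub-pixel information within the patch.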

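Experimental results

Research questions

RQ1: Can VSR Transformers exploit multi-frame information from unaligned frames without explicit alignment?
RQ2: When does alignment help or hurt VSR Transformers, and how does window size affect that balance?
RQ3: How do flow-estimation quality and the resampling method affect the preservation of sub-pixel information in Transformer-based VSR?
RQ4: Can a patch-based alignment method provide efficient and effective inter-frame consistency for VSR Transformers?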

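Key findings

For small pixel motions that fall within the Transformer's attention window, VSR Transformers perform well without alignment.
Larger windows tolerate larger misalignment and reduce the need for alignment, but they increase computational cost.
Optimizing the flow estimator during training tends to yield smoother, more stable flows, which improves performance; on Vimeo-90K, the fine-tuned flow tends to converge toward zero, which diminishes the benefit of alignment.
Feature alignment with nearest-neighbor resampling can match the performance of deformable-convolution-based alignment while using fewer parameters.
On REDS and Vimeo-90K, patch alignment (in image space or feature space) with nearest-neighbor (NN) resampling achieves state-of-the-art results with fewer parameters than several competing Transformer-based VSR methods.
Patch alignment preserves sub-pixel information within each patch and mitigates the negative effects of inaccurate flow and bilinear resampling.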
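
The contrast between nearest-neighbor and bilinear resampling in the findings above can be seen in a toy 1-D example. This is only an illustration under assumed settings (the helper `warp_1d`, the alternating test signal, and the half-pixel shift are hypothetical and are not an experiment from the paper): with a fractional shift, linear interpolation averages neighboring samples and flattens fine detail, while nearest-neighbor resampling only ever returns values that were present in the input.

```python
import math


def warp_1d(signal, shift, mode="nearest"):
    """Sample `signal` at positions i + shift, clamped to the valid range.

    mode="nearest" picks the closest sample; any other mode uses linear
    interpolation, the 1-D counterpart of bilinear resampling.
    """
    n = len(signal)
    out = []
    for i in range(n):
        pos = min(max(i + shift, 0.0), n - 1.0)
        if mode == "nearest":
            out.append(signal[int(math.floor(pos + 0.5))])
        else:
            lo = int(math.floor(pos))
            hi = min(lo + 1, n - 1)
            t = pos - lo
            # Blend the two neighbours; this is where detail gets averaged out.
            out.append((1 - t) * signal[lo] + t * signal[hi])
    return out


signal = [0, 10, 0, 10, 0, 10]          # highest-frequency pattern
print(warp_1d(signal, 0.5, "nearest"))  # [10, 0, 10, 0, 10, 10]
print(warp_1d(signal, 0.5, "linear"))   # [5.0, 5.0, 5.0, 5.0, 5.0, 10.0]
```

In the linear case the alternating pattern collapses to a nearly constant signal, whereas the nearest-neighbor output keeps the original contrast, only shifted by a whole sample.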


This review was generated by AI and checked by a human editor.