QUICK REVIEW

[Paper Review] Flow-Guided Sparse Transformer for Video Deblurring

Jing Lin, Yuanhao Cai | arXiv (Cornell University) | Jan 6, 2022

Advanced Image Processing Techniques | Citations: 22

One-line summary

FGST removes video blur using a flow-guided sparse window-based Transformer and a recurrent embedding, and outperforms prior SOTA methods on the DVD and GOPRO datasets.

ABSTRACT

Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each $query$ element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related $key$ elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring. Code and pre-trained models are publicly available at https://github.com/linjing7/VR-Baseline

Motivation and objectives

Frame video deblurring as a problem of exploiting long-range dependencies and non-local self-similarity across neighboring frames.
Overcome CNN/standard Transformer limitations by introducing flow-guided attention.
Capture long-range temporal dependencies via a recurrent embedding mechanism.
Preserve original image information while exploiting motion cues for robust deblurring.
Demonstrate state-of-the-art performance on DVD and GOPRO benchmarks.

Proposed method

Propose Flow-Guided Sparse Transformer (FGST) with Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA).
Use optical flow to guide sampling of key elements across neighboring frames for each query, enabling globally sparse but highly relevant attention.
Introduce Flow-Guided Sparse Multi-head Self-Attention (FGS-MSA) and its window-based extension, FGSW-MSA, which is more robust to inaccurate flow estimates.
Integrate a Recurrent Embedding (RE) mechanism to propagate information from past frames and model long-range temporal dependencies.
Adopt a U-Net-like encoder-bottleneck-decoder architecture built from FGABs (Flow-Guided Attention Blocks) with skip connections.
Maintain computational efficiency by achieving near-linear complexity in the number of tokens via FGSW-MSA.
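
To make the flow-guided sampling idea above concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, identity query/key/value projections, nearest-pixel sampling, and one sampled key per neighboring frame (no window). The function name `flow_guided_sparse_attention` and the `(dx, dy)` flow layout are hypothetical choices made for illustration.

```python
import numpy as np

def flow_guided_sparse_attention(ref, neighbors, flows):
    """Illustrative flow-guided sparse attention (simplified, single head).

    ref:       (H, W, C) features of the blurry reference frame (queries).
    neighbors: (T, H, W, C) features of T neighboring frames (keys/values).
    flows:     (T, H, W, 2) assumed flow from the reference frame to each
               neighbor, stored as (dx, dy) in pixels.
    """
    T, H, W, C = neighbors.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Follow the flow from every query position, round to the nearest
    # pixel, and clamp to the frame so each index is valid.
    kx = np.clip(np.rint(xs[None] + flows[..., 0]).astype(int), 0, W - 1)
    ky = np.clip(np.rint(ys[None] + flows[..., 1]).astype(int), 0, H - 1)

    # Gather one key per neighboring frame for each query: (T, H, W, C).
    t_idx = np.arange(T)[:, None, None]
    keys = neighbors[t_idx, ky, kx]

    # Scaled dot-product scores between each query and its T sampled keys.
    scores = np.einsum("hwc,thwc->hwt", ref, keys) / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of the sampled keys, reused here as values.
    out = np.einsum("hwt,thwc->hwc", weights, keys)
    return out, weights
```

In this simplified form each query attends to only T keys instead of all T·H·W positions, so the cost grows linearly with the number of query tokens. The window-based variant described above would sample a window of keys around each flow-predicted position rather than a single pixel, which is what gives it tolerance to flow errors.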

Experimental results

Research questions

RQ1: Can a flow-guided attention mechanism effectively capture non-local self-similarity for video deblurring?
RQ2: Does sampling key elements guided by optical flow improve robustness to motion and reduce artifacts compared to traditional pre-warping?
RQ3: Does the recurrent embedding mechanism enhance long-range temporal dependencies in a Transformer-based deblurring model?
RQ4: How does FGST compare to state-of-the-art methods on standard benchmarks (DVD and GOPRO) in terms of quality and efficiency?
RQ5: What are the impacts of window size, flow estimators, and attention variants on performance?

Key findings

FGST outperforms prior SOTA methods on both the DVD and GOPRO datasets.
On DVD, FGST surpasses the prior best ARVo by 0.56 dB in PSNR.
On GOPRO, FGST exceeds Suin et al. by 0.80 dB and TSP by 1.23 dB in PSNR.
Ablations show that RE and FGSW-MSA together yield large PSNR gains, up to about 1.72 dB when both are used.
With FGSW-MSA, FGST attends more strongly than the baselines to similar but misaligned patches, which improves restoration of fast motion blur.
FGST is also efficient: it uses substantially fewer parameters and FLOPs while achieving higher PSNR/SSIM than several CNN-based and Transformer baselines.
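
The gains above are reported in PSNR, so a short sketch of how that metric is computed may help in reading them. This is the standard definition, not necessarily the paper's exact evaluation protocol (color space, cropping, and per-frame averaging are not specified in this review). The function name `psnr` and the `max_value` parameter are illustrative.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (1.0 for images scaled
    to [0, 1], 255 for 8-bit images).
    """
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - res) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of Δ dB at a fixed peak value corresponds to multiplying the MSE by 10^(-Δ/10). A 0.56 dB gain therefore means roughly 12% lower MSE.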

This review was generated by AI and checked by a human editor.