QUICK REVIEW

[Paper Review] Learnable Gated Temporal Shift Module for Deep Video Inpainting

Ya-Liang Chang, Zhe Yu Liu|arXiv (Cornell University)|Jul 2, 2019

Generative Adversarial Networks and Image Synthesis | References: 30 | Cited by: 52

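One-sentence summary

Introduces the Learnable Gated Temporal Shift Module (LGTSM), which lets 2D CNNs exploit temporal information for free-form video inpainting and achieves state-of-the-art results with roughly one third of the parameters and inference time of a 3D-convolution baseline.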

ABSTRACT

How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting problems. Conventional 2D CNNs have achieved good performance on image inpainting but often lead to temporally inconsistent results where frames will flicker when applied to videos (see https://www.youtube.com/watch?v=87Vh1HDBjD0&list=PLPoVtv-xp_dL5uckIzz1PKwNjg1yI0I94&index=1); 3D CNNs can capture temporal information but are computationally intensive and hard to train. In this paper, we present a novel component termed Learnable Gated Temporal Shift Module (LGTSM) for video inpainting models that could effectively tackle arbitrary video masks without additional parameters from 3D convolutions. LGTSM is designed to let 2D convolutions make use of neighboring frames more efficiently, which is crucial for video inpainting. Specifically, in each layer, LGTSM learns to shift some channels to its temporal neighbors so that 2D convolutions could be enhanced to handle temporal information. Meanwhile, a gated convolution is applied to the layer to identify the masked areas that are poisoning for conventional convolutions. On the FaceForensics and Free-form Video Inpainting (FVI) dataset, our model achieves state-of-the-art results with simply 33% of parameters and inference time.

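Motivation and objectives

Make efficient use of temporal information in free-form video inpainting.
Develop a module that augments 2D convolutions with temporal context without using 3D convolutions.
Introduce a gating mechanism that identifies masked regions, which would otherwise contaminate the convolution outputs.
Achieve state-of-the-art results with substantially fewer parameters and faster inference.
Propose a loss framework (TSMGAN) to improve temporal realism.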

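Proposed method

Extend the residual Temporal Shift Module (TSM) with learnable temporal shift kernels, yielding LGTSM.
In each layer, shift a subset of the feature channels to neighboring frames using the learnable kernels.
Apply a gated convolution to produce a gating map that distinguishes valid, inpainted, and masked regions.
Combine the gated shift with a 2D convolution so that the output features are modulated by the gating map.
Train with a combination of l1, perceptual, and style losses plus the TSMGAN adversarial loss.
Use a U-Net-like generator and a TSMGAN discriminator with spectral normalization.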
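The shift and gating steps listed above can be illustrated with a small sketch. This is not the authors' implementation: it works on a single spatial location, uses plain Python lists instead of tensors, and assumes a per-channel temporal kernel of size 3 with zero padding at the first and last frame. The names `temporal_shift`, `FIXED_TSM_KERNELS`, and `gated_output` are hypothetical, and the ReLU activation in the gate is an assumption. Under these assumptions, a fixed shift as in TSM is the special case where each kernel is one-hot, while a learnable variant would treat the three weights per channel as trainable parameters.

```python
import math


def temporal_shift(features, kernels):
    """Apply a size-3 temporal kernel to every channel, with zero padding.

    features: list of T frames, each a list of C channel values
              (one spatial location, to keep the sketch small).
    kernels:  list of C triples (w_prev, w_cur, w_next); in a learnable
              variant these weights would be trained (assumption).
    """
    num_frames = len(features)
    shifted = []
    for t in range(num_frames):
        frame = []
        for c, (w_prev, w_cur, w_next) in enumerate(kernels):
            prev_val = features[t - 1][c] if t > 0 else 0.0
            next_val = features[t + 1][c] if t < num_frames - 1 else 0.0
            frame.append(w_prev * prev_val + w_cur * features[t][c] + w_next * next_val)
        shifted.append(frame)
    return shifted


# One-hot kernels reproduce a fixed temporal shift (hypothetical 3-channel layout).
FIXED_TSM_KERNELS = [
    (1.0, 0.0, 0.0),  # channel 0: take the value from the previous frame
    (0.0, 0.0, 1.0),  # channel 1: take the value from the next frame
    (0.0, 1.0, 0.0),  # channel 2: keep the current frame
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_output(feature_response, gate_response):
    """Gated convolution output at one location: activation(feature) * sigmoid(gate).

    ReLU is used as the activation here purely for illustration (assumption).
    """
    return max(0.0, feature_response) * sigmoid(gate_response)


features = [
    [1.0, 10.0, 100.0],  # frame 0
    [2.0, 20.0, 200.0],  # frame 1
    [3.0, 30.0, 300.0],  # frame 2
]
shifted = temporal_shift(features, FIXED_TSM_KERNELS)
```

In this toy setup, a gate response that is strongly negative drives the output toward zero, which is how a gating map could suppress features coming from masked regions.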

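Experimental results

Research questions

RQ1: Can LGTSM enable 2D CNNs to use temporal information effectively for free-form video inpainting?
RQ2: Compared with the fixed TSM and with 3D convolutions, does a learnable temporal shift improve temporal consistency and quality?
RQ3: How does LGTSM perform on challenging free-form masks and diverse video content?
RQ4: What effect does the TSMGAN loss have on temporal realism and overall quality?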

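Key findings

LGTSM with gating achieves state-of-the-art or competitive results on the FaceForensics and Free-form Video Inpainting (FVI) datasets.
LGTSM needs only about 33% of the parameters and inference time of the 3D-convolution baseline while remaining comparable in perceptual and video quality (LPIPS, FID).
Ablation studies show that gated convolution and the TSMGAN loss both contribute substantially to performance; the learnable shift kernels bring a further gain at very low parameter cost.
Pre-training the generator before fine-tuning with TSMGAN speeds up training and improves stability.
Qualitatively, LGTSM produces temporally coherent inpainted videos under irregular masks.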
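A back-of-the-envelope count suggests where a ratio of about one third could come from. The sketch below compares the weight count of a single 3x3x3 3D convolution with that of a single 3x3 2D convolution. The channel width of 64 is a hypothetical value, biases and the few weights of the learnable shift kernels are ignored, and the reported 33% figure refers to the whole model, so this is only an illustration of the per-layer arithmetic.

```python
def conv_weight_count(in_channels, out_channels, kernel_shape):
    """Number of weights in a convolution layer, ignoring biases."""
    count = in_channels * out_channels
    for size in kernel_shape:
        count *= size
    return count


# Hypothetical layer width, chosen only for illustration.
IN_CHANNELS = 64
OUT_CHANNELS = 64

params_3d = conv_weight_count(IN_CHANNELS, OUT_CHANNELS, (3, 3, 3))
params_2d = conv_weight_count(IN_CHANNELS, OUT_CHANNELS, (3, 3))
ratio = params_2d / params_3d

print(f"3D conv weights: {params_3d}")
print(f"2D conv weights: {params_2d}")
print(f"2D / 3D ratio:   {ratio:.3f}")
```

The ratio does not depend on the channel width, since both counts scale with the same product of input and output channels.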


This review was generated by AI and checked by a human editor.