QUICK REVIEW

[Paper Review] Deep Contextual Video Compression

Jiahao Li, Bin Li|arXiv (Cornell University)|Sep 30, 2021

Advanced Image Processing Techniques | References: 37 | Cited by: 115

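One-Sentence Summary

This paper proposes DCVC, a conditional coding framework for video compression that conditions coding on a high-dimensional feature-domain context learned through motion estimation and motion compensation (MEMC). It achieves substantial bitrate savings over previous deep-learning-based methods and x265, and it jointly optimizes encoding, decoding, and entropy modeling around the learnable context.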

ABSTRACT

Most of the existing neural video compression methods adopt the predictive coding framework, which first generates the predicted frame and then encodes its residue with the current frame. However, as for compression ratio, predictive coding is only a sub-optimal solution as it uses simple subtraction operation to remove the redundancy across frames. In this paper, we propose a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding. In particular, we try to answer the following questions: how to define, use, and learn condition under a deep video compression framework. To tap the potential of conditional coding, we propose using feature domain context as condition. This enables us to leverage the high dimension context to carry rich information to both the encoder and the decoder, which helps reconstruct the high-frequency contents for higher video quality. Our framework is also extensible, in which the condition can be flexibly designed. Experiments show that our method can significantly outperform the previous state-of-the-art (SOTA) deep video compression methods. When compared with x265 using veryslow preset, we can achieve 26.0% bitrate saving for 1080P standard test videos.

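Motivation and Goals

Define and learn a context (condition) for video compression that goes beyond residual coding.
Design a conditioning framework that feeds the encoder, the decoder, and the entropy model in a unified way.
Use a feature-domain, motion-compensated context to improve reconstruction, especially of high-frequency content.
Show that conditional coding with a temporal prior outperforms residual-based methods in rate-distortion performance.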
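
The argument for conditional coding over residual coding rests on a standard information-theoretic inequality: for a current frame x and a prediction p, H(x - p) >= H(x | p), so coding x conditioned on p never needs more bits than coding the residual. The following is a toy illustration only, assuming discrete scalar "pixels" and a hypothetical clipped-noise signal model; it is not DCVC's learned networks, and the function names are made up for this sketch.

```python
import math
import random
from collections import Counter


def entropy_bits(counter):
    """Shannon entropy in bits of the empirical distribution in a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def residual_vs_conditional(x, pred):
    """Return empirical (H(x - pred), H(x | pred)) for paired integer samples."""
    # Residual coding: only the difference x - pred is modeled.
    h_residual = entropy_bits(Counter(xi - pi for xi, pi in zip(x, pred)))
    # Conditional coding: H(x | pred) = H(pred, x) - H(pred).
    h_joint = entropy_bits(Counter(zip(pred, x)))
    h_pred = entropy_bits(Counter(pred))
    return h_residual, h_joint - h_pred


random.seed(0)
levels = 8  # hypothetical number of intensity levels
pred = [random.randrange(levels) for _ in range(50_000)]
# Toy current frame: prediction plus small noise, clipped to the valid range.
x = [min(max(p + random.choice((-1, 0, 1)), 0), levels - 1) for p in pred]

h_res, h_cond = residual_vs_conditional(x, pred)
print(f"H(residual) = {h_res:.3f} bits, H(x | pred) = {h_cond:.3f} bits")
```

In this toy the gap comes from clipping at the range boundaries, which makes the noise statistics depend on the prediction; subtraction cannot exploit that dependence, while conditioning can.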

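Proposed Method

Proposes a conditional coding framework in which the current frame is coded conditioned on a learnable context \bar{x}_t, obtained from the previous decoded frame through feature-domain MEMC.
Defines the context as high-dimensional feature-domain information rather than a pixel-domain prediction, so that it can be fed to the encoder, the decoder, and the entropy model.
Uses a temporal prior in the entropy model, together with hyperprior and autoregressive components, to estimate the distribution of the latent codes and hence the bitrate.
Learns the context by warping features extracted from the previous frame with the motion vectors estimated by MEMC, then passing the result through a refinement network to produce \bar{x}_t.
Trains with the rate-distortion objective L = λ · D + R, where D is the distortion (MSE or MS-SSIM) and R is the bitrate, estimated as a cross-entropy.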
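
As a rough illustration of the rate-distortion objective in the last item, the sketch below computes L = λ · D + R for a tiny example, with D as MSE and R as the sum of -log2 of the probabilities an entropy model assigns to the coded symbols. The per-pixel normalization of the rate, the λ value, and all function names are assumptions made for this sketch, not details taken from the paper.

```python
import math


def rate_bpp(likelihoods, num_pixels):
    """Estimated rate in bits per pixel: sum of -log2(p) over coded symbols."""
    total_bits = -sum(math.log2(p) for p in likelihoods)
    return total_bits / num_pixels


def mse(frame, recon):
    """Mean squared error between a frame and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)


def rd_loss(frame, recon, likelihoods, lam):
    """L = lam * D + R, with D = MSE and R in bits per pixel (assumed units)."""
    return lam * mse(frame, recon) + rate_bpp(likelihoods, len(frame))


# Hypothetical 4-pixel frame, its reconstruction, and the probabilities an
# entropy model assigned to the two latent symbols that were actually coded.
frame = [0.0, 0.5, 1.0, 0.5]
recon = [0.0, 0.5, 0.5, 0.5]
likelihoods = [0.5, 0.25]

loss = rd_loss(frame, recon, likelihoods, lam=256.0)
print(loss)
```

A larger λ weights distortion more heavily and pushes training toward higher quality at a higher bitrate; a smaller λ does the opposite.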

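Experimental Results

Research Questions

RQ1: Can a learnable high-dimensional context outperform simple residual subtraction in inter-frame coding?
RQ2: How can MEMC be integrated in feature space to guide context extraction for better compression?
RQ3: Which entropy model architecture (hyperprior, autoregressive, temporal prior) gives the best rate-distortion performance for conditional coding?
RQ4: Can the temporal prior enable faster, more parallel entropy coding without sacrificing compression gains?
RQ5: How does the method compare with state-of-the-art deep-learning-based codecs and traditional codecs across resolutions and content types?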

Key Findings

Method	MCL-JCV	UVG	HEVC Class B	HEVC Class C	HEVC Class D	HEVC Class E
DCVC (proposed)	-23.9%	-25.3%	-26.0%	-5.8%	-17.5%	-11.9%
DVCPro [4]	-4.1%	-7.9%	-9.0%	7.2%	-6.9%	17.2%
x265 (veryslow)	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
DVC [3]	13.3%	17.2%	7.9%	15.1%	7.2%	21.1%
x264 (veryslow)	32.7%	30.3%	35.0%	19.9%	15.5%	50.0%

DCVC achieves substantial bitrate savings over previous deep-learning-based codecs and x265 veryslow, for example a 26.0% saving relative to x265 veryslow on 1080p standard test videos.
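DCVC outperforms DVCPro across the test datasets and bitrates; in terms of PSNR BD-Bitrate relative to x265 veryslow, it saves 23.9% on MCL-JCV, 25.3% on UVG, and 26.0% on HEVC Class B (all 1080p).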
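Larger gains are observed on high-resolution videos, because the feature-domain context carries richer high-frequency information.
The temporal-prior entropy model is competitive or better both with and without the spatial prior; the best results come from combining the hyperprior, the temporal prior, and optionally the spatial prior.
Ablations show that concatenating the feature-domain context yields larger gains than conditioning on an RGB prediction, and that the temporal prior is particularly beneficial in conditional coding.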


This review was generated by AI and reviewed by human editors.