QUICK REVIEW

[Paper Review] Rate-Perception Optimized Preprocessing for Video Coding

Chengqian Ma, Zhiqiang Wu|arXiv (Cornell University)|Jan 25, 2023
Video Coding and Compression Technologies|Cited by 9
One-Sentence Summary

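This paper proposes rate-perception optimized preprocessing (RPP), which uses a lightweight network and an adaptive DCT loss to preprocess frames so that bitrate drops while perceptual quality is preserved, without changing any encoder or decoder settings. It reports significant BD-rate savings with AVC, HEVC, VVC, and AV1.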

ABSTRACT

In the past decades, much progress has been made in the video compression field, including traditional video codecs and learning-based video codecs. However, few studies focus on using preprocessing techniques to improve the rate-distortion performance. In this paper, we propose a rate-perception optimized preprocessing (RPP) method. We first introduce an adaptive Discrete Cosine Transform loss function which can save bitrate while keeping essential high-frequency components. Furthermore, we combine several state-of-the-art techniques from low-level vision fields into our approach, such as the high-order degradation model, efficient lightweight network design, and an Image Quality Assessment model. By jointly using these techniques, our RPP approach achieves, on average, 16.27% bitrate saving with different video encoders such as AVC, HEVC, and VVC under multiple quality metrics. In the deployment stage, our RPP method is simple and efficient and does not require any changes to the settings of video encoding, streaming, or decoding. Each input frame only needs to make a single pass through RPP before being sent to the video encoder. In addition, in our subjective visual quality test, 87% of users think videos with RPP are better than or equal to videos compressed by the codec alone, while the videos with RPP save about 12% bitrate on average. Our RPP framework has been integrated into the production environment of our video transcoding services, which serve millions of users every day.

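Research Motivation and Objectives

  • Motivate preprocessing as a means of improving the rate-distortion performance of both traditional and learning-based video codecs.
  • Introduce an adaptive DCT loss that reduces spatial redundancy while preserving high-frequency detail.
  • Design a lightweight CNN with an attention mechanism for efficient preprocessing, and integrate full-reference IQA to improve perceptual quality.
  • Demonstrate plug-and-play deployment that is compatible with standard encoders (AVC, HEVC, VVC, AV1) and requires no change to encoder or decoder settings.
  • Quantify the bitrate savings (BD-rate) and subjective quality gains across multiple datasets and codecs.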

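Proposed Method

  • Develop an adaptive DCT loss that selectively retains high-frequency components based on DCT coefficient magnitudes and a threshold derived from the coefficients.
  • Build the rate-perception optimized preprocessor (RPP) as a lightweight fully convolutional network with channel attention and efficient upsampling/downsampling.
  • Apply high-order degradation during training to model image degradation and simulate real-world artifacts.
  • Train with a joint loss that combines the adaptive DCT loss, an MS-SSIM perceptual loss, and an L1 reconstruction loss, with adjustable weights.
  • Deploy as a single-pass preprocessor: the preprocessed frame f_p is encoded by a standard codec, with no change to encoder or decoder settings.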
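
The joint training objective named in the list above can be written as a weighted sum. This is only a sketch of its general shape: the weight symbols \lambda, the reference frame f, and the use of 1 - MS-SSIM as the perceptual term are assumptions made for illustration, not notation confirmed by this review.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{\text{DCT}}\,\mathcal{L}_{\text{DCT}}(f_p)
  + \lambda_{\text{MS}}\,\bigl(1 - \operatorname{MS\text{-}SSIM}(f_p, f)\bigr)
  + \lambda_{1}\,\lVert f_p - f \rVert_1
```

Under this reading, a larger \lambda_{\text{DCT}} would push the preprocessor toward lower bitrate, while the MS-SSIM and L1 terms keep f_p close to the reference frame.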
Figure 2: Example framework for training RPP. (a) Histogram of the frequency coefficients of the predicted frame. (b) Histogram of the frequency coefficients after filtering by the adaptive DCT function.
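
To make the idea of an adaptive DCT loss more concrete, here is a minimal pure-Python sketch. It is not the authors' formulation. The 8x8 block size, the threshold (mean absolute AC coefficient times a hypothetical `scale` factor), and the choice to penalize all AC coefficients rather than only a high-frequency subset are assumptions. A real training setup would also need a differentiable framework, not plain lists.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # cos_table[u][x] = cos(pi * (2x + 1) * u / (2n))
    cos_table = [
        [math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
        for u in range(n)
    ]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += block[x][y] * cos_table[u][x] * cos_table[v][y]
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def adaptive_dct_loss(block, scale=1.0):
    """Penalize only weak AC coefficients (hypothetical thresholding rule).

    The threshold adapts to the block: it is the mean absolute AC
    coefficient times `scale`. Coefficients below it contribute their
    magnitude to the loss; stronger ones are left alone so that
    prominent detail is not suppressed.
    """
    coeffs = dct_2d(block)
    n = len(coeffs)
    # Skip the DC term at (0, 0); it carries the block's average intensity.
    ac = [
        abs(coeffs[u][v])
        for u in range(n)
        for v in range(n)
        if (u, v) != (0, 0)
    ]
    threshold = scale * sum(ac) / len(ac)
    return sum(a for a in ac if a < threshold) / len(ac)
```

Minimizing such a term would drive small coefficients toward zero, which encoders can then represent cheaply, while coefficients above the threshold receive no penalty.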

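Experimental Results

Research Questions

  • RQ1: Can a preprocessing stage improve bitrate without modifying existing codecs?
  • RQ2: Can an adaptive-DCT-based loss achieve bitrate savings while better preserving perceptually important high-frequency content?
  • RQ3: How do joint optimization with MS-SSIM and degradation modeling affect RD performance across multiple codecs?
  • RQ4: How efficient is RPP inference in practice on common hardware?
  • RQ5: How robust is the method across datasets and presets (veryfast/medium) for H.264/HEVC/VVC/AV1?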

Key Findings

Dataset         Codec                   Metric     BD-Rate (%)
UVG             RPP+H.264 (veryfast)    VMAF       -26.92
UVG             RPP+H.264 (veryfast)    MS-SSIM    -4.86
UVG             RPP+H.265 (veryfast)    VMAF       -39.77
UVG             RPP+H.265 (veryfast)    MS-SSIM    -8.70
UVG             RPP+H.264 (medium)      VMAF       -27.30
UVG             RPP+H.264 (medium)      MS-SSIM    -5.60
UVG             RPP+H.265 (medium)      VMAF       -39.24
UVG             RPP+H.265 (medium)      MS-SSIM    -9.58
MCL-JCV         RPP+H.264               VMAF       -11.84
MCL-JCV         RPP+H.264               MS-SSIM    -11.75
MCL-JCV         RPP+H.265               VMAF       -14.94
MCL-JCV         RPP+H.265               MS-SSIM    -19.90
HEVC Class B    RPP+H.264               VMAF       -11.84
HEVC Class B    RPP+H.264               MS-SSIM    -11.75
HEVC Class B    RPP+H.265               VMAF       -14.94
HEVC Class B    RPP+H.265               MS-SSIM    -19.90
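  • Across multiple metrics, RPP achieves an average BD-rate saving of about 16.27% with AVC, HEVC, and VVC.
  • The adaptive DCT loss contributes a substantial share of the bitrate savings, accounting for more than 60% of the total BD-rate improvement in the ablation study.
  • RPP+H.265 consistently delivers larger BD-rate reductions than RPP+H.264 across datasets and presets.
  • In subjective tests, 87% of viewers rated RPP-enhanced videos as better than or equal to codec-only videos, at an average bitrate saving of about 12%.
  • RPP runs at near-real-time inference speed (e.g., 87.7 FPS at 1080p with TensorRT on an RTX 3090), which makes practical deployment feasible.
  • RPP is plug-and-play: it needs only one forward pass before encoding and no changes to encoder or decoder configuration.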
Figure 3: (a) Rate-distortion curves for the UVG dataset, MCL_JCV dataset, and HEVC Class B dataset on MS-SSIM and VMAF. Curves are plotted for the standard codec and RPP + standard codec. The corresponding BD rates for our proposed method are reported in Tables 1, 2 and 3, respectively, for each datas
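
For readers unfamiliar with the BD-rate figures reported above, the sketch below shows how such a number can be derived from two rate-distortion curves. It is a simplified variant, not the evaluation code behind the paper's tables. It uses piecewise-linear interpolation of log-bitrate over quality, whereas the usual Bjøntegaard procedure fits a cubic polynomial or similar smooth curve. The function names `bd_rate` and `_interp` are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled quality range")


def bd_rate(anchor, test, samples=1000):
    """Average bitrate change (%) of `test` relative to `anchor`.

    Both arguments are lists of (bitrate, quality) points. A negative
    result means `test` needs less bitrate for the same quality.
    """
    # Work in the log-rate domain, indexed by quality.
    a = sorted((q, math.log10(r)) for r, q in anchor)
    t = sorted((q, math.log10(r)) for r, q in test)
    aq, ar = zip(*a)
    tq, tr = zip(*t)

    # Compare only over the quality interval covered by both curves.
    lo, hi = max(aq[0], tq[0]), min(aq[-1], tq[-1])
    if hi <= lo:
        raise ValueError("the two curves share no quality interval")

    # Midpoint-rule average of the log-rate difference.
    step = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        q = lo + (i + 0.5) * step
        total += _interp(q, tq, tr) - _interp(q, aq, ar)
    avg_log_diff = total / samples

    return (10 ** avg_log_diff - 1) * 100
```

With this sign convention, a codec that reaches every quality level at half the anchor's bitrate scores -50, which matches the negative values in the table above indicating savings.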


This review was generated by AI and reviewed by a human editor.