QUICK REVIEW

[Paper Review] Rate-Perception Optimized Preprocessing for Video Coding

Chengqian Ma, Zhiqiang Wu | arXiv (Cornell University) | Jan 25, 2023

Video Coding and Compression Technologies | Cited by 9

One-Line Summary

The paper proposes rate-perception optimized preprocessing (RPP): a lightweight network, trained with an adaptive DCT loss, that preprocesses each frame to reduce bitrate while preserving perceptual quality. It achieves significant BD-rate savings across AVC, HEVC, VVC, and AV1 without changing encoder or decoder settings.

ABSTRACT

In the past decades, much progress has been made in video compression, including both traditional and learning-based video codecs. However, few studies focus on using preprocessing techniques to improve rate-distortion performance. In this paper, we propose a rate-perception optimized preprocessing (RPP) method. We first introduce an adaptive Discrete Cosine Transform loss function that saves bitrate while keeping essential high-frequency components. Furthermore, we combine several state-of-the-art techniques from low-level vision into our approach, such as the high-order degradation model, efficient lightweight network design, and an Image Quality Assessment model. By jointly using these techniques, our RPP approach achieves an average bitrate saving of 16.27% with different video encoders such as AVC, HEVC, and VVC under multiple quality metrics. In the deployment stage, our RPP method is simple and efficient and requires no changes to the settings of video encoding, streaming, or decoding. Each input frame only needs a single pass through RPP before being sent to the video encoder. In addition, in our subjective visual quality test, 87% of users rated videos with RPP as better than or equal to videos compressed by the codec alone, while the videos with RPP saved about 12% bitrate on average. Our RPP framework has been integrated into the production environment of our video transcoding services, which serve millions of users every day.

Motivation and Objectives

Motivate preprocessing as a means to improve rate-distortion performance in both traditional and learned video codecs.
Introduce an adaptive DCT loss to preserve high-frequency details while reducing spatial redundancy.
Design a lightweight CNN with attention for efficient preprocessing and integrate full-reference IQA for perceptual quality.
Demonstrate plug-and-play deployment with standard codecs (AVC, HEVC, VVC, AV1) without encoder changes.
Quantify bitrate savings (BD-rate) and subjective quality gains on multiple datasets and codecs.

Proposed Method

Develop an adaptive DCT loss that selectively keeps high-frequency components based on DCT coefficient magnitude and a threshold derived from the coefficients.
Implement the rate-perception optimized preprocessor (RPP) as a lightweight fully convolutional network with channel attention and efficient up/downsampling.
Model image degradation with a higher-order degradation model to simulate real-world artifacts during training.
Train with a joint loss combining adaptive DCT loss, MS-SSIM perceptual loss, and L1 reconstruction loss, with tunable weights.
Deploy as a single-pass preprocessor; the preprocessed frame f_p is encoded by standard codecs without changing encoder/decoder settings.
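The adaptive DCT loss and the weighted joint loss described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the 8×8 block size, the threshold (taken here as the mean AC magnitude of the predicted frame), the exclusion of the DC term, the single-window SSIM used as a stand-in for MS-SSIM, and the default weights `w_dct`, `w_ssim`, `w_l1` are all hypothetical. Frames are single-channel (e.g. luma) arrays in [0, 1]. The idea illustrated is that AC coefficients below the threshold are penalized, pushing them toward zero (cheaper to encode), while coefficients above it, the essential high-frequency detail, are left untouched.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is frequency k."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def block_dct(frame: np.ndarray, n: int = 8) -> np.ndarray:
    """2-D DCT of each non-overlapping n x n block (frame cropped to a multiple of n)."""
    h, w = (frame.shape[0] // n) * n, (frame.shape[1] // n) * n
    blocks = frame[:h, :w].reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    d = dct_matrix(n)
    return d @ blocks @ d.T  # shape (h // n, w // n, n, n)


def adaptive_dct_loss(pred: np.ndarray, n: int = 8) -> float:
    """Penalize only small AC coefficients; large ones (essential detail) are kept."""
    coeffs = block_dct(pred.astype(np.float64), n)
    ac = np.abs(coeffs.reshape(*coeffs.shape[:2], n * n)[..., 1:])  # drop DC (index 0)
    tau = ac.mean()  # ASSUMPTION: threshold derived as the mean AC magnitude
    return float(np.where(ac < tau, ac, 0.0).mean())


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM; a crude stand-in for MS-SSIM (hypothetical simplification)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2)))


def joint_loss(pred, target, w_dct=0.1, w_ssim=1.0, w_l1=1.0) -> float:
    """Weighted sum of adaptive DCT, (1 - SSIM) and L1 terms; weights are hypothetical."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    l1 = np.abs(pred - target).mean()
    return (w_dct * adaptive_dct_loss(pred)
            + w_ssim * (1.0 - global_ssim(pred, target))
            + w_l1 * l1)
```

In a real training loop these terms would be written in an autodiff framework so the network can be optimized through them; NumPy is used here only to make the arithmetic explicit.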

Figure 2: Example framework for training RPP. (a) Histogram of the frequency coefficients of the predicted frame. (b) Histogram of the frequency coefficients after filtering by the adaptive DCT function.
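The training inputs come from the higher-order degradation model listed among the method components. Below is a minimal NumPy sketch of the general idea (the same first-order degradation chain applied repeatedly, with freshly sampled random parameters each time), loosely following the blur, resize, noise, compression ordering popularized by Real-ESRGAN. Everything here is an assumption for illustration: the box blur, nearest-neighbor down/up-sampling, the Gaussian noise levels, and especially the uniform intensity quantizer, which stands in for real codec (e.g. JPEG) compression artifacts. Frames are single-channel arrays in [0, 1].

```python
import numpy as np


def box_blur(img: np.ndarray, k: int) -> np.ndarray:
    """k x k mean filter with edge padding (k = 1 is the identity)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def down_up(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor downsample, then upsample back to the original size."""
    h, w = img.shape
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)[:h, :w]


def quantize(img: np.ndarray, levels: int) -> np.ndarray:
    """Uniform intensity quantization: a crude stand-in for compression artifacts."""
    return np.round(img * (levels - 1)) / (levels - 1)


def degrade_once(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One first-order degradation: blur, resize, noise, 'compression'."""
    x = box_blur(img, int(rng.choice([1, 3, 5])))
    x = down_up(x, int(rng.choice([1, 2, 4])))
    x = x + rng.normal(0.0, rng.uniform(0.0, 0.03), size=x.shape)
    x = np.clip(x, 0.0, 1.0)
    return quantize(x, int(rng.choice([16, 32, 64])))


def high_order_degrade(img: np.ndarray, order: int = 2, seed: int = 0) -> np.ndarray:
    """Apply the first-order degradation `order` times with fresh random parameters."""
    rng = np.random.default_rng(seed)
    x = img.astype(np.float64)
    for _ in range(order):
        x = degrade_once(x, rng)
    return x
```

Repeating the chain (order 2 here) is what makes the model "higher-order": artifacts from the first pass are themselves blurred, resampled, and re-compressed, which better resembles content that has been transcoded several times.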

Experimental Results

Research Questions

RQ1: Can a preprocessing stage reduce bitrate without modifying existing codecs?
RQ2: Does an adaptive DCT-based loss better preserve perceptually important high-frequency content while enabling bitrate savings?
RQ3: How does joint optimization with MS-SSIM and degradation modeling affect RD performance across multiple codecs?
RQ4: What is the practical inference efficiency of RPP on common hardware?
RQ5: Is the approach robust across datasets and presets (veryfast/medium) for H.264/HEVC/VVC/AV1?

Key Findings

Dataset	Codec	Metric	BD-Rate (%)
UVG	RPP+H.264 (veryfast)	VMAF	-26.92
UVG	RPP+H.264 (veryfast)	MS-SSIM	-4.86
UVG	RPP+H.265 (veryfast)	VMAF	-39.77
UVG	RPP+H.265 (veryfast)	MS-SSIM	-8.70
UVG	RPP+H.264 (medium)	VMAF	-27.30
UVG	RPP+H.264 (medium)	MS-SSIM	-5.60
UVG	RPP+H.265 (medium)	VMAF	-39.24
UVG	RPP+H.265 (medium)	MS-SSIM	-9.58
MCL-JCV	RPP+H.264	VMAF	-11.84
MCL-JCV	RPP+H.264	MS-SSIM	-11.75
MCL-JCV	RPP+H.265	VMAF	-14.94
MCL-JCV	RPP+H.265	MS-SSIM	-19.90
HEVC Class B	RPP+H.264	VMAF	-11.84
HEVC Class B	RPP+H.264	MS-SSIM	-11.75
HEVC Class B	RPP+H.265	VMAF	-14.94
HEVC Class B	RPP+H.265	MS-SSIM	-19.90
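The BD-rate values above are percentages: a negative value is the average bitrate change (here, a saving) that RPP + codec needs to reach the same quality as the codec alone. The review does not state which BD-rate tool the paper used; the sketch below implements the common Bjøntegaard procedure (a cubic fit of log-rate against quality, averaged over the overlapping quality interval) in NumPy. The four rate/quality points in the usage lines are hypothetical, not the paper's data.

```python
import numpy as np


def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test) -> float:
    """Bjontegaard delta rate in percent (negative = test curve saves bitrate)."""
    # Fit log(rate) as a cubic polynomial of the quality score for each RD curve.
    p_anchor = np.polyfit(qual_anchor, np.log(rate_anchor), 3)
    p_test = np.polyfit(qual_test, np.log(rate_test), 3)

    # Integrate only over the quality range covered by both curves.
    lo = max(min(qual_anchor), min(qual_test))
    hi = min(max(qual_anchor), max(qual_test))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    avg_anchor = (np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)

    # Mean log-rate difference converted to a relative bitrate change.
    return float((np.exp(avg_test - avg_anchor) - 1.0) * 100.0)


# Hypothetical RD points (kbps, VMAF) for a codec alone and for RPP + codec.
anchor_kbps = [1000, 2000, 4000, 8000]
vmaf = [70.0, 80.0, 88.0, 94.0]
rpp_kbps = [r * 0.8 for r in anchor_kbps]  # same quality at 80% of the bitrate
print(f"BD-rate: {bd_rate(anchor_kbps, vmaf, rpp_kbps, vmaf):.2f}%")  # prints: BD-rate: -20.00%
```

Because the fit is done on log-rate, a curve that reaches every quality level at 80% of the anchor bitrate yields exactly -20%, which is the sense in which, for example, -39.77 for RPP+H.265 (veryfast) on UVG/VMAF means roughly 40% fewer bits at equal VMAF.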

RPP yields an average bitrate saving of 16.27% across AVC, HEVC, and VVC under multiple quality metrics.
Adaptive DCT loss contributes substantial bitrate savings, accounting for over 60% of the overall BD-rate improvement in the ablation study.
RPP+H.265 consistently provides larger BD-rate reductions than RPP+H.264 across datasets and presets.
Subjective tests show that 87% of viewers consider RPP-augmented videos better than or equal to codec-only videos, while the RPP videos save about 12% bitrate on average.
RPP achieves real-time inference speed (e.g., 87.7 FPS for 1080p on an RTX 3090 with TensorRT), enabling practical deployment.
RPP is plug-and-play, requiring only a single forward pass per frame before encoding, and does not require changes to encoder/decoder configurations.
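Because RPP is a single forward pass per frame placed in front of an unchanged encoder, deployment reduces to inserting one function call into the frame pipeline. The sketch below assumes a hypothetical `rpp_model` callable (uint8 RGB frame in, uint8 RGB frame of the same shape out) and pipes the preprocessed raw frames into the stock `ffmpeg` CLI over stdin; the libx265 flags are ordinary settings chosen for illustration, not settings taken from the paper. For scale: at the reported 87.7 FPS, preprocessing costs about 11.4 ms per 1080p frame (1000 / 87.7), well inside the 33.3 ms budget of a 30 fps stream.

```python
import subprocess
from typing import Callable, Iterable, Iterator, List

import numpy as np

Frame = np.ndarray  # H x W x 3, dtype uint8, RGB


def ffmpeg_cmd(width: int, height: int, fps: float, out_path: str,
               codec: str = "libx265", preset: str = "medium", crf: int = 28) -> List[str]:
    """Plain ffmpeg call reading raw RGB frames from stdin; RPP changes none of these settings."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",
        "-c:v", codec, "-preset", preset, "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        out_path,
    ]


def preprocess_stream(frames: Iterable[Frame],
                      rpp_model: Callable[[Frame], Frame]) -> Iterator[bytes]:
    """One forward pass per frame, then serialize to raw bytes for the encoder."""
    for frame in frames:
        out = rpp_model(frame)  # hypothetical model call
        if out.shape != frame.shape or out.dtype != np.uint8:
            raise ValueError("rpp_model must return a uint8 frame of the same shape")
        yield out.tobytes()


def encode_with_rpp(frames, rpp_model, width, height, fps, out_path) -> int:
    """Pipe RPP-preprocessed frames into an unmodified ffmpeg encoder; returns its exit code."""
    proc = subprocess.Popen(ffmpeg_cmd(width, height, fps, out_path), stdin=subprocess.PIPE)
    for chunk in preprocess_stream(frames, rpp_model):
        proc.stdin.write(chunk)
    proc.stdin.close()
    return proc.wait()
```

Passing an identity function (`lambda f: f`) as `rpp_model` turns the same pipeline into plain codec-only encoding, which gives a matching baseline for the BD-rate comparisons.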

Figure 3: (a) Rate-distortion curves for the UVG, MCL-JCV, and HEVC Class B datasets on MS-SSIM and VMAF. Curves are plotted for the standard codec and for RPP + standard codec. The corresponding BD-rates for the proposed method are reported in Tables 1, 2, and 3, respectively, for each dataset.


This review was generated by AI and checked by a human editor.