QUICK REVIEW

[Paper Review] EVC: Towards Real-Time Neural Image Compression with Mask Decay

Guohua Wang, Jiahao Li | arXiv (Cornell University) | Feb 10, 2023

Advanced Vision and Imaging | Cited by 24

One-Sentence Summary

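EVC is a single-model, variable-bit-rate neural image codec that runs in real time (30 FPS at 768×512 for the large model, 30 FPS at 1920×1080 for the small model). Mask decay transforms a large teacher model into smaller, faster student models, and a scalable encoder adjusts its encoding complexity to meet different latency requirements.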

ABSTRACT

Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) for rate-distortion (RD) performance, but suffers from large complexity and separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which is able to run at 30 FPS with 768x512 input images and still outperforms VVC for the RD performance. By further reducing both encoder and decoder complexities, our small model even achieves 30 FPS with 1920x1080 input images. To bridge the performance gap between our models of different capacities, we meticulously design the mask decay, which transforms the large model's parameters into the small model automatically. A novel sparsity regularization loss is also proposed to mitigate the shortcomings of $L_p$ regularization. Our algorithm significantly narrows the performance gap by 50% and 30% for our medium and small models, respectively. Finally, we advocate the scalable encoder for neural image compression. The encoding complexity is dynamic to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
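The abstract describes mask decay only at a high level. The pure-Python sketch below illustrates the general mechanism under stated assumptions: each output channel of a toy linear layer is scaled by a mask value, the masks of channels chosen for removal are shrunk geometrically until they snap to zero, and the surviving masks are then folded into the weights so the zeroed channels can be dropped. The function names, the fixed `keep` selection, the `rate` and `eps` values, and the geometric schedule are all hypothetical. In EVC the decay is driven by the paper's sparsity loss during training, so the remaining weights can adapt; none of that training is modeled here.

```python
from typing import List

Matrix = List[List[float]]


def masked_forward(x: List[float], weight: Matrix, mask: List[float]) -> List[float]:
    """Toy linear layer: output channel c is (weight[c] . x) * mask[c]."""
    return [m * sum(w * xi for w, xi in zip(row, x)) for row, m in zip(weight, mask)]


def decay_masks(mask: List[float], keep: List[bool],
                rate: float = 0.5, eps: float = 1e-3) -> List[float]:
    """Hypothetical decay step: shrink the masks of channels not in `keep`
    by `rate`, snapping them to exactly zero once they fall below `eps`."""
    out = []
    for m, k in zip(mask, keep):
        if not k:
            m *= rate
            if abs(m) < eps:
                m = 0.0
        out.append(m)
    return out


def fold_and_prune(weight: Matrix, mask: List[float]) -> Matrix:
    """Fold each surviving mask value into its weight row and drop rows
    whose mask reached zero, yielding a smaller layer."""
    return [[m * w for w in row] for row, m in zip(weight, mask) if m != 0.0]


# Usage: shrink a 4-channel layer to 2 channels.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5], [0.3, 0.3, 0.3], [-2.0, 1.0, 0.0]]
mask = [1.0, 1.0, 1.0, 1.0]
keep = [True, True, False, False]  # hypothetical choice of channels to keep
for _ in range(20):
    mask = decay_masks(mask, keep)

small = fold_and_prune(weight, mask)
x = [1.0, 2.0, -1.0]
print(mask)                                  # [1.0, 1.0, 0.0, 0.0]
print(masked_forward(x, weight, mask)[:2])   # [-3.5, 2.5]
print(masked_forward(x, small, [1.0, 1.0]))  # [-3.5, 2.5], from the pruned layer
```

Because a zeroed mask makes its channel contribute nothing, removing that channel leaves the remaining outputs unchanged; this is what lets a decayed large model be turned into a physically smaller one.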

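Research Motivation and Goals

Advance real-time neural image compression, targeting low latency and single-model rate control across RD trade-offs.
Develop an efficient codec framework built from GPU-friendly modules for fast inference.
Support variable bit rates within one model through adjustable quantization step sizes.
Introduce mask decay to transfer a large teacher model's parameters into a smaller student model.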
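The adjustable quantization step behind single-model variable bit rate can be sketched as follows, assuming the effective step for channel c is the product of a global step and a per-channel step, and that the latent is quantized by rounding to the nearest multiple of that step. The names `quantize` and `mse`, the product form, and the toy latent values are illustrative assumptions, not EVC's exact parameterization; the point is that one set of network weights can serve several RD trade-offs by changing only `q_global`.

```python
from typing import List


def quantize(latent: List[List[float]], q_global: float,
             q_channel: List[float]) -> List[List[float]]:
    """Quantize each channel to the nearest multiple of q_global * q_channel[c].

    A larger step maps more values onto the same symbol (fewer bits, more
    distortion); a smaller step does the opposite.
    """
    out = []
    for channel, qc in zip(latent, q_channel):
        step = q_global * qc
        # Python's round() sends exact ties to the even integer; fine here.
        out.append([round(v / step) * step for v in channel])
    return out


def mse(a: List[List[float]], b: List[List[float]]) -> float:
    """Mean squared error between two channel-major latents."""
    diffs = [(x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb)]
    return sum(diffs) / len(diffs)


latent = [[0.3, 1.2, -2.7, 4.1], [0.05, -0.4, 0.9, 1.6]]
q_channel = [1.0, 0.5]  # hypothetical per-channel steps
for q_global in (0.25, 1.0, 4.0):
    y_hat = quantize(latent, q_global, q_channel)
    print(q_global, mse(latent, y_hat))
```

On this toy latent the distortion grows as `q_global` grows, while a coarser step also collapses more values onto the same symbols, which an entropy coder would encode with fewer bits.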

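Proposed Method

Propose an Efficient single-model Variable-bit-rate Codec (EVC) that uses Depth-Conv blocks and a spatial prior for GPU efficiency.
Introduce adjustable quantization steps (global and channel-wise) so that a single model supports multiple RD trade-offs.
Insert mask layers to convert a pretrained teacher model into a smaller student model, and optimize a novel sparsity loss to drive mask decay.
Introduce a sparsity regularization loss with a purpose-designed gradient to overcome the limitations of L1/L2 regularization for pruning in neural image compression.
Propose a scalable encoder with residual representation learning (RRL) that progressively narrows the gap between the large encoder and the smaller encoders.
Adopt a two-stage training process: first convert the teacher into the student via mask decay, then fine-tune the student.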
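Two commonly cited shortcomings of L1/L2 regularization for pruning can be seen in a toy one-parameter experiment. This sketch applies only the regularization term, with no data loss, to one small and one large weight; the learning rate, λ, and step count are arbitrary illustrative choices. Under L2, the pull toward zero shrinks with the weight, so even a small weight never reaches exactly zero. Under L1 (implemented here as a soft-threshold step, a standard proximal technique swapped in so weights can land exactly on zero), every weight loses the same amount per step, so large, important weights are penalized as hard as small ones. EVC's own sparsity loss is designed to mitigate these shortcomings and is not reproduced here.

```python
from typing import Callable

Step = Callable[[float, float, float], float]


def l2_step(w: float, lr: float, lam: float) -> float:
    """Gradient step on lam * w**2 alone: the pull toward zero is 2*lam*w,
    so it fades as w gets small and w never becomes exactly zero."""
    return w - lr * 2.0 * lam * w


def l1_step(w: float, lr: float, lam: float) -> float:
    """Soft-threshold (proximal) step on lam * |w|: every weight loses the
    same amount lr*lam per step, and weights within that amount snap to zero."""
    shrink = lr * lam
    if abs(w) <= shrink:
        return 0.0
    return w - shrink if w > 0 else w + shrink


def run(step: Step, w: float, n: int = 100, lr: float = 0.1, lam: float = 0.1) -> float:
    """Apply `step` n times starting from weight w."""
    for _ in range(n):
        w = step(w, lr, lam)
    return w


for w0 in (0.05, 2.0):
    print(w0, "L2 ->", run(l2_step, w0), "L1 ->", run(l1_step, w0))
```

With these settings the small weight survives L2 as a tiny nonzero value, while L1 zeroes it but also strips a full 1.0 from the large weight.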

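Experimental Results

Research Questions

RQ1: Can a single neural image compression model deliver strong RD performance in real time across multiple RD trade-offs?
RQ2: Can mask decay effectively transfer a large teacher model into a smaller, faster student model for neural image compression?
RQ3: Can a scalable encoder with residual representations narrow the performance gap between large and small encoders while keeping a single decoder?
RQ4: Compared with standard L1/L2 regularization, which sparsity regularization better supports pruning when training neural image compression models?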

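Key Findings

The large model outperforms VTM and matches SOTA neural codecs, and a single model covers different RD trade-offs.
The large model reaches 30 FPS on 768×512 inputs; the small model reaches 30 FPS on 1920×1080 inputs.
Mask decay with the novel sparsity loss narrows the performance gap to the large model by about 50% for the Medium model and about 30% for the Small model.
The scalable EVC with residual representation learning outperforms SlimCAE and performs on par with other SOTA models while offering encoder scalability.
The encoder is more redundant than the decoder: removing or compressing encoder capacity costs less RD performance than doing the same to the decoder.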
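Because the two 30 FPS figures in the findings are measured at different resolutions, normalizing them to pixels per second makes the difference easier to compare. This is plain arithmetic on the reported numbers; it says nothing about the GPU used, which this review does not state, and `pixel_rate` is a hypothetical helper name.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels processed per second at a given resolution and frame rate."""
    return width * height * fps


large = pixel_rate(768, 512, 30)    # large model: reported 30 FPS at 768x512
small = pixel_rate(1920, 1080, 30)  # small model: reported 30 FPS at 1920x1080
print(large)          # 11796480
print(small)          # 62208000
print(small / large)  # 5.2734375
```

At equal frame rates the small model therefore handles roughly 5.3× as many pixels per second as the large model.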


This review was generated by AI and checked by human editors.