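[Paper Summary] Joint Autoregressive and Hierarchical Priors for Learned Image Compression
This paper extends learned image compression by combining an autoregressive context model with a hierarchical hyperprior, achieving state-of-the-art rate-distortion performance and outperforming BPG on both PSNR and MS-SSIM.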
Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate-distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.
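Motivation and Objectives
- Extend the GSM-based (Gaussian scale mixture) entropy model for learned image compression to a conditional Gaussian mixture with an autoregressive context.
- Study the complementary strengths of autoregressive priors and hierarchical hyperpriors in entropy modeling.
- Evaluate rate-distortion performance on PSNR and MS-SSIM against standard codecs and prior learned methods.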
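Proposed Method
- Generalize the scale hyperprior model to a Gaussian mixture model by predicting both the mean and the scale, conditioned on the hyperprior and the causal context.
- Combine the autoregressive context model with the hyperprior to form a joint entropy model that outputs a mean and a scale for each latent.
- Use a two-part neural network architecture: a base autoencoder that produces the latent representation, and a probabilistic model (context + hyperprior) used for entropy coding.
- Train with a rate-distortion objective that includes the cost of the latents and the hyper-latents plus a squared-error distortion term.
- Apply a context model based on 5x5 masked convolutions to capture causal dependencies among the latents; retain an Entropy Parameters network to predict the Gaussian parameters.
- Evaluate RD performance (PSNR and MS-SSIM) on Kodak, comparing against standard codecs (BPG, JPEG, JPEG2000, WebP) and prior learned methods.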
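The three central pieces of the method list (the causal 5x5 mask, the per-latent Gaussian with predicted mean and scale, and the rate-distortion objective) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names `causal_mask`, `latent_bits`, and `rd_loss` are hypothetical, the raster-scan ordering of the mask and the integration of the Gaussian over a unit-width bin around each integer-quantized latent are assumptions made for illustration, and `lam` stands for the trade-off weight between rate and distortion.

```python
import math


def causal_mask(kernel_size=5):
    """Mask for a 2-D masked convolution, assuming raster-scan order.

    Entries are 1.0 for positions strictly before the centre (all rows
    above it, plus positions to its left in the same row) and 0.0
    elsewhere, so the centre latent and all later latents are hidden.
    """
    centre = kernel_size // 2
    return [
        [
            1.0 if (row < centre or (row == centre and col < centre)) else 0.0
            for col in range(kernel_size)
        ]
        for row in range(kernel_size)
    ]


def gaussian_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def latent_bits(y_hat, mean, scale):
    """Bits needed for one integer-quantized latent.

    Assumption: the probability mass is the Gaussian N(mean, scale^2)
    integrated over the bin [y_hat - 0.5, y_hat + 0.5].
    """
    upper = gaussian_cdf((y_hat + 0.5 - mean) / scale)
    lower = gaussian_cdf((y_hat - 0.5 - mean) / scale)
    prob = max(upper - lower, 1e-9)  # floor keeps log2 finite
    return -math.log2(prob)


def rd_loss(bits_latents, bits_hyper, mse, lam):
    """Rate (latents + hyper-latents) plus lam-weighted squared error."""
    return bits_latents + bits_hyper + lam * mse


if __name__ == "__main__":
    for row in causal_mask(5):
        print(row)
    # A well-predicted mean makes the same latent cheaper to code.
    print(latent_bits(3, mean=3.0, scale=0.5))
    print(latent_bits(3, mean=0.0, scale=0.5))
```

In this sketch the benefit of predicting a mean as well as a scale shows up directly: the same latent value costs far fewer bits when the predicted mean is close to it, which is the intuition behind conditioning on both the hyperprior and the causal context.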
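Experimental Results
Research Questions
- RQ1: Does an autoregressive prior on the latents improve compression performance when combined with a hyperprior?
- RQ2: How does the Gaussian mixture entropy model compare with the scale hyperprior in terms of rate-distortion performance?
- RQ3: What gains does integrating an autoregressive context with a hierarchical prior bring to learned image compression?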
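Key Findings
- The model that combines the (autoregressive) context with the hyperprior achieves state-of-the-art rate-distortion performance.
- The joint model reduces average file size by 15.8% relative to the previous state-of-the-art learned method.
- This improvement corresponds to a 59.8% size reduction over JPEG.
- The method reduces size by more than 35% relative to WebP and JPEG2000.
- Its bitstreams are 8.4% smaller than those of BPG, the state-of-the-art image codec at the time.
- To the best of the authors' knowledge, it is the first learning-based method to outperform BPG on both PSNR and MS-SSIM.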
This summary was generated by AI and reviewed by a human editor.