QUICK REVIEW

[Paper Digest] Joint Autoregressive and Hierarchical Priors for Learned Image Compression

David Minnen, Johannes Ballé, George Toderici | arXiv (Cornell University) | Sep 7, 2018

Topic: Generative Adversarial Networks and Image Synthesis | References: 19 | Cited by: 593

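One-Sentence Summary

This paper extends learned image compression by combining an autoregressive context model with a hyperprior in a single entropy model, achieving state-of-the-art rate-distortion performance on both PSNR and MS-SSIM and outperforming BPG as well as JPEG, JPEG2000, and WebP.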

ABSTRACT

Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate-distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.

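Motivation and Objectives

Generalize the GSM-based (Gaussian scale mixture) entropy model to a Gaussian mixture model.
Introduce an autoregressive context model to further reduce the entropy of the latents.
Combine the autoregressive context model with the hyperprior to maximize rate-distortion performance.
Evaluate the model variants and quantify the trade-offs among context size, distribution choice, and computational complexity.
Demonstrate state-of-the-art results on a standard benchmark (Kodak) and compare against BPG and other codecs.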

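Proposed Method

Generalize the entropy model from a scale-only hyperprior to a Gaussian mixture model conditioned on the hyperprior.
Add an autoregressive context model over the latents to help predict the mean and scale of each latent.
Combine the context model with the hyperprior to form a joint entropy model with conditional Gaussians, trained end to end on a rate-distortion objective.
Model each latent as a Gaussian convolved with a unit uniform distribution, which matches the additive uniform noise that stands in for quantization during training and keeps the objective differentiable.
Use a two-part latent pipeline, latents y and hyper-latents z, each contributing its own entropy (rate) term to the loss.
Explore architectural variants (context only, hyperprior only, and joint) and analyze the effect of context size and distribution choice.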
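
To make the entropy-model steps above concrete, here is a minimal NumPy sketch, not the authors' implementation. It shows how a Gaussian convolved with a unit uniform distribution assigns probability mass to a rounded latent, how that mass turns into a bit count, how a raster-order causal mask restricts a context model to already-decoded neighbours, and how the two rate terms (for y and z) enter a rate-distortion loss. All function names, the `floor` guard, the default mask size of 5, and the per-pixel normalization of the rate are assumptions made for illustration.

```python
import math

import numpy as np


def std_normal_cdf(x):
    """Elementwise standard normal CDF, built on math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def latent_likelihood(y_hat, mu, sigma, floor=1e-9):
    """Mass that N(mu, sigma^2) convolved with U(-0.5, 0.5) puts on y_hat.

    The convolution reduces to a CDF difference over the unit-width bin
    centred on y_hat. `floor` is an illustrative guard against log(0).
    """
    y_hat = np.asarray(y_hat, dtype=float)
    upper = std_normal_cdf((y_hat + 0.5 - mu) / sigma)
    lower = std_normal_cdf((y_hat - 0.5 - mu) / sigma)
    return np.maximum(upper - lower, floor)


def rate_bits(likelihood):
    """Ideal code length in bits: the negative log2-likelihood, summed."""
    return float(-np.sum(np.log2(likelihood)))


def causal_mask(size=5):
    """Raster-order mask: 1 for positions decoded before the centre, else 0."""
    mask = np.zeros((size, size))
    centre = size // 2
    mask[:centre, :] = 1.0        # all rows above the centre row
    mask[centre, :centre] = 1.0   # left of the centre in the centre row
    return mask


def rd_loss(x, x_hat, bits_y, bits_z, lam):
    """Rate (bits per pixel, y plus z) plus lam times mean squared error."""
    num_pixels = x.shape[0] * x.shape[1]
    distortion = float(np.mean((x - x_hat) ** 2))
    return (bits_y + bits_z) / num_pixels + lam * distortion
```

In this sketch, a latent whose predicted mean is close to its actual value and whose predicted scale is small costs fewer bits than one predicted with a broad scale. That is the effect the context model and hyperprior are meant to produce by supplying better `mu` and `sigma` values. In a real codec those parameters would come from learned networks, which are omitted here.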

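Experimental Results

Research Questions

RQ1: In a learned image codec, does an autoregressive prior improve compression performance when combined with a hierarchical prior?
RQ2: How does a Gaussian mixture entropy model affect rate-distortion performance compared with a scale hyperprior?
RQ3: How does combining the (autoregressive) context model with the hyperprior affect bitstream size and reconstruction quality?
RQ4: What are the practical trade-offs among model complexity, context size, and compression gain in learned image compression?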

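Key Findings

The joint autoregressive and hyperprior model achieves state-of-the-art rate-distortion performance.
The joint model reduces file size by 15.8% on average relative to the previous best learning-based method.
This corresponds to a 59.8% size reduction relative to JPEG and a reduction of more than 35% relative to WebP and JPEG2000.
Its bitstreams are 8.4% smaller than those of BPG, the state-of-the-art image codec at the time.
To the best of the authors' knowledge, the model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM.
The Gaussian mixture entropy model improves on the simpler GSM-based model without increasing asymptotic complexity.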

This digest was generated by AI and reviewed by a human editor.