QUICK REVIEW

[Paper Review] End-to-end Optimized Image Compression

Johannes Ballé, Valero Laparra|arXiv (Cornell University)|Nov 5, 2016

Image and Signal Denoising Methods | References: 31 | Cited by: 1,010

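One-sentence summary

This paper trains a nonlinear transform coding model with generalized divisive normalization (GDN) nonlinearities and uniform quantization, optimized end to end for rate–distortion performance; it achieves higher perceptual quality and generally better rate–distortion performance than JPEG and JPEG 2000.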

ABSTRACT

We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.

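Motivation and objectives

Address lossy image compression with a learnable nonlinear transform coding framework.
Optimize the analysis and synthesis transforms end to end under a rate–distortion objective.
Use generalized divisive normalization (GDN), a nonlinear gain control, to Gaussianize local image statistics.
Replace quantization with a differentiable relaxation so that the model can be trained by stochastic gradient descent.
Demonstrate improved rate–distortion performance and markedly better perceptual quality than JPEG and JPEG 2000.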
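
The gain-control idea in the list above can be illustrated with a short NumPy sketch. It assumes the commonly cited GDN form, in which each channel is divided by the square root of a constant plus a weighted sum of the squared responses of all channels at the same spatial location. The names gdn, igdn, beta and gamma are illustrative and do not come from this review, and the parameter values are placeholders for quantities a real model would learn.

```python
import numpy as np

def gdn(w, beta, gamma):
    """Divisive normalization sketch.

    w: responses of shape (C, H, W); beta: (C,); gamma: (C, C).
    """
    # Weighted sum of squared responses across channels, per spatial location.
    pooled = np.einsum("ij,jhw->ihw", gamma, w ** 2)
    return w / np.sqrt(beta[:, None, None] + pooled)

def igdn(u, beta, gamma):
    """Approximate inverse: multiply by the gain instead of dividing by it."""
    pooled = np.einsum("ij,jhw->ihw", gamma, u ** 2)
    return u * np.sqrt(beta[:, None, None] + pooled)

# Toy input (assumption): two channels at a single pixel.
w = np.array([3.0, 4.0]).reshape(2, 1, 1)
beta = np.ones(2)
gamma = np.ones((2, 2))
u = gdn(w, beta, gamma)  # each channel is divided by sqrt(1 + 9 + 16)
```

With gamma set to zero and beta set to one, gdn reduces to the identity. Note that igdn with the same parameters is not an exact inverse of gdn, which is consistent with the review describing it only as an approximate inverse.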

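Proposed method

The analysis transform is a cascade of three stages, each a convolutional linear filter followed by a generalized divisive normalization (GDN) nonlinearity.
The code is quantized with a uniform scalar quantizer and then passed through a matching three-stage synthesis transform that uses an approximate inverse of GDN (IGDN).
During training, quantization is relaxed by adding uniform noise, which makes gradient-based optimization possible; the rate term is based on the entropy of the quantized code.
The analysis/synthesis transforms and the entropy model are optimized jointly to minimize the loss L = E[ -log2 p(ỹ) + λ d(z, ẑ) ], where ỹ is the noise-relaxed code, λ is the rate–distortion trade-off parameter, and d(z, ẑ) is the distortion between the target z and its reconstruction ẑ. The continuous relaxation approximates both the rate and the distortion.
The marginal densities p(ỹ) of the code are modeled by non-parametric piecewise-linear functions that are updated during training.
The framework is related to variational autoencoders, with both the similarities and the key differences noted: the compression model uses a discrete code and must operate at any point along the rate–distortion curve.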
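
The relaxed objective described above can be sketched as follows. This is a minimal single-sample illustration, not the authors' implementation: the transforms are toy scalar scalings rather than learned convolutional stages, the piecewise-linear density is fixed rather than updated during training, and the distortion is taken to be MSE computed directly on the image. All function and variable names are hypothetical.

```python
import numpy as np

def relax_quantize(y, rng):
    """Training-time proxy for rounding: add uniform noise in [-0.5, 0.5)."""
    return y + rng.uniform(-0.5, 0.5, size=y.shape)

def piecewise_linear_density(grid, heights):
    """Density that linearly interpolates `heights` on `grid`, scaled to unit area."""
    area = np.sum(0.5 * (heights[1:] + heights[:-1]) * np.diff(grid))
    scaled = heights / area
    return lambda v: np.interp(v, grid, scaled, left=0.0, right=0.0)

def relaxed_rd_loss(x, analysis, synthesis, density, lam, rng):
    """One-sample estimate of -log2 p(y_tilde) + lam * d(x, x_hat), with d = MSE."""
    y_tilde = relax_quantize(analysis(x), rng)
    # Rate in bits; the floor avoids log2(0) outside the density's support.
    rate_bits = -np.sum(np.log2(np.maximum(density(y_tilde), 1e-12)))
    distortion = np.mean((x - synthesis(y_tilde)) ** 2)
    return rate_bits + lam * distortion

# Toy stand-ins (assumptions): scalar scaling instead of learned transforms.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
analysis = lambda img: 4.0 * img
synthesis = lambda code: code / 4.0
grid = np.linspace(-20.0, 20.0, 41)
density = piecewise_linear_density(grid, 20.0 - np.abs(grid))  # triangular shape
loss = relaxed_rd_loss(x, analysis, synthesis, density, lam=100.0, rng=rng)
```

At test time the noise would be replaced by actual rounding, for example np.round(analysis(x)). Raising lam weights distortion more heavily, which moves the operating point toward higher rate and lower distortion.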

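Experimental results

Research questions

RQ1: Does end-to-end optimization of a nonlinear transform code improve rate–distortion performance on natural images?
RQ2: Does biologically inspired nonlinear gain control (GDN/IGDN) Gaussianize image statistics more effectively and improve coding efficiency?
RQ3: How does optimizing the rate–distortion objective at different values of the trade-off parameter λ affect perceptual quality and conventional metrics?
RQ4: Across bit rates, how does the proposed method compare with JPEG and JPEG 2000 on objective metrics (MS-SSIM, PSNR) and on visual quality?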

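Key findings

On the test images, the proposed method generally achieves better rate–distortion performance than JPEG and JPEG 2000.
At comparable bit rates it achieves clearly higher MS-SSIM, indicating better perceptual quality across images and bit rates.
In one representative example: JPEG, 0.121 bit/px, luma PSNR 24.85 dB, MS-SSIM 0.8079; JPEG 2000, 0.113 bit/px, luma PSNR 26.61 dB, MS-SSIM 0.8860; proposed method, 0.113 bit/px, luma PSNR 27.01 dB, MS-SSIM 0.9039.
Visually, compared with linear transform coders, it shows fewer blocking and ringing artifacts, smoother contours, and well-preserved edges across bit rates.
MS-SSIM-based evaluation indicates a perceptual improvement on every test image and at every bit rate.
Training shows that the continuous relaxation approximates the discrete rate–distortion objective well, which is what makes end-to-end optimization effective.
The method yields visually better results even though it was trained with MSE, which suggests that training with a perceptual metric could bring further gains.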
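
As a worked reading of the representative example above, a PSNR difference converts directly into an MSE ratio. The sketch assumes the standard definition PSNR = 10 * log10(peak^2 / MSE) with an 8-bit peak of 255, which this review does not state; the dictionary only restates the reported figures, and the helper names are illustrative.

```python
import math

# Figures restated from the representative example in this review.
reported = {
    "JPEG": {"bpp": 0.121, "psnr_db": 24.85, "ms_ssim": 0.8079},
    "JPEG 2000": {"bpp": 0.113, "psnr_db": 26.61, "ms_ssim": 0.8860},
    "Proposed": {"bpp": 0.113, "psnr_db": 27.01, "ms_ssim": 0.9039},
}

def psnr_to_mse(psnr_db, peak=255.0):
    """Invert PSNR = 10 * log10(peak^2 / MSE); the 8-bit peak is an assumption."""
    return peak ** 2 / 10 ** (psnr_db / 10)

def mse_ratio(psnr_a_db, psnr_b_db):
    """MSE of codec A divided by MSE of codec B; independent of the peak value."""
    return 10 ** ((psnr_b_db - psnr_a_db) / 10)

ratio_vs_jpeg2000 = mse_ratio(
    reported["Proposed"]["psnr_db"], reported["JPEG 2000"]["psnr_db"]
)
ms_ssim_gain = reported["Proposed"]["ms_ssim"] - reported["JPEG 2000"]["ms_ssim"]
```

Under these assumptions, the 0.40 dB gain over JPEG 2000 at the same 0.113 bit/px corresponds to an MSE ratio of about 0.91, roughly a 9% reduction in mean squared error. The comparison with JPEG is less direct because the JPEG result uses a slightly higher bit rate (0.121 bit/px).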

This review was generated by AI and checked by a human editor.