[Paper Review] Learning Convolutional Networks for Content-weighted Image Compression
This paper proposes a content-weighted image compression framework using a learnable importance map to enable spatially variant bit allocation in CNN-based compression. By replacing discrete entropy estimation with a continuous importance map sum for rate control and using a differentiable proxy for binarization, the method enables end-to-end training without explicit entropy coding during optimization, achieving superior SSIM and visual quality over JPEG and JPEG 2000 at low bit rates.
Lossy image compression is generally formulated as a joint rate-distortion optimization to learn an encoder, quantizer, and decoder. However, the quantizer is non-differentiable, and discrete entropy estimation is usually required for rate control. These issues make it very challenging to develop a convolutional neural network (CNN)-based image compression system. In this paper, motivated by the observation that local information content varies spatially within an image, we suggest that the bit rate of different parts of the image should adapt to local content. The content-aware bit rate is allocated under the guidance of a content-weighted importance map, so the sum of the importance map can serve as a continuous alternative to discrete entropy estimation for controlling the compression rate. A binarizer is adopted to quantize the encoder output, because the binarization scheme is also directly defined by the importance map. Furthermore, a proxy function is introduced for the binary operation in backward propagation to make it differentiable. Therefore, the encoder, decoder, binarizer, and importance map can be jointly optimized end to end using a subset of the ImageNet database. In low-bit-rate image compression, experiments show that our system significantly outperforms JPEG and JPEG 2000 in terms of the structural similarity (SSIM) index, and produces much better visual results with sharp edges, rich textures, and fewer artifacts.
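As a rough illustration of the objective the abstract describes, the joint loss might be written as follows. The notation is assumed here rather than taken from this review: $x$ is a training image from the set $\mathcal{X}$, $c$ its binary code, $\mathcal{L}_D$ a distortion term, $\mathcal{L}_R$ the rate term, $\gamma$ a trade-off weight, and $P(x)_{i,j}$ the importance map value at spatial location $(i, j)$.

```latex
\mathcal{L}
  = \sum_{x \in \mathcal{X}}
    \Big[ \mathcal{L}_D(c, x) + \gamma \, \mathcal{L}_R(x) \Big],
\qquad
\mathcal{L}_R(x) = \sum_{i,j} P(x)_{i,j}
```

Because $\mathcal{L}_R$ is a plain sum of continuous values, it has a gradient with respect to the importance map, which a discrete entropy estimate would not provide.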
Motivation & Objective
- To address the challenge of non-differentiable quantization and discrete entropy estimation in end-to-end CNN-based image compression.
- To enable spatially variant bit allocation by learning an importance map that reflects local image content complexity.
- To replace traditional entropy rate estimation with a continuous proxy based on the sum of the importance map for rate control.
- To develop a differentiable binarization scheme using a proxy function to allow backpropagation through the quantization step.
- To improve visual quality in low-bitrate compression by preserving edges and textures through content-aware bit allocation.
Proposed method
- A convolutional encoder produces feature maps from the input image, which are then processed by a separate importance map network to generate a spatially varying importance map.
- The importance map determines how many feature maps are encoded at each spatial location, enabling content-adaptive bit allocation.
- A binarizer sets values above 0.5 to 1 and others to 0, with a proxy function used in backpropagation to make the operation differentiable.
- The sum of the importance map serves as a continuous, differentiable approximation of the total bit rate, replacing discrete entropy estimation in the loss function.
- A convolutional entropy coder is applied post-quantization to further compress the binary codes and importance map using context modeling.
- The entire system is trained end-to-end on a subset of ImageNet, with no explicit entropy rate term in the loss, relying solely on the importance map for rate control.
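The steps above can be sketched for a single spatial location in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the clipped-identity proxy gradient, the number of quantization levels, and the rule mapping an importance value to a channel count are illustrative choices, and all function names are hypothetical.

```python
import math

def binarize(e):
    """Forward pass: 1 if the encoder activation exceeds 0.5, else 0."""
    return 1 if e > 0.5 else 0

def proxy_grad(e):
    """Backward pass (assumed proxy): slope of a clipped identity,
    1 inside [0, 1] and 0 outside, used instead of the true gradient,
    which is zero almost everywhere."""
    return 1.0 if 0.0 <= e <= 1.0 else 0.0

def channels_kept(p, n_channels, levels):
    """Map an importance value p in [0, 1] to a number of channels to keep.
    The quantization into `levels` steps is an assumption of this sketch."""
    q = min(math.floor(p * levels), levels - 1)  # quantized level 0..levels-1
    return q * (n_channels // levels)

def encode_location(features, p, levels):
    """Binarize the first `keep` channels at one location; zero out the rest."""
    keep = channels_kept(p, len(features), levels)
    return [binarize(e) if k < keep else 0 for k, e in enumerate(features)]

def rate_proxy(importance_map):
    """Continuous rate surrogate: the sum of all importance map values."""
    return sum(sum(row) for row in importance_map)

# Example: 8 channels at one location, importance 0.5, 4 quantization levels.
features = [0.9, 0.2, 0.7, 0.6, 0.8, 0.1, 0.95, 0.55]
print(encode_location(features, 0.5, 4))  # -> [1, 0, 1, 1, 0, 0, 0, 0]
```

In this sketch a higher importance value keeps more channels at that location, so raising or lowering the importance map changes both the bit budget and the value returned by `rate_proxy`.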
Experimental results
Research questions
- RQ1: Can a learnable importance map effectively replace discrete entropy estimation in CNN-based image compression?
- RQ2: How does spatially variant bit allocation guided by a content-aware importance map affect rate-distortion performance and visual quality?
- RQ3: Can a differentiable proxy function enable end-to-end training of a binarized compression system with non-differentiable quantization?
- RQ4: To what extent does the absence of explicit entropy coding during training affect compression efficiency when using a separate entropy coder?
- RQ5: How does the model’s learned importance map align with human visual perception in terms of bit allocation to edges and textures?
Key findings
- The proposed method achieves significantly better structural similarity (SSIM) than JPEG and JPEG 2000 at low bit rates, with measurable improvements in visual quality.
- The model produces sharper edges, richer textures, and fewer artifacts such as blurring, ringing, and blocking than JPEG 2000 and the method of Ballé et al. [1].
- The baseline model without the importance map performs worse than JPEG 2000 in terms of MSE, PSNR, and SSIM, demonstrating the necessity of the importance map.
- The importance map learns to allocate more bits to salient edges at low bit rates, progressively covering mid-scale and small-scale textures as rate increases, aligning with human perception.
- The convolutional entropy encoder outperforms standard CABAC, which uses a small context; the advantage is larger when the convolutional encoder uses a larger context, and it further improves rate-distortion performance.
- Variants that entropy-code only the binary codes or only the importance map still perform well, but the full model that encodes both achieves the best performance, confirming the complementary role of the two components.
This review was created by AI and reviewed by human editors.