QUICK REVIEW

[Paper Digest] Towards image compression with perfect realism at ultra-low bitrates

Marlène Careil, Matthew J. Muckley|arXiv (Cornell University)|Oct 16, 2023

Advanced Image Processing Techniques|Cited by 8

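One-sentence summary

PerCo uses a diffusion decoder conditioned on a vector-quantized latent representation and a text caption to reconstruct perceptually realistic images at ultra-low bits per pixel, outperforming state-of-the-art codecs on realism metrics.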

ABSTRACT

Image codecs are typically optimized to trade-off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized image representation, as well as a global image description to provide additional context. We dub our model PerCo for 'perceptual compression', and compare it to state-of-the-art codecs at rates from 0.1 down to 0.003 bits per pixel. The latter rate is more than an order of magnitude smaller than those considered in most prior work, compressing a 512x768 Kodak image with less than 153 bytes. Despite this ultra-low bitrate, our approach maintains the ability to reconstruct realistic images. We find that our model leads to reconstructions with state-of-the-art visual quality as measured by FID and KID. As predicted by rate-distortion-perception theory, visual quality is less dependent on the bitrate than previous methods.
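
To put the quoted rates in perspective, the byte budget follows directly from bits per pixel times the pixel count. The snippet below is a small arithmetic sketch only; the function name `bytes_at_rate` is illustrative and does not come from the paper.

```python
def bytes_at_rate(bpp: float, width: int, height: int) -> float:
    """Return the payload size in bytes for an image coded at `bpp` bits per pixel."""
    total_bits = bpp * width * height
    return total_bits / 8.0


# A 512x768 Kodak image at the two ends of the range quoted in the abstract.
low = bytes_at_rate(0.003, 768, 512)   # roughly 147 bytes
high = bytes_at_rate(0.1, 768, 512)    # roughly 4915 bytes
print(f"{low:.1f} bytes at 0.003 bpp, {high:.1f} bytes at 0.1 bpp")
```

At the rounded rate of 0.003 bpp the result is about 147 bytes, which is consistent with the "less than 153 bytes" figure in the abstract.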

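Motivation and objectives

Push image compression toward preserving realism at extremely low bitrates, beyond the conventional rate-distortion trade-off.
Introduce a diffusion-based decoder that reconstructs realistic images from compressed latent variables.
Strengthen the conditioning by combining a local latent representation with a global text description of the image.
Compare against state-of-the-art codecs on Kodak and MS-COCO 30k, evaluating both realism and semantic preservation.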

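Proposed method

Encode the image into local and global latent variables using a latent diffusion model (LDM) encoder combined with a VQ-VAE-like hyperprior.
Quantize the hyper-latents and transmit them with a uniform code to form the bitstream.
Condition the diffusion-based decoder on both the quantized local features and a losslessly transmitted text caption; the text conditioning is applied through cross-attention.
Train with a diffusion reconstruction loss that comprises a diffusion distortion term and an optional LPIPS perceptual loss; at inference, use classifier-free guidance with a guidance scale.
Start from a pretrained text-conditioned diffusion model, freeze the autoencoder weights, and fine-tune only the hyper-encoder and the diffusion components on OpenImages.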
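
The quantize-and-transmit step above can be illustrated with a small NumPy sketch. It assumes nearest-neighbour vector quantization against a codebook and a fixed-length code of ceil(log2 V) bits per index as one reading of "uniform code". The codebook size, feature dimensions, grid size, and function names are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

import numpy as np


def vector_quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector (N, D) to the index of its nearest codebook entry (V, D)."""
    # Squared Euclidean distance between every feature and every code: shape (N, V).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def uniform_code_bits(num_indices: int, codebook_size: int) -> int:
    """Bits needed when every index is sent with the same fixed-length code."""
    return num_indices * math.ceil(math.log2(codebook_size))


rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 4))      # hypothetical: V = 256 codes of dimension 4
features = rng.normal(size=(8 * 12, 4))   # hypothetical: an 8x12 grid of local features

indices = vector_quantize(features, codebook)
dequantized = codebook[indices]           # what a decoder would be conditioned on

bits = uniform_code_bits(len(indices), len(codebook))
bpp = bits / (512 * 768)                  # rate for a 512x768 image, caption excluded
print(bits, round(bpp, 5))
```

With these toy numbers the local features cost 768 bits, or roughly 0.002 bpp for a 512x768 image; the losslessly transmitted caption would add its own bits on top of that.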

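Experimental results

Research questions

RQ1: Can a diffusion decoder conditioned on text and local visual context achieve realistic reconstructions at ultra-low bitrates (down to 0.003 bpp)?
RQ2: Does combining a vector-quantized latent representation with a global caption improve realism and semantic preservation at low bitrates?
RQ3: How does PerCo compare with the baselines on realism metrics (FID/KID) and semantic metrics (CLIP, mIoU) across bitrates?
RQ4: How do the conditioning modalities (text vs. spatial) and classifier-free guidance affect reconstruction quality?
RQ5: What limitations arise at higher resolutions, and which ablations reveal PerCo's bottlenecks?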
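
As background for RQ4, classifier-free guidance is commonly implemented by extrapolating from an unconditional noise prediction toward a conditional one. The sketch below shows only that generic rule in NumPy; the array shapes and the name `guided_noise` are illustrative, and how PerCo applies guidance across its text and spatial conditioning is not specified in this digest.

```python
import numpy as np


def guided_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, scale: float) -> np.ndarray:
    """Generic classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


rng = np.random.default_rng(0)
eps_uncond = rng.normal(size=(4, 8, 8))   # stand-in for the unconditional prediction
eps_cond = rng.normal(size=(4, 8, 8))     # stand-in for the conditional prediction

# scale = 1 recovers the conditional prediction; scale = 0 the unconditional one.
assert np.allclose(guided_noise(eps_uncond, eps_cond, 1.0), eps_cond)
assert np.allclose(guided_noise(eps_uncond, eps_cond, 0.0), eps_uncond)

# scale > 1 pushes the prediction further in the conditional direction.
eps = guided_noise(eps_uncond, eps_cond, 3.0)
print(eps.shape)
```

In a real sampler the two predictions would come from the same denoising network evaluated with and without the conditioning, once per diffusion step.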

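Key findings

PerCo produces realistic reconstructions at 0.0032 bpp on Kodak and MS-COCO 30k, reaching state-of-the-art FID and KID scores at low bitrates.
PerCo's FID and KID curves are flatter across bitrates, indicating that realism is largely decoupled from bitrate.
Semantic metrics (CLIP, mIoU) improve over the baselines, most noticeably at the lower bitrates.
Ablations show that both text conditioning and spatial conditioning improve FID and mIoU; ground-truth captions follow trends similar to BLIP/IDEFICS captions.
The quantization bottleneck, rather than the diffusion model, is the main determinant of performance; the LDM autoencoder provides a notable improvement.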

This digest was generated by AI and reviewed by a human editor.