QUICK REVIEW

[Paper Review] Lossy Image Compression with Conditional Diffusion Models

Ruihan Yang, Stephan Mandt|arXiv (Cornell University)|Sep 14, 2022

Generative Adversarial Networks and Image Synthesis|Cited by 30

One-Line Summary

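The paper proposes an end-to-end lossy image compression framework. It uses a conditional diffusion model as the decoder inside a transform coding architecture, which allows a tunable rate-distortion-perception tradeoff and fast decoding via X-prediction.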

ABSTRACT

This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with $\mathcal{X}$-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality. Our code is available at: https://github.com/buggyyang/CDC_compression

Motivation and Objectives

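Introduce a transform-coding-based lossy image compression framework that uses a conditional diffusion model as the decoder.
Encode the image into a content latent variable that conditions the diffusion process used for reconstruction.
Enable a tunable tradeoff among rate, distortion, and perceptual quality.
Show improved perceptual metrics (e.g., FID) while maintaining competitive distortion performance.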

Proposed Method

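Encode the image into a content latent z, using a two-level hierarchical prior to entropy-code z.
At decoding time, run the denoising diffusion process conditioned on z to reconstruct x0; the texture variables x1:N are synthesized during decoding.
Derive a variational upper bound on the diffusion model's implicit rate-distortion function and use it as the training objective.
Use X-prediction to decode quickly in a small number of steps, with optional stochastic decoding controlled by a gamma noise parameter.
Optionally add a perceptual loss, combined through a weight rho, to trade off rate, distortion, and perception.
Provide a DDIM-based decoding scheme and discuss deterministic versus stochastic decoding as a means of perceptual control.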
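
The decoding steps above can be illustrated with a minimal NumPy sketch of a deterministic DDIM-style loop driven by an X-prediction network. This is not the authors' implementation: the function names (`cosine_alpha_bar`, `ddim_decode`, `toy_predictor`), the cosine noise schedule, and the choice to let gamma scale the initial noise (so that gamma = 0 gives deterministic decoding) are assumptions made for illustration. The toy predictor merely stands in for a trained network that would predict x0 from the noisy state, the normalized step, and the content latent z.

```python
import numpy as np

def cosine_alpha_bar(num_steps):
    """Cumulative signal levels alpha_bar[0..num_steps], from 1 down to ~0.
    The cosine shape is an assumed schedule, not taken from the paper."""
    t = np.linspace(0.0, 1.0, num_steps + 1)
    return np.clip(np.cos(0.5 * np.pi * t) ** 2, 1e-5, 1.0)

def ddim_decode(x0_predictor, z, shape, num_steps=8, gamma=0.0, seed=0):
    """Deterministic DDIM updates with an X-prediction model.

    x0_predictor(x_n, t, z) returns an estimate of the clean image x0.
    gamma scales the initial noise (assumption): gamma = 0 removes all
    randomness, gamma > 0 gives seed-dependent (stochastic) decodes.
    """
    rng = np.random.default_rng(seed)
    alpha_bar = cosine_alpha_bar(num_steps)
    x = gamma * rng.standard_normal(shape)  # synthesized "texture" start state
    for n in range(num_steps, 0, -1):
        x0_hat = x0_predictor(x, n / num_steps, z)
        # Noise implied by the current state and the predicted clean image.
        eps_hat = (x - np.sqrt(alpha_bar[n]) * x0_hat) / np.sqrt(1.0 - alpha_bar[n])
        # Move to the next, less noisy level; at n = 1 this returns x0_hat.
        x = np.sqrt(alpha_bar[n - 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[n - 1]) * eps_hat
    return x

def toy_predictor(x_n, t, z):
    """Hypothetical stand-in for a trained network: treats z as the image
    and lets a fraction of the noisy state leak into the prediction."""
    return z + 0.5 * t * x_n

z = np.ones((4, 4))  # pretend content latent, already decoded from the bitstream
deterministic = ddim_decode(toy_predictor, z, z.shape, gamma=0.0)
stochastic = ddim_decode(toy_predictor, z, z.shape, gamma=0.8, seed=1)
```

With gamma = 0 every call returns the same reconstruction regardless of the seed, whereas gamma > 0 yields a different sample per seed. That mirrors the deterministic-versus-stochastic distinction drawn in the list above.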

Experimental Results

Research Questions

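RQ1: Can a diffusion-based decoder improve perceptual quality in learned image compression while maintaining competitive rate-distortion performance?
RQ2: Does conditioning the diffusion process on a content latent variable make the perception-distortion tradeoff controllable in practice?
RQ3: Can X-prediction deliver high-quality reconstructions in relatively few decoding steps?
RQ4: How do stochastic decoding and the perceptual loss affect metrics such as FID and LPIPS?
RQ5: How does the proposed method compare with GAN-based and VAE-based codecs across multiple datasets and metrics?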

Key Findings

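CDC, the proposed method, achieves the best reported FID scores among the compared methods and is competitive with VAE-based models on several distortion metrics.
The X-prediction variant produces high-quality reconstructions efficiently, in a small number of decoding steps.
Deterministic decoding favors distortion metrics, while stochastic decoding improves perceptual metrics under suitable settings.
Varying the perceptual tradeoff parameter rho realizes a three-way rate-distortion-perception tradeoff.
The method outperforms the GAN-based baseline in perceptual quality while maintaining competitive rate-distortion performance.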
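
As a rough illustration of how a rho-weighted objective could expose the three-way tradeoff noted in the findings above, the sketch below blends a diffusion (distortion-like) term with a perceptual term and adds a rate penalty. The exact form is an assumption: the function names `rate_in_bits` and `rdp_objective`, the convex (1 - rho) / rho blend, and the rate weight `lam` are hypothetical and should be checked against the paper's own objective. The only standard fact used is that an ideal entropy coder spends -log2 p bits on a symbol with probability p.

```python
import numpy as np

def rate_in_bits(symbol_probs):
    """Ideal entropy-coded length: -sum(log2 p) over the probabilities that
    the prior assigns to the quantized latent symbols."""
    p = np.asarray(symbol_probs, dtype=float)
    return float(-np.sum(np.log2(p)))

def rdp_objective(diffusion_term, perceptual_term, rate_bits, rho, lam):
    """Assumed rho-weighted combination (illustrative, not the paper's
    exact loss): rho = 0 ignores the perceptual term, rho = 1 ignores the
    diffusion term, and lam sets the price of each bit."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * diffusion_term + rho * perceptual_term + lam * rate_bits

bits = rate_in_bits([0.5, 0.25])  # one symbol at p = 1/2, one at p = 1/4
loss = rdp_objective(diffusion_term=2.0, perceptual_term=4.0,
                     rate_bits=bits, rho=0.5, lam=1.0)
```

Sweeping rho from 0 to 1 at a fixed `lam` shifts weight from the diffusion term toward the perceptual term, which is the kind of knob the finding about rho describes.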

This review was generated by AI and checked by a human editor.