QUICK REVIEW

[Paper Review] Joint Autoregressive and Hierarchical Priors for Learned Image Compression

David Minnen, Johannes Ballé|arXiv (Cornell University)|Sep 8, 2018

Advanced Data Compression Techniques | References: 26 | Citations: 353

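One-Line Summary

This paper extends learned image compression by combining an autoregressive context model with a hierarchical hyperprior, achieving state-of-the-art rate–distortion performance and outperforming BPG on both PSNR and MS-SSIM.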

ABSTRACT

Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate--distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.

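Motivation and Objectives

Extend the GSM-based (Gaussian scale mixture) entropy model for learned image compression by using a conditional Gaussian mixture and autoregressive context.
Investigate the complementary benefits of autoregressive priors and hierarchical hyperpriors in entropy modeling.
Evaluate rate–distortion performance on PSNR and MS-SSIM against standard codecs and existing learned methods.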

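Proposed Method

Generalize the scale hyperprior model to a Gaussian mixture model that predicts both the mean and the scale, conditioned on the hyperprior and on causal context.
Combine the autoregressive context model with the hyperprior to form a joint entropy model that outputs a mean and a scale for each latent.
Use a two-part neural architecture: a core autoencoder that produces the latent representation, and a probabilistic model (context + hyperprior) used for entropy coding.
Train with a rate–distortion objective that includes the rate cost of the latents and hyper-latents plus a squared-error distortion term.
Apply a context model based on 5×5 masked convolutions to capture causal dependencies among the latents, while retaining the Entropy Parameters network that predicts the Gaussian parameters.
Evaluate RD performance on Kodak (PSNR and MS-SSIM) and compare against standard codecs (BPG, JPEG, JPEG2000, WebP) and existing learned methods.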
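
The steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single latent channel, a raster-scan decoding order, and integer-rounded latents, and the names causal_mask, masked_context, gaussian_bits, rd_loss, and lam are hypothetical. In the actual model the mean and scale would come from the Entropy Parameters network; here they are simply passed in as arrays. The scaling of the rate and distortion terms is also an assumption.

```python
import math
import numpy as np

def causal_mask(kernel_size: int = 5) -> np.ndarray:
    """1 for positions decoded before the centre in raster order, else 0."""
    mask = np.zeros((kernel_size, kernel_size), dtype=np.float64)
    c = kernel_size // 2
    mask[:c, :] = 1.0   # every row above the centre row
    mask[c, :c] = 1.0   # centre row, strictly left of the centre
    return mask         # the centre itself and all later positions stay 0

def masked_context(y_hat: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Single-channel masked cross-correlation with zero padding."""
    k = weights.shape[0]
    w = weights * causal_mask(k)          # zero out non-causal taps
    padded = np.pad(y_hat.astype(np.float64), k // 2)
    out = np.zeros(y_hat.shape, dtype=np.float64)
    for i in range(y_hat.shape[0]):
        for j in range(y_hat.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * w)
    return out

# Standard normal CDF, applied elementwise.
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def gaussian_bits(y_hat, mu, sigma):
    """Bits per element: Gaussian mass on the unit-width bin around y_hat."""
    upper = _norm_cdf((y_hat + 0.5 - mu) / sigma)
    lower = _norm_cdf((y_hat - 0.5 - mu) / sigma)
    p = np.maximum(upper - lower, 1e-9)   # guard against log(0)
    return -np.log2(p)

def rd_loss(bits_latents, bits_hyper, x, x_rec, lam):
    """Rate (latents + hyper-latents) plus lam times mean squared error."""
    rate = np.sum(bits_latents) + np.sum(bits_hyper)
    distortion = np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2)
    return rate + lam * distortion
```

Because the mask zeroes the centre tap and every tap after it, the context feature at a position depends only on latents that a decoder would already have reconstructed. A predicted mean close to the actual latent value gives a lower bit cost than a zero-mean Gaussian with the same scale, which is the motivation for predicting the mean as well as the scale.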

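Experimental Results

Research Questions

RQ1: Can an autoregressive prior over the latents improve compression performance when combined with a hyperprior?
RQ2: How does a Gaussian mixture entropy model differ in rate–distortion performance from the scale hyperprior?
RQ3: What improvements does integrating autoregressive context with a hierarchical prior bring to learned image compression?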

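Key Findings

The model combining an autoregressive context model with a hyperprior achieves state-of-the-art rate–distortion performance.
The combined model reduces average file size by 15.8% relative to the previous state-of-the-art learned method.
This improvement corresponds to a 59.8% size reduction over JPEG.
The method achieves a reduction of more than 35% compared to WebP and JPEG2000.
Its bitstreams are 8.4% smaller than those of BPG, the state-of-the-art image codec at the time.
To the best of the authors' knowledge, this is the first learning-based method to outperform BPG on both PSNR and MS-SSIM.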


This review was generated by AI and checked by a human editor.