QUICK REVIEW

[Paper Review] Lossy Image Compression with Compressive Autoencoders

Lucas Theis, Wenzhe Shi | arXiv (Cornell University) | Mar 1, 2017

Advanced Data Compression Techniques | References: 30 | Citations: 259

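One-line summary

This paper proposes compressive autoencoders (CAEs) for end-to-end lossy image compression. It handles the non-differentiability of quantization and the modeling of entropy with gradient-friendly surrogates, and achieves efficient high-resolution decoding with performance competitive with JPEG 2000.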

ABSTRACT

We propose a new approach to the problem of optimizing autoencoders for lossy image compression. New media formats, changing hardware technology, as well as diverse requirements and content types create a need for compression algorithms which are more flexible than existing codecs. Autoencoders have the potential to address this need, but are difficult to optimize directly due to the inherent non-differentiability of the compression loss. We here show that minimal changes to the loss are sufficient to train deep autoencoders competitive with JPEG 2000 and outperforming recently proposed approaches based on RNNs. Our network is furthermore computationally efficient thanks to a sub-pixel architecture, which makes it suitable for high-resolution images. This is in contrast to previous work on autoencoders for compression using coarser approximations, shallower architectures, computationally expensive methods, or focusing on small images.

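Research motivation and objectives

Motivate the need for flexible lossy compression beyond traditional codecs.
Propose a learnable framework (CAE) that jointly optimizes rate and distortion.
Develop differentiable strategies for handling quantization and entropy coding.
Demonstrate competitive performance on standard datasets and analyze perceptual quality.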

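Proposed method

Define a compressive autoencoder consisting of an encoder f, a decoder g, and an entropy model Q.
Optimize the rate-distortion objective -log2 Q([f(x)]) + beta * d(x, g([f(x)])) using differentiable approximations, where [.] denotes rounding to the nearest integer, d is a distortion measure, and beta sets the trade-off.
For backpropagation, replace the derivative of the non-differentiable rounding-based quantization with that of a simple differentiable surrogate.
Upper-bound the non-differentiable bit cost using a continuous density q and Jensen's inequality, which enables gradient-based training.
Model the distribution of the encoded coefficients with Gaussian scale mixtures for entropy coding.
Achieve high-resolution performance efficiently using a sub-pixel convolutional architecture and incremental training with fine-tuning.
Provide flexible bitrate control by learning scale parameters that adjust the bitrate across the rate-distortion trade-off.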
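The bit-cost bound in the list above can be checked numerically. Because -log2 is convex, Jensen's inequality gives -log2 of the integral of q(z + u) over u in [-0.5, 0.5) <= the integral of -log2 q(z + u) over the same interval, and the right-hand side is differentiable in z. The following is a minimal sketch, not the authors' code: the mixture weights and scales, the function names, and the use of mean squared error for d are all assumptions made for illustration.

```python
import math

# Hypothetical Gaussian scale mixture: weights and standard deviations
# are illustrative values, not parameters taken from the paper.
WEIGHTS = [0.6, 0.3, 0.1]
SIGMAS = [0.5, 2.0, 6.0]


def gsm_density(y):
    """Continuous density q(y): a zero-mean Gaussian scale mixture."""
    return sum(
        w * math.exp(-0.5 * (y / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, s in zip(WEIGHTS, SIGMAS)
    )


def _normal_cdf(y, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(y / (sigma * math.sqrt(2.0))))


def exact_bits(z):
    """-log2 Q(z), where Q(z) integrates q over [z - 0.5, z + 0.5)."""
    mass = sum(
        w * (_normal_cdf(z + 0.5, s) - _normal_cdf(z - 0.5, s))
        for w, s in zip(WEIGHTS, SIGMAS)
    )
    return -math.log2(mass)


def bound_bits(z, steps=2000):
    """Jensen upper bound: integral of -log2 q(z + u) for u in [-0.5, 0.5).

    Evaluated with the midpoint rule, so the result is approximate.
    """
    du = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = -0.5 + (k + 0.5) * du
        total += -math.log2(gsm_density(z + u)) * du
    return total


def rate_distortion(x, x_hat, z, beta):
    """Surrogate objective: bounded bit cost of code z plus beta times MSE.

    MSE stands in for the distortion d here (an assumption of this sketch).
    """
    rate = sum(bound_bits(zi) for zi in z)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return rate + beta * mse


for z in (0, 1, 3, 10):
    print(z, exact_bits(z), bound_bits(z))
```

For each printed row the second number (the exact cost) should not exceed the third (the bound). The gap is largest where q changes quickly across the unit interval.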

Figure 1: Effects of rounding and differentiable alternatives when used as replacements in JPEG compression. A: A crop of an image before compression (GoToVan, 2014). B: Blocking artefacts in JPEG are caused by rounding of DCT coefficients to the nearest integer. Since rounding is used at test t
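The rounding surrogate can be illustrated on a toy problem. In this minimal sketch, the forward pass applies real rounding while the backward pass treats rounding as the identity, so the gradient passes through unchanged. The one-parameter model, the learning rate, and all function names are hypothetical and chosen only to keep the example small.

```python
def quantize_forward(y):
    """Forward pass: real rounding, the same operation used at test time.

    Python's round() sends exact .5 ties to the even integer; no tie
    occurs in this toy run.
    """
    return float(round(y))


def quantize_backward(grad_output):
    """Backward pass: rounding is treated as the identity function,
    so the incoming gradient is returned unchanged."""
    return grad_output


def loss_and_grad(w, x, target):
    """Toy model: loss = (round(w * x) - target) ** 2.

    Returns the loss and its surrogate gradient with respect to w.
    """
    y = w * x
    z = quantize_forward(y)
    loss = (z - target) ** 2
    dloss_dz = 2.0 * (z - target)
    # The true derivative of round() is zero almost everywhere, which
    # would stop learning; the surrogate lets the error signal through.
    dloss_dy = quantize_backward(dloss_dz)
    return loss, dloss_dy * x


w = 0.2
for _ in range(50):
    loss, grad = loss_and_grad(w, x=3.0, target=4.0)
    w -= 0.01 * grad

final_loss, _ = loss_and_grad(w, x=3.0, target=4.0)
print(w, final_loss)
```

With the exact derivative of rounding, `grad` would be zero at every step and `w` would never move. With the surrogate, the toy loss reaches zero once `round(w * x)` equals the target.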

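Experimental results

Research questions

RQ1: Can CAEs achieve rate-distortion performance on natural images that is competitive with JPEG 2000 and RNN-based methods?
RQ2: How can non-differentiable quantization and entropy coding be handled effectively during end-to-end training?
RQ3: Does an efficient architecture (sub-pixel upsampling) enable near-real-time decoding of high-resolution images?
RQ4: Do incremental training and fine-tuning across rate-distortion settings improve stability and performance?
RQ5: How do CAEs compare with standard codecs on perceptual quality measures such as SSIM, MS-SSIM, and MOS?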

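Key findings

On Kodak images, the CAE matches or exceeds JPEG 2000 on perceptual measures such as SSIM and MOS.
At certain bitrates, CAEs outperform JPEG 2000 in SSIM and MOS, and produce smoother artifacts than JPEG 2000.
An efficient convolutional architecture with sub-pixel upsampling enables high-resolution decoding on consumer hardware.
Incremental training and learned scale parameters provide flexible, fine-grained bitrate control without training many separate models.
End-to-end optimization allows CAEs to adapt to content-specific tasks and metrics beyond what traditional codecs offer.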
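The effect of a scale parameter on bitrate can be seen in a small experiment. This sketch assumes one simple way to apply it: coefficients are multiplied by a scale before rounding and divided by it after decoding. Synthetic Gaussian values stand in for encoder outputs, a single scalar is used rather than one scale per coefficient, and the scale is fixed by hand rather than learned. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def empirical_bits_per_symbol(symbols):
    """Empirical entropy of the symbol stream in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def encode(coefficients, scale):
    """Scale the coefficients, then round to integers."""
    return [round(c * scale) for c in coefficients]


def decode(symbols, scale):
    """Undo the scaling after quantization."""
    return [s / scale for s in symbols]


def mean_squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic stand-in for encoder outputs (an assumption of this sketch).
rng = random.Random(0)
coefficients = [rng.gauss(0.0, 1.0) for _ in range(5000)]

results = {}
for scale in (0.5, 1.0, 2.0, 4.0):
    symbols = encode(coefficients, scale)
    bits = empirical_bits_per_symbol(symbols)
    error = mean_squared_error(coefficients, decode(symbols, scale))
    results[scale] = (bits, error)
    print(scale, bits, error)
```

A larger scale makes quantization finer, so the entropy of the symbols goes up while the reconstruction error goes down. One set of network weights can therefore cover several points on the rate-distortion curve.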

Figure 2: Illustration of the compressive autoencoder architecture used in this paper. Inspired by the work of Shi et al. (2016), most convolutions are performed in a downsampled space to speed up computation, and upsampling is performed using sub-pixel convolutions (convolutions followed by resh
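The rearrangement step of a sub-pixel convolution can be shown on its own. A convolution first produces C * r * r low-resolution feature maps, which are then interleaved into C maps that are r times larger in each spatial dimension. This minimal sketch uses plain nested lists and assumes the channel ordering of common deep learning libraries, with the output channel varying slowest. The paper's exact ordering may differ, and the function name is hypothetical.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange C*r*r maps of size H x W into C maps of size (H*r) x (W*r).

    feature_maps is a nested list indexed as [channel][row][column].
    """
    channels_in = len(feature_maps)
    assert channels_in % (r * r) == 0, "channel count must be divisible by r*r"
    channels_out = channels_in // (r * r)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    out = [
        [[0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = feature_maps[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 input channels become one 2x2 output channel.
low_res = [[[k]] for k in range(4)]
high_res = pixel_shuffle(low_res, 2)
print(high_res)  # [[[0, 1], [2, 3]]]
```

Because the convolution runs at low resolution and only this cheap index shuffle produces the full-size output, most of the computation stays in the downsampled space.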

This review was generated by AI and checked by a human editor.