QUICK REVIEW

[Paper Review] Overfitting for Fun and Profit: Instance-Adaptive Data Compression

Ties van Rozendaal, Iris A. M. Huijben | TU/e Research Portal | Jan 21, 2021

Video Coding and Compression Technologies | References: 30 | Citations: 28

ONE-LINE SUMMARY

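This paper proposes full-model instance-adaptive neural data compression: the entire model is fine-tuned on the I-frames of a single video, and the quantized model updates are signaled to the decoder. At the same bitrate, this yields a PSNR improvement of about 1 dB over encoder-only fine-tuning.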

ABSTRACT

Neural data compression has been shown to outperform classical methods in terms of $RD$ performance, with results still improving rapidly. At a high level, neural compression is based on an autoencoder that tries to reconstruct the input instance from a (quantized) latent representation, coupled with a prior that is used to losslessly compress these latents. Due to limitations on model capacity and imperfect optimization and generalization, such models will suboptimally compress test data in general. However, one of the great strengths of learned compression is that if the test-time data distribution is known and relatively low-entropy (e.g. a camera watching a static scene, a dash cam in an autonomous car, etc.), the model can easily be finetuned or adapted to this distribution, leading to improved $RD$ performance. In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates (quantized and compressed using a parameter-space prior) along with the latent representation. Unlike previous work, we finetune not only the encoder/latents but the entire model, and - during finetuning - take into account both the effect of model quantization and the additional costs incurred by sending the model updates. We evaluate an image compression model on I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate that full-model adaptation improves $RD$ performance by ~1 dB, with respect to encoder-only finetuning.

MOTIVATION AND OBJECTIVES

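Motivates adapting the entire compression model to a single data instance to improve rate-distortion (RD) performance.
Extends the RD loss to include the model-update cost and the quantization overhead.
Demonstrates that full-model fine-tuning with a spike-and-slab prior improves distortion on I-frames while reducing bitrate.
Analyzes how model updates are distributed across parameters and how quantization affects performance.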

PROPOSED METHOD

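Formulates a joint RD-plus-model-rate loss L_RDM that adds a model-update cost term M, defined by the update distribution p(delta).
Uses a spike-and-slab prior to encourage sparsity and to reduce the signaling cost of zero updates.
During fine-tuning, estimates gradients with the Straight-Through Estimator and quantizes the deltas with bin width t.
Entropy-codes the latents z and the quantized updates [delta] under the priors p_theta(z) and p([delta]).
Fine-tunes the global model on the I-frames of a single video (full-model adaptation), amortizing the model-rate cost over the multiple frames of that video.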
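The quantization step with bin width t and the Straight-Through Estimator can be illustrated with a minimal, dependency-free sketch. It assumes a single scalar update `delta` and a toy squared-error loss on the quantized value; the functions `quantize` and `ste_step` are hypothetical names, not the paper's code. The forward pass uses the rounded value, while the backward pass treats rounding as the identity, so the continuous `delta` keeps moving even though rounding has zero gradient almost everywhere.

```python
import math


def quantize(delta: float, t: float) -> float:
    """Round an update to the nearest multiple of the bin width t.

    Note: Python's round() breaks exact ties toward the even integer.
    """
    return t * round(delta / t)


def ste_step(delta: float, t: float, target: float, lr: float) -> float:
    """One gradient step on the toy loss (quantize(delta) - target)^2 using the STE."""
    q = quantize(delta, t)        # forward pass: use the quantized update
    grad_q = 2.0 * (q - target)   # dLoss/dq for the toy squared-error loss
    grad_delta = grad_q           # STE: pretend dq/ddelta == 1 instead of 0
    return delta - lr * grad_delta


if __name__ == "__main__":
    delta, t = 0.0, 0.1
    for _ in range(20):
        delta = ste_step(delta, t, target=0.3, lr=0.25)
    print(delta, quantize(delta, t))
```

In a real model the same trick is usually written with an autodiff framework's stop-gradient operation; the scalar version only shows the forward/backward asymmetry.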
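Why a spike-and-slab prior makes zero updates cheap can be seen from the ideal code length -log2 P(bin) of each quantized update. The sketch below assumes a mixture (slab + alpha * spike) / (1 + alpha) of two zero-mean Gaussians, with the spike's standard deviation set to t / 6 so that nearly all of its mass falls in the zero bin. The values of `sigma_slab`, `alpha`, and `t` are illustrative assumptions, not the paper's settings, and the function names are hypothetical.

```python
import math


def gaussian_cdf(x: float, sigma: float) -> float:
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def bin_mass(center: float, t: float, sigma: float) -> float:
    """Probability mass of the quantization bin [center - t/2, center + t/2]."""
    return gaussian_cdf(center + t / 2, sigma) - gaussian_cdf(center - t / 2, sigma)


def update_bits(q: float, t: float, sigma_slab: float, alpha: float) -> float:
    """Ideal code length (bits) of one quantized update q under an assumed spike-and-slab prior."""
    sigma_spike = t / 6.0  # narrow spike: the zero bin spans +/- 3 spike std devs
    mass = (bin_mass(q, t, sigma_slab) + alpha * bin_mass(q, t, sigma_spike)) / (1.0 + alpha)
    return -math.log2(mass)


def model_update_bits(updates, t: float, sigma_slab: float, alpha: float) -> float:
    """Model-update cost M: sum of per-parameter code lengths for quantized updates."""
    return sum(update_bits(q, t, sigma_slab, alpha) for q in updates)


if __name__ == "__main__":
    t, sigma_slab, alpha = 0.005, 0.05, 1000.0
    print("zero update:", update_bits(0.0, t, sigma_slab, alpha), "bits")
    print("one-bin update:", update_bits(t, t, sigma_slab, alpha), "bits")
```

With these illustrative values a zero update costs roughly 0.005 bits while a one-bin update costs roughly 9.5 bits, which matches the qualitative finding that zero updates carry only a small static cost and that the prior pushes most parameters to stay unchanged.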
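How the model-update cost M enters the objective and is amortized over a video can be sketched with plain arithmetic. This assumes a per-frame objective of the form beta * mean distortion + mean latent bits + M / N, where the update bits M are transmitted once for the whole video of N frames; the exact weighting and normalization used in the paper (for example, bits per pixel) may differ, and `rdm_loss` is a hypothetical name.

```python
def rdm_loss(distortions, latent_bits, model_bits: float, beta: float) -> float:
    """Per-frame RDM-style objective under an assumed weighting.

    beta * mean(D) + mean(R) + M / N, where M (model-update bits) is paid once
    for the whole video and is therefore spread over its N frames.
    """
    n = len(distortions)
    if n == 0 or n != len(latent_bits):
        raise ValueError("need one distortion and one latent-rate value per frame")
    mean_d = sum(distortions) / n
    mean_r = sum(latent_bits) / n
    return beta * mean_d + mean_r + model_bits / n


if __name__ == "__main__":
    # The same 1000-bit model update costs 10x less per frame over 100 frames than over 10.
    print(rdm_loss([0.0] * 10, [0.0] * 10, 1000.0, beta=1.0))
    print(rdm_loss([0.0] * 100, [0.0] * 100, 1000.0, beta=1.0))
```

The second term is why adapting to a full video, rather than to one frame, makes full-model updates affordable: the fixed signaling cost shrinks per frame as N grows.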

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

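RQ1: Does full-model fine-tuning on a single video instance improve RD performance compared with encoder-only fine-tuning or latent optimization alone?
RQ2: How does incorporating the model-update cost and quantization-aware training affect the practicality and the gains of instance-adaptive compression?
RQ3: When adapting to I-frames, how are model updates distributed across parameter groups, and how does the spike-and-slab prior affect the signaling cost?
RQ4: How large are the RD gains under different settings of β, and how do they evolve during fine-tuning?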

KEY FINDINGS

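On Xiph-5N I-frames sampled at 2 fps, full-model instance-adaptive fine-tuning yields an RD gain of about 1 dB over encoder-only fine-tuning at the same bitrate.
Accounting for the model-update cost and quantization is essential: ignoring them leads to a worse bitrate, or even a divergent increase in bitrate.
The spike-and-slab prior reduces the signaling cost of zero updates, promotes sparsity, and guides which parameters should be updated.
Most of the RD gain arises early in fine-tuning and persists. In the high-bitrate regime, more effective fine-tuning yields larger savings in latent bits.
The bit-allocation analysis shows that updates are often quantized and bounded by the quantizer, while zero updates carry a small static cost.
Encoder-only fine-tuning is competitive with direct latent optimization, showing only a small amortization gap in these experiments.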

This review was generated by AI and checked by a human editor.