QUICK REVIEW

[Paper Review] COIN++: Neural Compression Across Modalities

Emilien Dupont, Hrushikesh Loya|arXiv (Cornell University)|Jan 30, 2022

Generative Adversarial Networks and Image Synthesis|Cited by 29

One-sentence summary

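COIN++ introduces a unified neural compression framework based on implicit neural representations, combining a shared base network with instance-specific modulations; it encodes quickly and compresses multimodal data ranging from images to climate data.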

ABSTRACT

Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (such as pixel locations) to features (such as RGB values). Then, instead of storing the weights of the implicit neural representation directly, we store modulations applied to a meta-learned base network as a compressed code for the data. We further quantize and entropy code these modulations, leading to large compression gains while reducing encoding time by two orders of magnitude compared to baselines. We empirically demonstrate the feasibility of our method by compressing various data modalities, from images and audio to medical and climate data.

Research motivation and objectives

Develop a neural compression framework that works across diverse data modalities, not only images.
Remove the need for modality-specific encoder/decoder architectures by sharing a base network and encoding each instance as modulations.
Drastically reduce encoding time while maintaining competitive reconstruction quality.
Demonstrate applicability to images, audio, medical, and climate data.
Explore quantization and entropy coding of modulations to maximize compression.

Proposed method

Convert each data instance to an implicit neural representation (INR) mapping coordinates to features.
Use a fixed base network and learn per-instance modulations (FiLM-like) to parameterize each data instance.
Meta-learn the shared base network so that a few gradient steps are enough to fit the modulations for a new datapoint.
Apply only shifts in FiLM modulations and linearly map a latent vector to these modulations to improve stability and compressibility.
Partition large data into patches during training and testing to manage memory and scaling.
Quantize modulations with uniform quantization and apply simple entropy coding based on observed modulation distributions.
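
The core of the method (shift-only modulations produced from a latent vector and fitted with a few gradient steps on a frozen base network) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the sine-layer form `sin(w0 * (W h + b + beta))`, the layer sizes, the randomly initialized stand-in for the meta-learned base network, the step count, the learning rate, and all function names (`init_base`, `loss_and_grad`, `fit_latent`) are assumptions made for the example.

```python
import numpy as np

def init_base(rng, d_in=2, d_hidden=16, d_out=3, n_layers=2, d_latent=8, w0=30.0):
    """Random stand-in for the shared base network (in COIN++ it is meta-learned)."""
    dims = [d_in] + [d_hidden] * n_layers
    Ws, bs = [], []
    for i in range(n_layers):
        # SIREN-style bounds keep w0 * pre-activation at a moderate scale (assumption)
        bound = 1.0 / dims[i] if i == 0 else np.sqrt(6.0 / dims[i]) / w0
        Ws.append(rng.uniform(-bound, bound, (dims[i], dims[i + 1])))
        bs.append(np.zeros(dims[i + 1]))
    out_bound = np.sqrt(6.0 / d_hidden)
    return {
        "Ws": Ws, "bs": bs, "w0": w0,
        "W_out": rng.uniform(-out_bound, out_bound, (d_hidden, d_out)),
        "b_out": np.zeros(d_out),
        # Linear map from the latent vector to the shifts of every hidden layer
        "A": rng.normal(0.0, 1.0 / (w0 * np.sqrt(d_latent)), (d_latent, n_layers * d_hidden)),
        "c": np.zeros(n_layers * d_hidden),
    }

def loss_and_grad(base, z, coords, target):
    """MSE of the modulated INR and its gradient with respect to the latent z only."""
    betas = np.split(z @ base["A"] + base["c"], len(base["Ws"]))
    w0 = base["w0"]
    h, cache = coords, []
    for W, b, beta in zip(base["Ws"], base["bs"], betas):
        pre = h @ W + b + beta          # shift modulation added before the sine
        cache.append(pre)
        h = np.sin(w0 * pre)
    y = h @ base["W_out"] + base["b_out"]
    diff = y - target
    loss = float(np.mean(diff ** 2))
    # Backward pass: base weights stay frozen, only shift gradients are collected
    dh = (2.0 / diff.size) * diff @ base["W_out"].T
    dbetas = []
    for W, pre in zip(reversed(base["Ws"]), reversed(cache)):
        dpre = dh * w0 * np.cos(w0 * pre)
        dbetas.append(dpre.sum(axis=0))  # a shift is shared by all coordinates
        dh = dpre @ W.T
    dz = np.concatenate(dbetas[::-1]) @ base["A"].T
    return loss, dz

def fit_latent(base, coords, target, steps=3, lr=1e-2):
    """Encode one datapoint: start from z = 0 and take a few gradient steps."""
    z = np.zeros(base["A"].shape[0])
    losses = []
    for _ in range(steps):
        loss, dz = loss_and_grad(base, z, coords, target)
        losses.append(loss)
        z = z - lr * dz
    losses.append(loss_and_grad(base, z, coords, target)[0])
    return z, losses

rng = np.random.default_rng(0)
base = init_base(rng)
coords = rng.uniform(-1.0, 1.0, (64, 2))   # e.g. 64 pixel locations
target = rng.uniform(0.0, 1.0, (64, 3))    # e.g. RGB values at those locations
z, losses = fit_latent(base, coords, target)
```

In this sketch only `z` would need to be stored per datapoint; the base weights and the linear map (`A`, `c`) are shared across the dataset. With a randomly initialized base network the loss drops only slightly in three steps; per the review, meta-learning the base network is what makes a few steps sufficient.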

Experimental results

Research questions

RQ1: Can COIN++ compress a wide range of data modalities beyond images (e.g., audio, medical, climate data)?
RQ2: Does sharing a base INR with per-instance modulations improve compression and encoding speed compared with COIN?
RQ3: How do quantization and entropy coding of modulations affect rate–distortion performance?
RQ4: What is the impact of patch-based training and testing on scalability and reconstruction quality?
RQ5: How close can COIN++ approach state-of-the-art codecs across modalities?

Key findings

Codec	Encoding (ms)	Decoding (ms)
BPG	5.19	1.25
COIN	29700	0.46
COIN++	94.9	1.29
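
The ratios implied by the table can be checked with a few lines of arithmetic. The dictionary below simply copies the reported timings; the variable names are illustrative.

```python
# Timings in milliseconds, copied from the table above: (encoding, decoding)
timings_ms = {
    "BPG": (5.19, 1.25),
    "COIN": (29700.0, 0.46),
    "COIN++": (94.9, 1.29),
}

# How much faster COIN++ encodes than COIN
enc_speedup_vs_coin = timings_ms["COIN"][0] / timings_ms["COIN++"][0]
# How much slower COIN++ encodes than BPG
enc_slowdown_vs_bpg = timings_ms["COIN++"][0] / timings_ms["BPG"][0]
# How much slower COIN++ decodes than COIN
dec_slowdown_vs_coin = timings_ms["COIN++"][1] / timings_ms["COIN"][1]

print(f"encode speedup vs COIN:  {enc_speedup_vs_coin:.0f}x")   # 313x
print(f"encode slowdown vs BPG:  {enc_slowdown_vs_bpg:.1f}x")   # 18.3x
print(f"decode slowdown vs COIN: {dec_slowdown_vs_coin:.1f}x")  # 2.8x
```

So the encoding gain over COIN is a little over two orders of magnitude, while BPG remains roughly 18× faster to encode and COIN stays somewhat faster to decode.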

COIN++ vastly outperforms COIN and JPEG/JPEG2000 on CIFAR10 and approaches, but does not fully reach, BPG performance at low bitrates.
Using modulations with a fixed base network yields better compressibility (2 dB PSNR gain at the same parameter count) than alternative INR parameterizations.
Quantizing modulations to 5–6 bits provides strong rate–distortion gains with modest PSNR loss; modulation quantization is more robust than weight quantization in COIN++.
COIN++ encodes CIFAR10 images about 300× faster than COIN (94.9 ms vs. 29,700 ms in the timing table) and achieves roughly 4× better compression on CIFAR10.
On ERA5 climate data, COIN++ achieves a 3000× compression rate with competitive RMSE, outperforming baselines like JPEG/JPEG2000/BPG on this modality.
With patch-based training, COIN++ scales to large data domains while maintaining better performance than non-patched baselines at low bitrates.
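
As a rough illustration of the quantization and entropy-coding step behind these findings, the sketch below uniformly quantizes stand-in modulations to 5 bits and estimates the bits per modulation that an ideal entropy coder could reach from the empirical bin frequencies. Assumptions: Gaussian random values take the place of real modulations, the clipping range is ±3 standard deviations, and the helper names are hypothetical; the empirical entropy is an estimate, not an actual arithmetic coder.

```python
import numpy as np

def quantize_uniform(x, n_bits, lo, hi):
    """Clip to [lo, hi] and map to integer bins 0 .. 2**n_bits - 1."""
    levels = 2 ** n_bits
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * (levels - 1)).astype(np.int64)

def dequantize_uniform(q, n_bits, lo, hi):
    """Map integer bins back to floats on the uniform grid."""
    levels = 2 ** n_bits
    return lo + q / (levels - 1) * (hi - lo)

def empirical_entropy_bits(q, n_bits):
    """Entropy (bits per symbol) of the observed bin frequencies."""
    counts = np.bincount(q.ravel(), minlength=2 ** n_bits)
    p = counts[counts > 0] / q.size
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
modulations = rng.normal(size=(100, 64))   # stand-in: 100 datapoints, 64 modulations each
n_bits = 5
lo = modulations.mean() - 3 * modulations.std()
hi = modulations.mean() + 3 * modulations.std()

q = quantize_uniform(modulations, n_bits, lo, hi)
recon = dequantize_uniform(q, n_bits, lo, hi)
bits_per_modulation = empirical_entropy_bits(q, n_bits)

print(f"fixed-length cost: {n_bits} bits per modulation")
print(f"entropy estimate:  {bits_per_modulation:.2f} bits per modulation")
```

Because the bins are not used equally often, the entropy estimate falls below the fixed-length cost of 5 bits; that gap is the saving entropy coding aims to capture on top of quantization.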

This review was generated by AI and checked by a human editor.