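[Paper Review] From Variational to Deterministic Autoencoders
RAEs offer a deterministic alternative to VAEs by replacing stochastic encoding with explicit decoder regularization. They also add an ex-post density estimation step that improves sample quality on images and on structured data such as molecules.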
Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data still poses unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of VAEs. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity with other explicit and implicit regularization schemes can lead to an equally smooth and meaningful latent space without forcing it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data, we introduce an ex-post density estimation step that can also be readily applied to existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules. An implementation is available at: https://github.com/ParthaEth/Regularized_autoencoders-RAE-
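The observation that sampling a Gaussian encoder amounts to injecting noise into a deterministic decoder can be checked numerically. The following is a minimal pure-Python sketch, assuming a hypothetical 2-D latent space and a hypothetical fixed linear `decoder`. The helper names `vae_decode`, `noisy_decode`, and `rae_decode` are illustrative only and do not come from the paper's code.

```python
import math
import random
from typing import List


def decoder(z: List[float]) -> List[float]:
    """Hypothetical deterministic decoder: a fixed linear map from R^2 to R^3."""
    w = [[1.0, -0.5], [0.3, 2.0], [-1.2, 0.7]]
    return [sum(wi * zi for wi, zi in zip(row, z)) for row in w]


def vae_decode(mu, log_var, rng):
    """Gaussian-VAE path: z = mu + sigma * eps with eps ~ N(0, I), then decode."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    return decoder(z)


def noisy_decode(mu, noise):
    """The same computation read as a deterministic decoder with additive input noise."""
    return decoder([m + n for m, n in zip(mu, noise)])


def rae_decode(mu):
    """RAE path: decode the deterministic code directly, with no injected noise."""
    return decoder(mu)


mu, log_var = [0.5, -1.0], [-2.0, -1.0]
# Two generators with the same seed draw the same eps sequence.
x_vae = vae_decode(mu, log_var, random.Random(0))
rng = random.Random(0)
noise = [math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for lv in log_var]
x_noisy = noisy_decode(mu, noise)
print(x_vae == x_noisy)  # True
```

As the encoder variance shrinks toward zero, `vae_decode` approaches `rae_decode`. RAEs take that deterministic path and regularize the decoder explicitly instead.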
Research Motivation and Objectives
- Question the necessity of the variational framework for generative modeling and latent space regularization.
- Propose a deterministic autoencoder (RAE) with explicit regularizers to replace the KL-based VAE objective.
- Investigate how different regularizers affect latent space structure and sample quality.
- Introduce an ex-post density estimation step to enable sampling without a fixed latent prior.
- Evaluate RAEs on image datasets and on structured domains such as molecules, comparing them against VAEs and WAEs.
Proposed Method
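- Reinterpret the Gaussian VAE as a deterministic autoencoder whose decoder input is perturbed by Gaussian noise, as made explicit by the reparameterization trick.
- Define the Regularized Autoencoder loss $\mathcal{L}_{\text{RAE}} = \mathcal{L}_{\text{REC}} + \beta\,\mathcal{L}^{\text{RAE}}_{Z} + \lambda\,\mathcal{L}_{\text{REG}}$. Here $\mathcal{L}_{\text{REC}}$ is the reconstruction loss, and $\mathcal{L}^{\text{RAE}}_{Z} = \frac{1}{2}\lVert z\rVert_2^2$ constrains the size of the latent space. $\mathcal{L}_{\text{REG}}$ is an explicit decoder regularizer, for example an L2 penalty on the decoder parameters $\theta$, a gradient penalty, or spectral normalization.
- Explore three explicit regularizers: an L2 penalty on the decoder parameters (RAE-L2), a gradient penalty (RAE-GP), and spectral normalization (RAE-SN).
- Optionally omit $\mathcal{L}^{\text{RAE}}_{Z}$ and rely on $\mathcal{L}_{\text{REG}}$ alone to regularize the decoder. In either case, training involves no stochastic sampling.
- Fit an ex-post density estimator $q_\delta(z)$ to the learned latent codes to recover a generative mechanism without enforcing a fixed prior.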
- Evaluate via Fréchet Inception Distance (FID), reconstructions, and interpolations on MNIST, CIFAR, and CelebA, and extend to GrammarRAE for structured data (molecules, expressions).
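The RAE-L2 objective from the method list can be sketched as follows. This is a minimal illustration that assumes a hypothetical linear encoder and decoder (`enc_w`, `dec_w`) and a squared-error reconstruction loss. The weights `beta` and `lam` are illustrative values, not the paper's settings. Only the L2 decoder penalty is shown; the gradient-penalty and spectral-normalization variants would replace the `l_reg` term.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product for a toy linear layer."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rae_l2_loss(batch: Sequence[Sequence[float]], enc_w: Matrix, dec_w: Matrix,
                beta: float, lam: float) -> Dict[str, float]:
    """Batch-averaged L_REC + beta * L_Z + lam * L_REG for a hypothetical
    linear encoder/decoder pair, with L_REG = ||theta||_2^2 (RAE-L2)."""
    rec = lat = 0.0
    for x in batch:
        z = matvec(enc_w, x)        # deterministic encoder E_phi(x): no sampling
        x_hat = matvec(dec_w, z)    # deterministic decoder D_theta(z)
        rec += sum((a - b) ** 2 for a, b in zip(x, x_hat))  # ||x - x_hat||_2^2
        lat += 0.5 * sum(zi * zi for zi in z)                # (1/2) ||z||_2^2
    n = len(batch)
    l_rec, l_z = rec / n, lat / n
    # RAE-L2: squared L2 norm of the decoder weights only.
    l_reg = sum(p * p for row in dec_w for p in row)
    total = l_rec + beta * l_z + lam * l_reg
    return {"total": total, "rec": l_rec, "latent": l_z, "reg": l_reg}


losses = rae_l2_loss([[1.0, 2.0]], enc_w=[[0.5, 0.5]], dec_w=[[1.0], [1.0]],
                     beta=0.1, lam=0.01)
print(losses["rec"], losses["latent"], losses["reg"])  # 0.5 1.125 2.0
```

Setting `beta=0.0` drops the latent term. This corresponds to the optional variant that omits the latent penalty and relies on the decoder regularizer alone. In a real model, the gradient of `total` would be taken with an autodiff framework rather than by hand.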
Experimental Results
Research Questions
- RQ1: Can a deterministic autoencoder with explicit decoder regularization match or exceed VAE/WAE sample quality?
- RQ2: Does removing the KL term and latent prior hinder sampling, and can ex-post density estimation restore a usable generative mechanism?
- RQ3: Which regularization schemes (L2, gradient penalty, spectral normalization) most effectively shape the latent space?
- RQ4: Is ex-post density estimation beneficial across VAEs, WAEs, and RAEs for improving samples?
- RQ5: Do RAEs extend well to structured data domains like molecules and grammar-constrained expressions?
Key Findings
- RAEs achieve competitive or better Fréchet Inception Distance (FID) scores than VAEs, WAEs, and 2sVAE on MNIST, CIFAR, and CelebA when augmented with a 10-component GMM ex-post density estimator.
- The RAE variants (GP, L2, SN) perform similarly across datasets, with no single winner; the simpler L2 variant is preferred because it is easiest to implement.
- Ex-post density estimation consistently improves sample quality across VAEs, WAEs, and RAEs, reducing FID notably (e.g., ~20 to ~10 on MNIST and CelebA when using a 10-component GMM).
- Implicitly regularized RAEs, and even plain AEs, can achieve strong FID reductions when $q_\delta(z)$ is fitted with a GMM (e.g., MNIST from 58.73 to 10.66).
- RAEs produce good latent-space interpolations and sharp samples. They also extend effectively to structured data such as molecules: in the GrammarRAE experiments, they achieve higher validity and better scores than CVAE and GVAE.
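The ex-post density estimation step behind these findings amounts to fitting a mixture model to the encoder's latent codes and sampling from it. The following is a minimal pure-Python sketch, assuming diagonal covariances and a simple sorted-code initialization. Both are our choices and may differ from the paper's estimator setup. The synthetic two-cluster `codes` stand in for real encoder outputs, so `k=2` is used here instead of the 10 components cited above. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def _log_diag_normal(x: Vec, mean: Vec, var: Vec) -> float:
    """log N(x | mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fit_diag_gmm(codes: List[Vec], k: int, iters: int = 50,
                 min_var: float = 1e-6) -> Tuple[Vec, List[Vec], List[Vec]]:
    """Fit q_delta(z) as a k-component diagonal-covariance GMM with EM."""
    n, d = len(codes), len(codes[0])
    ordered = sorted(codes)  # simple deterministic init: evenly spaced codes
    means = [list(ordered[int((i + 0.5) * n / k)]) for i in range(k)]
    centre = [sum(z[j] for z in codes) / n for j in range(d)]
    spread = [max(sum((z[j] - centre[j]) ** 2 for z in codes) / n, min_var)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised with log-sum-exp for stability.
        resp = []
        for x in codes:
            logs = [math.log(weights[c]) + _log_diag_normal(x, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            probs = [math.exp(lp - top) for lp in logs]
            s = sum(probs)
            resp.append([p / s for p in probs])
        # M-step: re-estimate mixture weights, means and per-dimension variances.
        for c in range(k):
            nk = sum(r[c] for r in resp) + 1e-12
            weights[c] = nk / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, codes)) / nk
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, codes)) / nk, min_var)
                            for j in range(d)]
    return weights, means, variances


def sample_gmm(weights, means, variances, num, rng):
    """Draw latent codes z ~ q_delta(z); each would then be decoded by D_theta."""
    out = []
    for _ in range(num):
        c = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(means[c], variances[c])])
    return out


# Synthetic stand-in for encoder outputs: two well-separated 2-D clusters.
rng = random.Random(1)
codes = ([[rng.gauss(-3, 0.5), rng.gauss(-3, 0.5)] for _ in range(100)]
         + [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(100)])
weights, means, variances = fit_diag_gmm(codes, k=2)
z_new = sample_gmm(weights, means, variances, num=5, rng=random.Random(2))
print([round(m[0], 1) for m in sorted(means)])  # cluster centres near -3 and 3
```

Because the density is fitted after training, the same routine can be applied unchanged to latent codes from a VAE, WAE, RAE, or plain AE. This is what makes the ex-post step applicable across models.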
This review was generated by AI and checked by a human editor.