QUICK REVIEW

[Paper Review] ControlVAE: Controllable Variational Autoencoder

Huajie Shao, Shuochao Yao | arXiv.org | Apr 13, 2020

Generative Adversarial Networks and Image Synthesis | References: 40 | Citations: 38

One-sentence summary

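ControlVAE introduces a non-linear PI controller that dynamically adjusts the KL-divergence in VAEs, giving explicit control over reconstruction quality, disentanglement, and diversity in language modeling, disentangled representation learning, and image generation.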

ABSTRACT

Variational Autoencoders (VAE) and their variants have been widely used in a variety of applications, such as dialog generation, image generation and disentangled representation learning. However, the existing VAE models have some limitations in different applications. For example, a VAE easily suffers from KL vanishing in language modeling and low reconstruction quality for disentangling. To address these issues, we propose a novel controllable variational autoencoder framework, ControlVAE, that combines a controller, inspired by automatic control theory, with the basic VAE to improve the performance of resulting generative models. Specifically, we design a new non-linear PI controller, a variant of the proportional-integral-derivative (PID) control, to automatically tune the hyperparameter (weight) added in the VAE objective using the output KL-divergence as feedback during model training. The framework is evaluated using three applications; namely, language modeling, disentangled representation learning, and image generation. The results show that ControlVAE can achieve better disentangling and reconstruction quality than the existing methods. For language modelling, it not only averts the KL-vanishing, but also improves the diversity of generated text. Finally, we also demonstrate that ControlVAE improves the reconstruction quality of generated images compared to the original VAE.

Motivation and objectives

Enable explicit control over the KL-divergence trade-off in VAEs so that it can be set to suit each application.
Develop a controllable VAE framework that uses a nonlinear PI controller to tune the KL weight during training.
Demonstrate improvements in reconstruction quality, disentanglement, and text/image generation across multiple datasets.

Proposed method

Introduce a nonlinear PI controller to automatically tune the KL-weight beta(t) in the VAE objective using the output KL-divergence as feedback.
Define the control objective as stabilizing KL-divergence at a user-specified set point v_kl while optimizing a VAE reconstruction term.
Formulate beta(t) with a nonlinear PI control law: $\beta(t) = \frac{K_p}{1 + \exp(e(t))} - K_i \sum_{j=0}^{t} e(j) + \beta_{\min}$, where $e(t) = v_{kl} - \hat{v}_{kl}(t)$.
Incorporate anti-windup and bounded beta(t) to ensure stable training.
Apply ControlVAE to language modeling, disentangled representation learning, and image generation to evaluate trade-offs between reconstruction, KL-divergence, and task-specific metrics.
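
The control law above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `NonlinearPIController`, the default gains, the bound `beta_max=1.0`, and the specific anti-windup scheme (dropping the integral update whenever the output is clipped) are all assumptions made for this example.

```python
import math


class NonlinearPIController:
    """Sketch of a non-linear PI controller for the KL weight beta(t).

    Follows beta(t) = K_p / (1 + exp(e(t))) - K_i * sum_j e(j) + beta_min,
    with e(t) = v_kl - observed KL. Default values are illustrative assumptions.
    """

    def __init__(self, v_kl, k_p=0.01, k_i=0.0001, beta_min=0.0, beta_max=1.0):
        self.v_kl = v_kl          # desired KL-divergence (set point)
        self.k_p = k_p            # proportional gain
        self.k_i = k_i            # integral gain
        self.beta_min = beta_min  # lower bound on beta(t)
        self.beta_max = beta_max  # upper bound on beta(t)
        self.integral = 0.0       # running value of -K_i * sum_j e(j)

    def step(self, kl_observed):
        """Return beta(t) given the KL-divergence observed at this step."""
        e = self.v_kl - kl_observed
        # Proportional term; cap the exponent so math.exp cannot overflow.
        p_term = self.k_p / (1.0 + math.exp(min(e, 50.0)))
        # Tentative integral update.
        candidate = self.integral - self.k_i * e
        beta_raw = p_term + candidate + self.beta_min
        # Bound beta(t) to [beta_min, beta_max].
        beta = min(max(beta_raw, self.beta_min), self.beta_max)
        # Anti-windup (assumed scheme): keep the integral update only when
        # the output was not clipped, so the sum cannot grow without bound.
        if beta == beta_raw:
            self.integral = candidate
        return beta


controller = NonlinearPIController(v_kl=3.0)
print(controller.step(kl_observed=3.0))  # e = 0, so beta = K_p / 2
```

In a training loop, one would presumably call `controller.step(...)` once per iteration with the current batch KL-divergence as a plain float and then minimize the reconstruction loss plus `beta` times the KL term. When the observed KL is above the set point, `e(t)` is negative, so both terms push `beta` up. When it is below, `beta` decays toward `beta_min`.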

Experimental results

Research questions

RQ1: Can ControlVAE prevent KL-vanishing in language modeling while improving generation diversity?
RQ2: Does automatic, feedback-driven tuning of KL-divergence lead to better disentanglement without sacrificing reconstruction quality?
RQ3: Can a controllable KL-divergence improve image generation reconstruction quality without overwhelming the latent prior?
RQ4: How does ControlVAE compare to fixed-beta VAE variants (beta-VAE, FactorVAE) in terms of MIG/disentanglement, reconstruction loss, and generative metrics?

Key findings

ControlVAE averts KL-vanishing in language modeling and improves generated data diversity compared to baselines.
ControlVAE achieves lower reconstruction loss than cyclical annealing and maintains stable KL-divergence at a chosen set point.
In disentangled representation learning, ControlVAE attains comparable or better MIG scores with lower variance than beta-VAE and compares favorably to FactorVAE.
For image generation on CelebA, ControlVAE with higher KL-divergence (e.g., KL-200) yields better reconstruction and improves FID/SSIM relative to the original VAE.
Across tasks, ControlVAE demonstrates adjustable trade-offs between reconstruction accuracy and KL-divergence with automatic beta(t) tuning.
Qualitative results show ControlVAE can disentangle multiple latent factors and maintain diverse yet authentic outputs.

This review was generated by AI and checked by a human editor.