QUICK REVIEW

[Paper Review] NVAE: A Deep Hierarchical Variational Autoencoder

Arash Vahdat, Jan Kautz|arXiv (Cornell University)|Jul 8, 2020

Generative Adversarial Networks and Image Synthesis | References: 79 | Cited by: 378

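One-Sentence Summary

NVAE is a deep hierarchical VAE architecture that uses depth-wise separable convolutions and a residual parameterization of the approximate posterior. It achieves state-of-the-art likelihoods among non-autoregressive models on several image datasets while scaling to large images.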

ABSTRACT

Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels. The source code is available at https://github.com/NVlabs/NVAE.

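Motivation and Goals

Design a deep hierarchical VAE architecture tailored to high-quality image generation.
Keep training stable for very deep VAEs with a large number of latent groups.
Improve memory efficiency and sampling speed for large images.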

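Proposed Method

Use depth-wise separable convolutions in the generative model to enlarge the receptive field efficiently.
Parameterize the approximate posterior as a residual relative to the prior, which stabilizes the KL term.
Apply spectral regularization to bound the Lipschitz constant and stabilize training.
Combine batch normalization (with a tuned momentum) with a matched BN-activation pairing to improve training stability.
Use mixed-precision training and gradient checkpointing to reduce memory usage.
Optionally apply lightweight normalizing flows in the encoder to make the posterior more expressive.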
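
The residual parameterization in the list above can be made concrete with a small numerical check. The sketch below assumes the per-variable form used in the NVAE paper: the prior is N(mu, sigma) and the approximate posterior is N(mu + delta_mu, sigma * delta_sigma), so the KL term depends only on the relative offsets and the prior scale. The function names are illustrative and are not taken from the NVAE code base.

```python
import math


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Generic KL(N(mu_q, sigma_q) || N(mu_p, sigma_p)) for scalar Gaussians."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def kl_residual(delta_mu, delta_sigma, sigma_p):
    """KL when the posterior is expressed relative to the prior.

    Posterior: N(mu_p + delta_mu, sigma_p * delta_sigma).
    The prior mean cancels out, leaving only the relative terms.
    """
    return 0.5 * (
        delta_mu ** 2 / sigma_p ** 2
        + delta_sigma ** 2
        - math.log(delta_sigma ** 2)
        - 1.0
    )


# Hypothetical values chosen only for illustration.
mu_p, sigma_p = 0.3, 1.7
delta_mu, delta_sigma = -0.4, 0.8

generic = kl_normal(mu_p + delta_mu, sigma_p * delta_sigma, mu_p, sigma_p)
residual = kl_residual(delta_mu, delta_sigma, sigma_p)
print(generic, residual)  # the two values agree up to rounding
```

When delta_mu is 0 and delta_sigma is 1, the posterior equals the prior and the KL is exactly zero. This is one way to see why the encoder only has to learn small deviations when the prior moves during training.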

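Experimental Results

Research Questions

RQ1: Can a carefully designed deep hierarchical VAE outperform existing non-autoregressive likelihood-based models on standard image datasets?
RQ2: Which architectural choices (convolution type, normalization, activation, residual parameterization) contribute most to training stability for VAEs on large images?
RQ3: How do memory- and compute-saving techniques affect the training and sampling efficiency of deep VAEs at 256×256 resolution?
RQ4: Does adding normalizing flows to the encoder noticeably improve held-out log-likelihood without sacrificing stability?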

Key Findings

Method	MNIST (nats)	CIFAR-10 (bits/dim)	ImageNet 32×32 (bits/dim)	CelebA 64 (bits/dim)	CelebA HQ 256 (bits/dim)	FFHQ 256 (bits/dim)
NVAE w/o flow	78.01	2.93	-	2.04	-	0.71
NVAE w/ flow	78.19	2.91	3.92	2.03	0.70	0.69
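
The MNIST column is not directly comparable with the others: as in the NVAE paper's results table, MNIST is reported as negative log-likelihood in nats per image, while the remaining datasets are in bits per dimension. The conversion below is a minimal sketch of that relationship. The helper names are illustrative, and the MNIST dimensionality (28 × 28 × 1) and CIFAR-10 dimensionality (32 × 32 × 3) are the standard image sizes rather than values stated in this review.

```python
import math


def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def bits_per_dim_to_nats(bpd, num_dims):
    """Inverse conversion: bits per dimension to per-image nats."""
    return bpd * num_dims * math.log(2)


mnist_dims = 28 * 28        # single-channel 28x28 images
cifar_dims = 32 * 32 * 3    # RGB 32x32 images

mnist_bpd = nats_to_bits_per_dim(78.01, mnist_dims)
cifar_nats = bits_per_dim_to_nats(2.91, cifar_dims)
print(f"MNIST 78.01 nats is about {mnist_bpd:.3f} bits/dim")
print(f"CIFAR-10 2.91 bits/dim is about {cifar_nats:.0f} nats per image")
```

Lower is better in both units.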

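NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on MNIST, CIFAR-10, CelebA 64, and CelebA HQ-256, and provides a strong baseline on FFHQ-256.
On CIFAR-10, NVAE lowers the state of the art among non-autoregressive likelihood-based models from 2.98 to 2.91 bits per dimension.
NVAE generates high-quality 256×256 images without changing the standard VAE objective; to the best of the authors' knowledge, it is the first successful VAE applied to natural images of that size.
Because the decoder is unconditional, sampling is fast: about 56 ms per image on a Titan V GPU (batch size 36).
Ablations show that BN combined with Swish activation and Squeeze-and-Excitation (SE), depth-wise separable generative cells, spectral regularization (SR), and the residual posterior parameterization all contribute to performance and stability.
Memory optimizations (mixed precision and gradient checkpointing) roughly double training throughput.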
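
To illustrate the spectral regularization (SR) named in the ablation, the sketch below estimates the largest singular value of a weight matrix by power iteration and sums it across layers with a coefficient `lam`. This is a simplification under stated assumptions: NVAE applies the idea to convolutional layers inside a deep-learning framework, whereas this example uses plain nested lists as small dense matrices. The function names and the value of `lam` are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def largest_singular_value(w, iters=100):
    """Estimate the largest singular value of w by power iteration on w^T w.

    Assumes w is non-zero and that the fixed start vector is not orthogonal
    to the top right-singular vector (a random start avoids that corner case).
    """
    wt = transpose(w)
    v = [1.0] * len(w[0])  # fixed start keeps the sketch deterministic
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    wv = matvec(w, v)
    return math.sqrt(sum(x * x for x in wv))


def spectral_penalty(layers, lam):
    """Regularization term: lam times the sum of per-layer largest singular values."""
    return lam * sum(largest_singular_value(w) for w in layers)


# Two toy "layers" with known largest singular values 3 and 2.
layer_a = [[3.0, 0.0], [0.0, 1.0]]
layer_b = [[0.0, 2.0], [1.0, 0.0]]
penalty = spectral_penalty([layer_a, layer_b], lam=0.5)
print(penalty)
```

Penalizing this quantity discourages any single layer from amplifying its input too strongly, which is the sense in which the method bounds the Lipschitz constant.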


This review was generated by AI and reviewed by human editors.