QUICK REVIEW

[Paper Review] Theory and Experiments on Vector Quantized Autoencoders

Aurko Roy, Ashish Vaswani|arXiv (Cornell University)|May 28, 2018

Generative Adversarial Networks and Image Synthesis | References: 27 | Cited by: 57

One-Sentence Summary

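The paper improves VQ-VAE training with an EM-inspired approach, obtaining better image generation on CIFAR-10, and uses distillation to build a fast, non-autoregressive translation model whose performance approaches that of an autoregressive Transformer.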

ABSTRACT

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however, despite several recent improvements, the training of discrete latent variable models has remained challenging and their performance has mostly failed to match their continuous counterparts. Recent work on vector quantized autoencoders (VQ-VAE) has made substantial progress in this direction, with its perplexity almost matching that of a VAE on datasets such as CIFAR-10. In this work, we investigate an alternate training technique for VQ-VAE, inspired by its connection to the Expectation Maximization (EM) algorithm. Training the discrete bottleneck with EM helps us achieve better image generation results on CIFAR-10, and together with knowledge distillation, allows us to develop a non-autoregressive machine translation model whose accuracy almost matches a strong greedy autoregressive baseline Transformer, while being 3.3 times faster at inference.

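Research Motivation and Goals

Motivate discrete latent representations for symbolic reasoning and data compression.
Explore training strategies for the VQ-VAE discrete bottleneck that go beyond existing heuristics.
Use EM-inspired updates to improve the learning of discrete latent codes.
Demonstrate gains in image generation and machine translation from EM training and distillation.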

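Proposed Method

Describe VQ-VAE, which uses a discrete bottleneck with nearest-neighbor codebook lookup.
Relate the VQ-VAE updates and EMA-based codebook learning to hard EM and K-means.
Introduce soft EM with Monte Carlo EM updates for approximate inference over the discrete latent variables.
Train a Latent Predictor to model the learned discrete latents autoregressively, then decode them with the decoder.
Apply sequence-level knowledge distillation to improve non-autoregressive translation.
Evaluate unconditional image generation on CIFAR-10 and supervised translation on WMT English-German.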
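The nearest-neighbor lookup and its link to hard EM / K-means can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`nearest_codes`, `ema_hard_em_step`), the toy two-cluster data and the decay value are assumptions. The E-step assigns each encoder output to its closest codeword; the M-step moves each codeword toward the mean of the outputs assigned to it, smoothed by an exponential moving average (EMA) as in EMA-based VQ-VAE codebook learning.

```python
import numpy as np


def nearest_codes(z_e, codebook):
    """Hard assignment: index of the closest codeword for each encoder output.

    z_e: (n, d) encoder outputs; codebook: (K, d) codewords.
    """
    # Squared Euclidean distance between every output and every codeword, shape (n, K).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def ema_hard_em_step(z_e, codebook, counts, sums, decay=0.99, eps=1e-5):
    """One EMA codebook update, i.e. a smoothed K-means (hard EM) step (sketch).

    counts: (K,) running assignment counts; sums: (K, d) running sums of assigned outputs.
    Returns the new codebook, counts, sums and this step's assignments.
    """
    K = codebook.shape[0]
    # E-step: hard assignment of each encoder output to its nearest codeword.
    assign = nearest_codes(z_e, codebook)
    one_hot = np.eye(K)[assign]  # (n, K)
    # M-step, smoothed over steps by an exponential moving average.
    counts = decay * counts + (1 - decay) * one_hot.sum(axis=0)
    sums = decay * sums + (1 - decay) * one_hot.T @ z_e
    # eps guards against division by zero for codewords that are never used.
    codebook = sums / (counts[:, None] + eps)
    return codebook, counts, sums, assign


# Toy usage: two well-separated clusters of 2-D "encoder outputs" (assumed data).
rng = np.random.default_rng(0)
z_e = np.concatenate([rng.normal(-3, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
codebook = np.array([[-1.0, -1.0], [1.0, 1.0]])
counts, sums = np.ones(2), codebook.copy()  # start the EMA at the initial codebook
for _ in range(200):
    codebook, counts, sums, assign = ema_hard_em_step(z_e, codebook, counts, sums, decay=0.9)

# The quantized decoder input is the codeword chosen for each output.
z_q = codebook[assign]
```

With `decay=0`, each call reduces to one iteration of K-means (Lloyd's algorithm), up to the `eps` term and the handling of unused codewords; the EMA averages that update over many minibatches.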

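Experimental Results

Research Questions

RQ1: Does EM-inspired training improve the learning of VQ-VAE discrete latent variables compared with previous heuristic approaches?
RQ2: Does soft EM with Monte Carlo updates give more stable, higher-quality learning in VQ-VAE than hard EM?
RQ3: How does an EM-trained VQ-VAE perform against autoregressive and non-autoregressive baselines on image generation (CIFAR-10) and translation (WMT English-German)?
RQ4: How do codebook size and distillation affect translation quality and decoding speed?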

Key Findings

WMT14 English-German translation results:

Model	BLEU	Latency	Speedup
Autoregressive Model (beam size=4)	28.1	331 ms	1x
Autoregressive Baseline (no beam search)	27.0	265 ms	1.25x
NAT + distillation	17.7	39 ms	15.6x
NAT + distillation + NPD=10	18.7	79 ms	7.68x
NAT + distillation + NPD=100	19.2	257 ms	2.36x
LT + Semhash	19.8	105 ms	3.15x
Our results
VQ-VAE	21.4	81 ms	4.08x
VQ-VAE with EM	22.4	81 ms	4.08x
VQ-VAE + distillation	26.4	81 ms	4.08x
VQ-VAE with EM + distillation	26.7	81 ms	4.08x
VQ-VAE with EM + distillation (n_c=4)	25.4	58 ms	5.71x
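The speedup column is the baseline latency divided by the model latency. A quick check in Python (latencies copied from the table; the helper name `speedup` is ours) reproduces most reported speedups against the 331 ms beam-search baseline to within about 0.01, and recovers the roughly 3.3x figure against the 265 ms greedy Transformer. The NAT rows do not fit this pattern: 331 / 39 is about 8.5, not 15.6, which suggests those speedups were measured against a different baseline, presumably the one in the work that originally reported them.

```python
# Latencies in milliseconds, copied from the table above.
BEAM4_BASELINE_MS = 331
GREEDY_BASELINE_MS = 265

latency_ms = {
    "Autoregressive Baseline (no beam search)": 265,
    "NAT + distillation": 39,
    "LT + Semhash": 105,
    "VQ-VAE with EM + distillation": 81,
    "VQ-VAE with EM + distillation (n_c=4)": 58,
}


def speedup(baseline_ms, model_ms):
    """Speedup = baseline latency / model latency."""
    return baseline_ms / model_ms


for name, ms in latency_ms.items():
    print(f"{name}: {speedup(BEAM4_BASELINE_MS, ms):.2f}x vs. beam search")

# EM + distillation model against the greedy (no beam search) Transformer.
print(f"VQ-VAE with EM + distillation: {speedup(GREEDY_BASELINE_MS, 81):.2f}x vs. greedy")
```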

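On CIFAR-10 image generation, EM training outperforms the baseline VQ-VAE without product quantization.
On WMT14 English-German, EM with distillation reaches BLEU 26.7, close to the greedy Transformer (27.0), while being 3.3x faster (81 ms vs. 265 ms).
In the translation experiments, soft EM with Monte Carlo updates gives more stable training and higher BLEU than hard EM.
Among the sizes tested, a codebook of 2^12 entries gives the best translation BLEU; larger codebooks did not improve results.
Non-autoregressive translation combining EM and distillation achieves competitive BLEU while sharply reducing decoding latency (e.g., 81 ms vs. 331 ms for the beam-search autoregressive model).
On WMT14, VQ-VAE with EM reaches BLEU 22.4 (vs. 21.4 without EM) and 26.7 once distillation is added, a clear gain from the proposed training.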
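The soft-EM result relies on a Monte Carlo E-step. The NumPy sketch below shows one such codebook update under stated assumptions: the posterior over codes is taken as a softmax of negative squared distances, `num_samples` codes are drawn per encoder output, and their empirical frequencies replace the hard one-hot assignment in the EMA update. The function name `soft_em_mc_step`, the averaging of sampled codewords as decoder input, the toy data and all hyperparameters are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np


def soft_em_mc_step(z_e, codebook, counts, sums, num_samples=10, decay=0.99,
                    rng=None, eps=1e-5):
    """One soft-EM codebook update with a Monte Carlo E-step (sketch).

    Assumes P(code k | z_e[i]) is proportional to exp(-||z_e[i] - codebook[k]||^2).
    Returns the new codebook, counts, sums and the decoder input z_q.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = z_e.shape[0]
    K = codebook.shape[0]
    # E-step: posterior over codes as a softmax of negative squared distances.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Monte Carlo: draw num_samples codes per output; their empirical frequencies
    # serve as soft responsibilities instead of the exact posterior.
    resp = np.zeros((n, K))
    for i in range(n):
        samples = rng.choice(K, size=num_samples, p=probs[i])
        resp[i] = np.bincount(samples, minlength=K) / num_samples
    # Decoder input (assumption): average of the sampled codewords, using the
    # codebook as it was before this update.
    z_q = resp @ codebook
    # M-step with EMA smoothing, weighted by the sampled responsibilities.
    counts = decay * counts + (1 - decay) * resp.sum(axis=0)
    sums = decay * sums + (1 - decay) * resp.T @ z_e
    codebook = sums / (counts[:, None] + eps)
    return codebook, counts, sums, z_q


# Toy usage: two well-separated clusters of 2-D "encoder outputs" (assumed data).
rng = np.random.default_rng(0)
z_e = np.concatenate([rng.normal(-3, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
codebook = np.array([[-1.0, -1.0], [1.0, 1.0]])
counts, sums = np.ones(2), codebook.copy()  # start the EMA at the initial codebook
for _ in range(200):
    codebook, counts, sums, z_q = soft_em_mc_step(z_e, codebook, counts, sums,
                                                  decay=0.9, rng=rng)
```

Unlike the hard-EM step, a codeword here receives fractional credit from every output that samples it. This may help keep more of the codebook in use, which is one plausible reading of the stabler training reported above, though this toy example does not demonstrate it.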

This review was generated by AI and checked by human editors.