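[Paper Digest] Probabilistic Learning and Generation in Deep Sequence Models
A PhD thesis that combines probabilistic Bayesian methods with deep sequence models. It proposes sparse Gaussian process attention, an online interdomain Gaussian process with HiPPO memory, and self-supervised signals on latent states to strengthen sequential generative models.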
Despite the exceptional predictive performance of deep sequence models (DSMs), the main concern about their deployment centers on their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express our beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct interdomain inducing points for Gaussian processes, which successfully memorize the history in online learning. In addition to the progress of DSMs in predictive tasks, sequential generative models consisting of a sequence of latent variables have become popular in the domain of deep generative models. Inspired by the explicit self-supervised signals for these latent variables in diffusion models, in Chapter 5 we explore the possibility of improving other generative models with self-supervision for their sequential latent states, and investigate desirable probabilistic structures over them. Overall, this thesis leverages inductive biases in DSMs to design probabilistic inference or structure, which bridges the gap between DSMs and probabilistic models and leads to mutually reinforcing improvements.
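Research Motivation and Goals
- Exploit the inductive biases of deep sequence models to design probabilistic inference and structure.
- Develop methods for calibrating uncertainty in Transformer architectures.
- Use HiPPO-inspired interdomain Gaussian processes to retain long-term history in online learning.
- Study self-supervised signals on latent states as a way to improve sequential generative models.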
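Proposed Methods
- Calibrate the Transformer by replacing scaled dot-product attention with sparse Gaussian process attention (SGPA).
- Express attention as the mean of a sparse variational Gaussian process, and address the resulting inefficiency with a decoupled SGPA variant.
- Introduce the online HiPPO sparse variational Gaussian process (OHSVGP) to capture long-term memory in online and continual learning settings.
- Extend HiPPO to interdomain inducing variables and update the kernel matrices online through ODE evolution.
- Explore pseudo-video generation as a way to inject self-supervised signals into the latent states of sequential generative models.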
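To make the link between attention and a sparse variational Gaussian process more concrete, here is a minimal NumPy sketch of the textbook SVGP predictive equations, read in attention terms: queries play the role of test inputs, keys play the role of inducing inputs, and the variational mean `m` takes the place of the values. This is an illustration under stated assumptions, not the thesis's exact SGPA parametrization. The RBF kernel, the function names `rbf_kernel` and `svgp_attention`, and the `jitter` parameter are all choices made for this example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def svgp_attention(queries, keys, m, S, jitter=1e-6):
    """Textbook SVGP predictive, written with attention-style names.

    queries: (T, d) test inputs
    keys:    (M, d) inducing inputs (hypothetical mapping for this sketch)
    m:       (M, d_v) variational mean at the inducing inputs
    S:       (M, M) variational covariance, shared across output dimensions
    Returns the predictive mean (T, d_v) and covariance (T, T).
    """
    K_qk = rbf_kernel(queries, keys)
    K_kk = rbf_kernel(keys, keys) + jitter * np.eye(len(keys))
    K_qq = rbf_kernel(queries, queries)
    # A = K_qk K_kk^{-1}; K_kk is symmetric, so solve against the transpose.
    A = np.linalg.solve(K_kk, K_qk.T).T
    mean = A @ m                        # kernel-weighted combination of "values"
    cov = K_qq - A @ (K_kk - S) @ A.T   # uncertainty that plain attention lacks
    return mean, cov


# Small demo with random inputs (shapes only; the numbers carry no meaning).
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
mean, cov = svgp_attention(q, k, m=rng.normal(size=(5, 2)), S=0.1 * np.eye(5))
print(mean.shape, cov.shape)
```

The mean has the same "weights times values" shape as an attention output, while the covariance term is what a calibrated model can use to express uncertainty. Setting `S` equal to the prior covariance at the keys recovers the prior covariance at the queries, which is a quick sanity check on the formula.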
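Experimental Results
Research Questions
- RQ1: How can attention in the Transformer be grounded in a probabilistic Gaussian process posterior to improve calibration and robustness?
- RQ2: Can an online sparse GP with HiPPO memory retain long-term information in sequential data and continual learning?
- RQ3: Can self-supervised signals from pseudo videos improve the latent state representations of sequential generative models?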
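Key Findings
- SGPA-based Transformers perform well on uncertainty calibration and out-of-distribution robustness while maintaining competitive accuracy.
- The online HiPPO sparse variational GP provides improved long-term memory and efficiency on online and continual learning tasks.
- Interdomain HiPPO inducing points make online kernel updates possible and extend the GP's memory across regions of the time axis.
- Self-supervised pseudo videos improve reconstruction and generation quality in generative methods such as VQ-VAE and diffusion-based models.
- Chapter 5 shows that enriching latent states with self-supervised signals from pseudo videos improves reconstruction and generation performance on CIFAR10 and CelebA.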
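The idea of memory that is updated online through an ODE can be illustrated with the basic HiPPO recurrence. The sketch below assumes the scaled-Legendre (LegS) variant from the original HiPPO formulation, with dynamics dc/dt = -(1/t) A c + (1/t) B f(t), and discretizes it with backward Euler. It compresses a scalar signal into a fixed number of polynomial coefficients; it does not implement the kernel-matrix updates of OHSVGP, and the function names `hippo_legs` and `online_legs_coeffs` are hypothetical.

```python
import numpy as np


def hippo_legs(N):
    """Build the HiPPO-LegS matrices (assumed sign convention: dc/dt = -(1/t) A c + (1/t) B f)."""
    n = np.arange(N)
    scale = np.sqrt(2 * n + 1)
    # Strictly lower triangle: sqrt(2n+1) * sqrt(2k+1) for n > k; diagonal: n + 1.
    A = np.tril(scale[:, None] * scale[None, :], k=-1) + np.diag(n + 1.0)
    B = scale.copy()
    return A, B


def online_legs_coeffs(signal, N):
    """Stream a 1-D signal and keep N coefficients summarizing its whole history.

    Backward Euler at step k gives (I + A / k) c_k = c_{k-1} + B f_k / k.
    """
    A, B = hippo_legs(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, f_k in enumerate(signal, start=1):
        c = np.linalg.solve(I + A / k, c + B * f_k / k)
    return c


# Demo: summarize a noisy ramp of 500 samples with 8 coefficients.
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 500) + 0.01 * rng.normal(size=500)
print(online_legs_coeffs(ramp, N=8).shape)
```

Each update costs one small linear solve regardless of how long the stream is, which is the property that makes this kind of recurrence attractive for online and continual learning. For a constant input the stationary coefficients are (1, 0, 0, ...), since a constant is fully captured by the zeroth basis function.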
This digest was generated by AI and reviewed by a human editor.