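[Paper Explained] A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
This paper introduces the pseudo-hyperbolic Gaussian, a distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to its parameters. This makes gradient-based probabilistic models on hyperbolic space possible, such as a Hyperbolic VAE and probabilistic word embeddings. The paper reports performance gains on MNIST, on trajectories from Atari Breakout, and on word embeddings for WordNet.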
Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called pseudo-hyperbolic Gaussian, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables the gradient-based learning of the probabilistic models on hyperbolic space that could never have been considered before. Also, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic-analog of variational autoencoder and a method of probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet.
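Motivation and Goals
- Motivation: use hyperbolic geometry for representing hierarchical data and for probabilistic modeling.
- Define a Gaussian-like distribution on hyperbolic space that has an analytic density and is differentiable.
- Enable gradient-based training of probabilistic models on hyperbolic space, such as VAEs and word embeddings.
- Provide efficient sampling without rejection sampling.
- Demonstrate the method on benchmark datasets (MNIST, Atari Breakout, WordNet).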
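Proposed Method
- Construct a pseudo-hyperbolic Gaussian in three steps: sample from a Euclidean Gaussian in the tangent space at the origin, parallel-transport the sample to the tangent space at the target location, and project it onto hyperbolic space with the exponential map of the Lorentz model.
- Compute the log-density from the log-determinant of the Jacobian of this projection map. The determinant factorizes into the determinants of the exponential map and of the parallel transport, both of which can be evaluated in closed form.
- Provide analytic expressions in the Lorentz model for the tangent-space operations (parallel transport, the exponential map, and their inverses), so that density evaluation and gradient computation are tractable.
- Apply the distribution to build a Hyperbolic VAE with prior p(z) = G(mu0, I) and posterior q(z|x) = G(mu, Sigma).
- Demonstrate probabilistic word embedding by replacing Euclidean Gaussian embeddings with G(mu, Sigma) on hyperbolic space.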
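The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`lorentz_inner`, `parallel_transport`, `exp_map`, `log_map`, `sample`, `log_density`) are hypothetical, inputs are single unbatched vectors, and the formulas are the standard Lorentz-model expressions with curvature -1. The log-density subtracts the log-determinant of the projection, assumed here to be (n - 1) * log(sinh(r) / r) with r the Lorentzian norm of the transported tangent vector; the parallel transport contributes a determinant of 1.

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian product: -x0*y0 + sum_i xi*yi
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_norm(u):
    # Norm of a tangent vector; clamp tiny negatives from round-off
    return np.sqrt(max(lorentz_inner(u, u), 0.0))

def origin(n):
    # Origin mu0 = (1, 0, ..., 0) of the n-dimensional hyperboloid
    mu0 = np.zeros(n + 1)
    mu0[0] = 1.0
    return mu0

def parallel_transport(v, nu, mu):
    # Transport v from the tangent space at nu to the tangent space at mu
    alpha = -lorentz_inner(nu, mu)
    return v + lorentz_inner(mu - alpha * nu, v) / (alpha + 1.0) * (nu + mu)

def exp_map(u, mu):
    # Project a tangent vector u at mu onto the hyperboloid
    r = lorentz_norm(u)
    if r < 1e-12:
        return mu.copy()
    return np.cosh(r) * mu + np.sinh(r) * u / r

def log_map(z, mu):
    # Inverse of exp_map: tangent vector at mu pointing towards z
    alpha = max(-lorentz_inner(mu, z), 1.0)
    if alpha - 1.0 < 1e-12:
        return np.zeros_like(z)
    return np.arccosh(alpha) / np.sqrt(alpha ** 2 - 1.0) * (z - alpha * mu)

def sample(mu, sigma, rng):
    # Draw z ~ G(mu, sigma) without rejection sampling
    n = mu.shape[0] - 1
    v_tilde = rng.multivariate_normal(np.zeros(n), sigma)
    v = np.concatenate(([0.0], v_tilde))        # tangent vector at the origin
    u = parallel_transport(v, origin(n), mu)    # move it to mu
    return exp_map(u, mu)                       # wrap onto the hyperboloid

def log_density(z, mu, sigma):
    # Invert the sampling map, then apply the change-of-variables formula
    n = mu.shape[0] - 1
    u = log_map(z, mu)
    v_tilde = parallel_transport(u, mu, origin(n))[1:]
    r = lorentz_norm(u)
    _, logdet = np.linalg.slogdet(sigma)
    quad = v_tilde @ np.linalg.solve(sigma, v_tilde)
    log_gauss = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    log_det_proj = (n - 1) * np.log(np.sinh(r) / r) if r > 1e-12 else 0.0
    return log_gauss - log_det_proj
```

Every step is built from differentiable elementary operations, so porting the same functions to an automatic-differentiation framework would give gradients with respect to mu and Sigma, which is what the Hyperbolic VAE and the word-embedding model rely on.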
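Experimental Results
Research Questions
- RQ1: Can a Gaussian-like distribution with an analytic density and differentiability be defined consistently on hyperbolic space for use in gradient-based learning?
- RQ2: How can sampling and density evaluation be carried out efficiently on hyperbolic space without rejection sampling?
- RQ3: What benefits do hyperbolic probabilistic models bring for hierarchical data, as measured on standard benchmarks such as MNIST, WordNet, and Atari trajectories?
- RQ4: In low-dimensional latent spaces, do hyperbolic probabilistic models (Hyperbolic VAE, probabilistic word embeddings) outperform their Euclidean counterparts?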
Key Findings
- A class of pseudo-hyperbolic Gaussian distributions on hyperbolic space is shown to admit analytic density evaluation and differentiability with respect to parameters.
- The sampling procedure uses a tangent-space Gaussian, parallel transport, and the exponential map, enabling gradient-based learning without rejection sampling.
- A Hyperbolic VAE with the proposed prior and posterior can achieve log-likelihoods competitive with or better than those of a vanilla VAE on MNIST, especially at lower latent dimensions.
- Probabilistic word embeddings using hyperbolic geometry improve reconstruction metrics over Euclidean Gaussian embeddings at several latent dimensions on WordNet noun hierarchy.
- Applications to Atari 2600 Breakout show that Hyperbolic VAE latent representations correlate more strongly with cumulative rewards than vanilla VAEs.
This explainer was generated by AI and reviewed by human editors.