QUICK REVIEW

[Paper Review] TensorFlow Distributions

Joshua V. Dillon, Ian Langmore|arXiv (Cornell University)|Nov 28, 2017

Gaussian Processes and Bayesian Inference | References: 41 | Cited by: 240

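One-Sentence Summary

TensorFlow Distributions provides two core abstractions, Distributions and Bijectors, for fast, differentiable probabilistic programming in TensorFlow, enabling modular construction of complex high-dimensional distributions and transformations and serving as a backend for systems such as Edward.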

ABSTRACT

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.

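Motivation and Objectives

Enable end-to-end differentiable probabilistic programming within the TensorFlow ecosystem.
Provide fast, numerically stable sampling, log densities, and statistics for a large collection of distributions.
Support batching, automatic differentiation, and compatibility with accelerators (GPU/TPU).
Provide composable transformations (Bijectors) for building complex distributions efficiently.
Integrate with higher-level tools such as Edward and TensorFlow Estimator for scalable research and production use.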

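Proposed Method

Introduces two abstractions: Distribution (more than 60 distributions with fast sampling and log_prob) and Bijector (22 composable, differentiable transformations).
Defines shape semantics (sample, batch, event) to enable vectorized operations and broadcasting.
Implements sampling with device-specific C++ kernels, made end-to-end differentiable through reparameterization where possible.
Provides higher-order distributions (functions of distributions) and distribution functionals (such as entropy and KL).
Uses TransformedDistribution and the Chain/Invert bijectors for modular composition and efficient density computation.
Caches transformed samples and log-determinants to speed up importance sampling and variational inference.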
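
The composition and caching ideas above can be illustrated with a small, plain-Python sketch. It is not the library's code or API: the classes ToyNormal, ToyExp, and ToyTransformed and the cache_hits counter are hypothetical names invented for this example, and the cache is a simple dictionary. The sketch pushes a standard normal through y = exp(x), scores y with the change-of-variables rule log p(y) = log p_base(x) + log|dx/dy|, and reuses the stored x when a value it just sampled is scored.

```python
import math
import random


class ToyNormal:
    """Hypothetical scalar normal distribution with sample and log_prob."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def sample(self, rng):
        return rng.gauss(self.loc, self.scale)

    def log_prob(self, x):
        # Density is evaluated directly in log space.
        z = (x - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)


class ToyExp:
    """Hypothetical bijector y = exp(x) that remembers the x behind each y."""

    def __init__(self):
        self._cache = {}      # maps a forward output y back to its input x
        self.cache_hits = 0   # illustration only: counts reused inversions

    def forward(self, x):
        y = math.exp(x)
        self._cache[y] = x
        return y

    def inverse(self, y):
        if y in self._cache:
            self.cache_hits += 1
            return self._cache[y]
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # x = log(y), so dx/dy = 1/y and log|dx/dy| = -log(y) = -x.
        return -self.inverse(y)


class ToyTransformed:
    """Hypothetical distribution of bijector.forward(X) for X ~ base."""

    def __init__(self, base, bijector):
        self.base = base
        self.bijector = bijector

    def sample(self, rng):
        return self.bijector.forward(self.base.sample(rng))

    def log_prob(self, y):
        x = self.bijector.inverse(y)
        return self.base.log_prob(x) + self.bijector.inverse_log_det_jacobian(y)


# A log-normal built from a standard normal and the exp transformation.
rng = random.Random(0)
log_normal = ToyTransformed(ToyNormal(0.0, 1.0), ToyExp())
y = log_normal.sample(rng)
print(y, log_normal.log_prob(y))
```

Scoring a freshly drawn sample is the common pattern in importance sampling and variational inference, which is why remembering the pre-image can save work; for a value that was never produced by forward, the sketch falls back to computing the inverse directly.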

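Experimental Results

Research Questions

RQ1: How can a fast, differentiable, and scalable probability library be designed for deep probabilistic programming?
RQ2: How can distributions and transformations be composed to express rich, high-dimensional probabilistic models?
RQ3: Which abstractions (Distributions and Bijectors) enable modular, reusable construction of complex models while preserving numerical stability?
RQ4: How does such a library integrate with the broader TensorFlow ecosystem and accelerator hardware?
RQ5: What are the practical benefits and limitations of higher-order distributions and distribution functionals in deep learning settings?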

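Key Findings

The library provides roughly 60 distributions with fast sampling and log-density computation, plus 22 composable bijectors.
Distributions and Bijectors enable end-to-end differentiable, modular construction of complex models (such as VAEs, autoregressive flows, and PixelCNN-based architectures).
Shape semantics (sample, batch, event) enable idiomatic vectorization and broadcasting over large tensors.
Sampling is implemented with device-specific kernels and supports reparameterization for effective backpropagation through stochastic nodes.
Bijectors automatically cache transformations and log-determinants, improving the efficiency of sampling-based inference and variational methods.
TensorFlow Distributions integrates with TensorFlow components (layers, data pipelines, serving, visualization) and serves as the backend for Edward.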
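
To make the reparameterization point concrete, here is a minimal plain-Python sketch of a pathwise gradient estimate. It does not use TensorFlow, and the function name reparameterized_grads, the test function f(x) = x**2, and the parameter values are all assumptions chosen for illustration. A normal sample is written as x = loc + scale * eps with eps drawn independently of the parameters, so the derivative of f(x) can flow back to loc and scale by the chain rule.

```python
import random


def reparameterized_grads(loc, scale, num_samples, seed=0):
    """Estimate d/dloc and d/dscale of E[x**2] for x ~ Normal(loc, scale).

    Hypothetical helper for illustration: the sample is expressed as
    x = loc + scale * eps, where eps ~ Normal(0, 1) does not depend on
    the parameters, so the chain rule applies to every draw.
    """
    rng = random.Random(seed)
    grad_loc = 0.0
    grad_scale = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        x = loc + scale * eps
        df_dx = 2.0 * x              # derivative of f(x) = x**2
        grad_loc += df_dx * 1.0      # dx/dloc = 1
        grad_scale += df_dx * eps    # dx/dscale = eps
    return grad_loc / num_samples, grad_scale / num_samples


# Since E[x**2] = loc**2 + scale**2, the exact gradients are 2*loc and 2*scale,
# which gives a reference point for the Monte Carlo estimates.
g_loc, g_scale = reparameterized_grads(loc=1.5, scale=0.5, num_samples=100_000)
print(g_loc, g_scale)
```

The estimates are noisy and should only approach 2*loc and 2*scale as the number of samples grows. In an automatic-differentiation framework the same chain-rule step is applied by backpropagation rather than written out by hand.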

This review was generated by AI and checked by a human editor.