QUICK REVIEW

[Paper Review] The promises and pitfalls of Stochastic Gradient Langevin Dynamics

Nicolas Brosse, Alain Durmus|arXiv (Cornell University)|Nov 25, 2018

Quantum many-body systems | Cited by 47

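One-sentence summary

This paper analyzes SGLD with a constant step size, shows that its invariant distribution can depart significantly from the posterior as the number of data points grows, and compares it with SGLDFP, LMC, and SGD using Wasserstein distances and moment expansions.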

ABSTRACT

Stochastic Gradient Langevin Dynamics (SGLD) has emerged as a key MCMC algorithm for Bayesian learning from large scale datasets. While SGLD with decreasing step sizes converges weakly to the posterior distribution, the algorithm is often used with a constant step size in practice and has demonstrated successes in machine learning tasks. The current practice is to set the step size inversely proportional to $N$ where $N$ is the number of training samples. As $N$ becomes large, we show that the SGLD algorithm has an invariant probability measure which significantly departs from the target posterior and behaves like Stochastic Gradient Descent (SGD). This difference is inherently due to the high variance of the stochastic gradients. Several strategies have been suggested to reduce this effect; among them, SGLD Fixed Point (SGLDFP) uses carefully designed control variates to reduce the variance of the stochastic gradients. We show that SGLDFP gives approximate samples from the posterior distribution, with an accuracy comparable to the Langevin Monte Carlo (LMC) algorithm for a computational cost sublinear in the number of data points. We provide a detailed analysis of the Wasserstein distances between LMC, SGLD, SGLDFP and SGD and explicit expressions of the means and covariance matrices of their invariant distributions. Our findings are supported by limited numerical experiments.

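Motivation and objectives

Motivate the use of SGLD on large datasets for scalable Bayesian learning.
Characterize how constant-step-size SGLD behaves relative to the true posterior as N grows.
Compare SGLD with variants such as SGLDFP, as well as with Langevin Monte Carlo (LMC) and SGD, using Wasserstein distances and moment expansions.
Give practical guidance on when SGLD approximates the posterior and when it does not.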

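Proposed method

Treat the target posterior as the invariant measure of the Langevin diffusion.
Define LMC, SGLD, and SGLDFP as Euler discretizations of that diffusion; LMC uses the full gradient, while SGLD and SGLDFP use minibatch gradient estimators.
Impose assumptions on U and the U_i (Lipschitz gradients, strong convexity, convexity) to derive bounds on Wasserstein distances.
Derive upper bounds on the W2 distance between the marginal distributions of LMC, SGLDFP, SGLD, and SGD and their respective invariant measures.
Give explicit expressions for the means and covariances of the invariant distributions through a perturbation analysis (the matrices H, G, and K).
Support the theory with limited numerical experiments on simulated data and a Covertype-like dataset.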
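For reference, the recursions being compared can be written out as below. This is a sketch in notation that the summary itself does not spell out, so the symbols are assumptions: $U = \sum_{i=0}^{N} U_i$ is the negative log posterior with $U_0$ the negative log prior, $\gamma$ is the constant step size, $S_{k+1}$ is a random minibatch of size $p$ drawn from $\{1, \dots, N\}$, $\theta^\star$ is the minimizer of $U$, and $Z_{k+1}$ is a standard Gaussian vector.

$$
\begin{aligned}
\text{LMC:}\quad & \theta_{k+1} = \theta_k - \gamma\, \nabla U(\theta_k) + \sqrt{2\gamma}\, Z_{k+1} \\
\text{SGLD:}\quad & \theta_{k+1} = \theta_k - \gamma \Big( \nabla U_0(\theta_k) + \frac{N}{p} \sum_{i \in S_{k+1}} \nabla U_i(\theta_k) \Big) + \sqrt{2\gamma}\, Z_{k+1} \\
\text{SGLDFP:}\quad & \theta_{k+1} = \theta_k - \gamma \Big( \nabla U_0(\theta_k) - \nabla U_0(\theta^\star) + \frac{N}{p} \sum_{i \in S_{k+1}} \big\{ \nabla U_i(\theta_k) - \nabla U_i(\theta^\star) \big\} \Big) + \sqrt{2\gamma}\, Z_{k+1} \\
\text{SGD:}\quad & \theta_{k+1} = \theta_k - \gamma \Big( \nabla U_0(\theta_k) + \frac{N}{p} \sum_{i \in S_{k+1}} \nabla U_i(\theta_k) \Big)
\end{aligned}
$$

SGLD and SGD use the same stochastic gradient and differ only in the injected Gaussian term. The SGLDFP estimator stays unbiased because $\nabla U(\theta^\star) = 0$, while the subtracted terms cancel most of the minibatch noise when $\theta_k$ is close to $\theta^\star$.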

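Experimental results

Research questions

RQ1: As N grows, how close are the invariant distributions of LMC, SGLDFP, SGLD, and SGD to the target posterior π?
RQ2: With a constant step size, how do the Wasserstein distances between the marginal distributions of these algorithms evolve?
RQ3: Can control variates (SGLDFP) recover approximate posterior samples at a cost sublinear in the number of data points?
RQ4: How far are the means and covariances of the invariant distributions from those of π, and how do these gaps scale with N and γ?
RQ5: Under what conditions does SGLD behave more like SGD than like a sampler for the posterior?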

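Main findings

The invariant measures of LMC and SGLDFP approach the posterior π as N grows; for SGLDFP, the sampling cost grows sublinearly in N.
The invariant measure of SGLD stays far from π as N grows and resembles that of SGD, because subsampling makes the stochastic gradients high-variance.
The Wasserstein bounds expose the trade-off between convergence rate and cost: reaching ε accuracy in W2 costs roughly linear in N for LMC and sublinear in N for SGLDFP.
The mean and covariance expansions show that, in the stated asymptotic regime, the bias and covariance errors of LMC and SGLDFP are Θ(1/N), whereas those of SGLD and SGD are Θ(η), where η = γN.
The theoretical results are illustrated by simulations on Bayesian logistic regression and a large-scale dataset, which highlight the behavior of the gradient variance and differences in test-set performance.
The analysis suggests ways to reduce the bias of SGLD and get closer to posterior sampling, such as tuning the step size γ and the minibatch size p or using control variates.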
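The contrast between the four algorithms can be reproduced on a toy problem. The sketch below is not the paper's experiment: it assumes a hypothetical one-dimensional Gaussian linear model (standard normal prior, unit noise variance) so that the posterior is known in closed form, draws minibatches with replacement, and uses illustrative values for `N`, `p`, and `eta`. The function name `run_chains` and all parameters are made up for this example. It reports the variance of each chain divided by the exact posterior variance.

```python
import numpy as np


def run_chains(N=1000, p=10, eta=0.1, n_iter=20000, burn_in=2000, seed=0):
    """Run LMC, SGLDFP, SGLD and SGD on a 1-D Gaussian linear model.

    Returns, per algorithm, (empirical chain variance) / (posterior variance).
    """
    rng = np.random.default_rng(seed)

    # Hypothetical toy data: y_i = theta_true * x_i + noise, theta_true = 1.
    x = rng.normal(size=N)
    y = x + rng.normal(size=N)

    # U_0(t) = t^2 / 2 (prior), U_i(t) = (y_i - t * x_i)^2 / 2 (likelihood).
    # The posterior is Gaussian with precision `prec` and mean `theta_star`,
    # and theta_star is also the minimizer of U.
    prec = 1.0 + np.sum(x * x)
    theta_star = np.sum(x * y) / prec
    gamma = eta / N  # constant step size, inversely proportional to N

    def full_grad(t):
        # Exact gradient of U, used by LMC.
        return prec * (t - theta_star)

    def sgld_grad(t, idx):
        # Unbiased minibatch estimate of grad U, used by SGLD and SGD.
        xi, yi = x[idx], y[idx]
        return t + (N / p) * np.sum(xi * (xi * t - yi))

    def fp_grad(t, idx):
        # Control-variate estimate: each term is grad U_i(t) - grad U_i(theta_star),
        # which equals x_i^2 * (t - theta_star) in this model.
        return (1.0 + (N / p) * np.sum(x[idx] ** 2)) * (t - theta_star)

    names = ("LMC", "SGLDFP", "SGLD", "SGD")
    theta = {name: theta_star for name in names}
    trace = {name: [] for name in names}

    for k in range(n_iter):
        idx = rng.integers(0, N, size=p)  # minibatch with replacement (assumption)
        noise = np.sqrt(2.0 * gamma) * rng.normal()
        theta["LMC"] += -gamma * full_grad(theta["LMC"]) + noise
        theta["SGLDFP"] += -gamma * fp_grad(theta["SGLDFP"], idx) + noise
        theta["SGLD"] += -gamma * sgld_grad(theta["SGLD"], idx) + noise
        theta["SGD"] += -gamma * sgld_grad(theta["SGD"], idx)  # no injected noise
        if k >= burn_in:
            for name in names:
                trace[name].append(theta[name])

    post_var = 1.0 / prec
    return {name: float(np.var(trace[name]) / post_var) for name in names}


if __name__ == "__main__":
    for name, ratio in run_chains().items():
        print(f"{name:7s} chain variance / posterior variance = {ratio:.2f}")
```

In this toy setting the minibatch gradient noise contributes roughly γ²N²/p per step against 2γ from the injected Gaussian term, so with these illustrative defaults the subsampling noise dominates and SGLD spreads out much like SGD, while LMC and SGLDFP stay near the posterior variance. Raising `p` or lowering `eta` shrinks that gap, which is the tuning trade-off described in the findings.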


This review was generated by AI and checked by human editors.