QUICK REVIEW

[Paper Review] On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

Changyou Chen, Nan Ding|arXiv (Cornell University)|Oct 21, 2016

Markov Chains and Monte Carlo Methods | References: 25 | Cited by: 124

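One-Sentence Summary

This paper develops a weak-convergence theory for SG-MCMC with high-order integrators and shows that a 2nd-order symmetric splitting integrator improves convergence: for example, SGHMC has an MSE rate of $L^{-2/3}$ with the Euler method and reaches $L^{-4/5}$ with the 2nd-order symmetric splitting integrator.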

ABSTRACT

Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the mean square error (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.

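Motivation and Objectives

Establish a weak-convergence theory for general SG-MCMC algorithms with high-order integrators.
Characterize the finite-time bias and MSE of a Kth-order integrator under both fixed and decreasing step sizes.
Introduce a computationally efficient 2nd-order symmetric splitting integrator for SG-MCMC.
Analyze how stochastic-gradient noise affects convergence and the invariant distribution.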

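Proposed Method

Model SG-MCMC as an Itô diffusion with generator $\mathcal{L}$ (distinct from the iteration count $L$) in order to study weak convergence of expectations of smooth test functions.
Use the Poisson equation to relate the posterior average to its solution $\psi$, and derive bias and MSE bounds.
Introduce Kth-order local integrators, whose one-step operator $P_h$ approximates $e^{h\mathcal{L}}$, and extend them to the stochastic-gradient setting with the generator $\tilde{\mathcal{L}}_l$ at iteration $l$.
Derive the bounds $\text{bias} = O\left(\frac{1}{Lh} + \frac{\sum_l \mathbb{E}\|\mathbb{E}\Delta V_l\|}{L} + h^K\right)$ and $\text{MSE} = O\left(\frac{\frac{1}{L}\sum_l \mathbb{E}\|\Delta V_l\|^2}{L} + \frac{1}{Lh} + h^{2K}\right)$.
Propose and analyze a 2nd-order symmetric splitting integrator (ABOBA) for SGHMC, and prove that it is a 2nd-order local integrator.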
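
The ABOBA composition in the last item can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes unit mass, a scalar friction `D`, and the reading that A moves the position, B applies friction exactly, and O applies the stochastic gradient together with the injected noise. The names `sghmc_aboba_step`, `run_chain`, and `stoch_grad` are hypothetical.

```python
import numpy as np


def sghmc_aboba_step(theta, p, stoch_grad, h, D, rng):
    """One ABOBA step for SGHMC (illustrative sketch, assumptions as stated above).

    A: position update, B: exact friction on the momentum,
    O: stochastic-gradient kick plus injected Gaussian noise.
    """
    theta = theta + 0.5 * h * p                    # A: half step in position
    p = np.exp(-0.5 * D * h) * p                   # B: half step of friction
    noise = np.sqrt(2.0 * D * h) * rng.standard_normal(np.shape(p))
    p = p - h * stoch_grad(theta) + noise          # O: full gradient + noise step
    p = np.exp(-0.5 * D * h) * p                   # B: half step of friction
    theta = theta + 0.5 * h * p                    # A: half step in position
    return theta, p


def run_chain(stoch_grad, theta0, n_steps, h, D, seed=0):
    """Run a fixed-step-size chain and return every position sample."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = np.zeros_like(theta)
    samples = np.empty((n_steps,) + theta.shape)
    for l in range(n_steps):
        theta, p = sghmc_aboba_step(theta, p, stoch_grad, h, D, rng)
        samples[l] = theta
    return samples


# Toy target (an assumption for illustration): standard normal,
# U(theta) = theta**2 / 2, so the exact gradient is theta itself.
samples = run_chain(lambda th: th, theta0=[0.0], n_steps=50_000, h=0.1, D=1.0)
print(samples.mean(), samples.var())
```

Each step evaluates the gradient once, so the per-iteration gradient cost matches that of an Euler step; the symmetric half steps around the gradient update are what the 2nd-order claim rests on.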

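Results

Research Questions

RQ1: How does the order K of the numerical integrator affect the finite-time bias and MSE of SG-MCMC algorithms?
RQ2: What convergence rates do fixed-step-size SG-MCMC algorithms with high-order integrators achieve, and how do they compare with the 1st-order Euler scheme?
RQ3: How do stochastic-gradient noise and the step-size schedule (fixed or decreasing) affect the asymptotic invariant distribution and the convergence guarantees?
RQ4: On real data, can a practical high-order integrator (such as the 2nd-order symmetric splitting) improve large-scale Bayesian learning with SGHMC/SGLD?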

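Key Findings

For a Kth-order integrator, the bias after $L$ iterations is $O\left(\frac{1}{Lh} + \frac{\sum_l \mathbb{E}\|\mathbb{E}\Delta V_l\|}{L} + h^K\right)$.
The MSE after $L$ iterations is $O\left(\frac{\frac{1}{L}\sum_l \mathbb{E}\|\Delta V_l\|^2}{L} + \frac{1}{Lh} + h^{2K}\right)$.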
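With the 2nd-order symmetric splitting integrator ($K=2$), SGHMC attains an optimal bias rate of $L^{-2/3}$ (with $h \propto L^{-1/3}$) and an optimal MSE rate of $L^{-4/5}$ (with $h \propto L^{-1/5}$), compared with an $L^{-1/2}$ bias rate and an $L^{-2/3}$ MSE rate for Euler-based SGLD/SGHMC.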
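The invariant measure $\tilde{\rho}_h$ of SG-MCMC lies within $O(h^K)$ of the true posterior measure $\rho$ for a Kth-order integrator: $d(\tilde{\rho}_h, \rho) = O(h^K)$.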
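Decreasing-step-size SG-MCMC is consistent; with $h_l \propto l^{-\alpha}$, the optimal $\alpha$ matches the fixed-step-size results ($\alpha = 1/(K+1)$ for the bias, $\alpha = 1/(2K+1)$ for the MSE).
Experiments on synthetic data and large-scale data (LDA, SBN/MNIST) show that splitting-based SGHMC (SGHMC-S) outperforms the Euler-based methods and avoids the instability that appears at larger step sizes.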
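
The optimal rates quoted above follow from balancing the $1/(Lh)$ term against the discretization term in each bound. The sketch below does that arithmetic for $h = L^{-a}$, assuming the stochastic-gradient terms do not dominate (for instance, an unbiased gradient in the bias bound, and a variance term of order $1/L$ in the MSE bound). The function name `optimal_exponents` is hypothetical.

```python
from fractions import Fraction


def optimal_exponents(K):
    """Balance 1/(L h) against the discretization term for h = L**(-a).

    bias ~ L**(a - 1) + L**(-a * K)      -> a = 1 / (K + 1)
    MSE  ~ L**(a - 1) + L**(-2 * a * K)  -> a = 1 / (2K + 1)
    Rates are returned as positive exponents r, meaning error ~ L**(-r).
    """
    a_bias = Fraction(1, K + 1)
    a_mse = Fraction(1, 2 * K + 1)
    return {
        "bias_step": a_bias,
        "bias_rate": 1 - a_bias,
        "mse_step": a_mse,
        "mse_rate": 1 - a_mse,
    }


for K in (1, 2):
    print(K, optimal_exponents(K))
```

For $K=1$ this reproduces the Euler rates ($L^{-1/2}$ bias, $L^{-2/3}$ MSE), and for $K=2$ the splitting rates ($L^{-2/3}$ bias, $L^{-4/5}$ MSE); the step exponents coincide with the optimal $\alpha$ for the decreasing schedule.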

This review was generated by AI and checked by human editors.