QUICK REVIEW

[Paper Review] On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

Changyou Chen, Nan Ding|arXiv (Cornell University)|Oct 21, 2016

Markov Chains and Monte Carlo Methods|References: 25|Citations: 124

One-Line Summary

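This paper develops a weak-convergence theory for SG-MCMC methods with high-order integrators and shows that a symmetric 2nd-order splitting integrator improves convergence. For example, the MSE of SGHMC converges at rate $L^{-4/5}$, versus $L^{-2/3}$ with the 1st-order Euler integrator.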

ABSTRACT

Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the mean square error (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.

Motivation and Objectives

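Develop a weak-convergence theory for general SG-MCMC methods with high-order integrators.
Characterize the finite-time bias and MSE of SG-MCMC with a K-th order local integrator, under both fixed and decreasing step sizes.
Introduce a computationally efficient 2nd-order symmetric splitting integrator for SG-MCMC.
Analyze how stochastic-gradient noise affects convergence and the invariant distribution.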

Proposed Method

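Model SG-MCMC as an Itô diffusion with generator $\mathcal{L}$, and study weak convergence, i.e., convergence of expectations of a smooth test function (statistic) $\phi$.
Use the Poisson equation $\mathcal{L}\psi = \phi - \bar{\phi}$, where $\bar{\phi}$ is the posterior average, to relate the sample average to the solution $\psi$, and derive bounds on the bias and MSE.
Introduce K-th order local integrators, whose one-step transition operator $P_h$ approximates $e^{h\mathcal{L}}$ up to $O(h^{K+1})$, and extend them to the stochastic-gradient setting through the per-iteration generator $\tilde{\mathcal{L}}_l$.
Derive the bounds $\text{Bias} = O\left(\frac{1}{Lh} + \frac{\sum_l \mathbb{E}\|\mathbb{E}\Delta V_l\|}{L} + h^K\right)$ and $\text{MSE} = O\left(\frac{1}{L}\sum_l \frac{\mathbb{E}\|\Delta V_l\|^2}{L} + \frac{1}{Lh} + h^{2K}\right)$, where $L$ is the number of iterations, $h$ the step size, and $\Delta V_l = \tilde{\mathcal{L}}_l - \mathcal{L}$ the generator error introduced by the stochastic gradient at iteration $l$.
Propose and analyze a symmetric splitting integrator (ABOBA) for SGHMC, and prove that it is a 2nd-order local integrator.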
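The ABOBA splitting step for SGHMC can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's code. It assumes SGHMC dynamics $d\theta = p\,dt$, $dp = -\nabla U(\theta)\,dt - Dp\,dt + \sqrt{2D}\,dW$ with friction $D$. It also assumes a split into three sub-flows: A (position drift), B (friction, solved exactly), and O (stochastic-gradient kick plus injected noise). These are composed as A(h/2) B(h/2) O(h) B(h/2) A(h/2). The names `sghmc_aboba`, `grad_u_tilde(theta, rng)` and `noisy_grad`, and the Gaussian test target, are hypothetical illustrations.

```python
import numpy as np


def sghmc_aboba(grad_u_tilde, theta0, n_iter, h, D=1.0, seed=0):
    """SGHMC with a symmetric splitting (ABOBA) step: a minimal sketch.

    Assumed dynamics: dtheta = p dt,  dp = -grad U(theta) dt - D p dt + sqrt(2D) dW.
    Assumed sub-flows, each solved exactly over a duration t:
      A: theta <- theta + t * p                                 (p frozen)
      B: p     <- exp(-D t) * p                                 (friction only)
      O: p     <- p - t * grad_u_tilde(theta) + sqrt(2 D t) * xi   (theta frozen)
    One iteration applies A(h/2) B(h/2) O(h) B(h/2) A(h/2).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    p = rng.standard_normal(theta.shape)   # initial momentum ~ N(0, I)
    decay = np.exp(-D * h / 2)             # exact B flow over h/2
    noise_scale = np.sqrt(2 * D * h)       # std of injected noise in the O flow
    samples = np.empty((n_iter,) + theta.shape)
    for i in range(n_iter):
        theta = theta + 0.5 * h * p        # A(h/2)
        p = decay * p                      # B(h/2)
        # O(h): stochastic-gradient kick plus injected Gaussian noise
        p = p - h * grad_u_tilde(theta, rng) + noise_scale * rng.standard_normal(theta.shape)
        p = decay * p                      # B(h/2)
        theta = theta + 0.5 * h * p        # A(h/2)
        samples[i] = theta
    return samples


def noisy_grad(theta, rng):
    """Hypothetical stochastic gradient for U(theta) = theta^2 / 2 (standard normal
    target): the exact gradient theta plus N(0, 0.5^2) noise mimicking minibatching."""
    return theta + 0.5 * rng.standard_normal(theta.shape)


samples = sghmc_aboba(noisy_grad, theta0=[0.0], n_iter=100_000, h=0.1)
# Sample mean and variance should land near the target's 0 and 1.
print(samples.mean(), samples.var())
```

Each sub-flow is solved exactly here: $\theta$ is frozen during O and $p$ during A. The palindromic composition is therefore a Strang-type splitting, which is the usual reason such symmetric schemes are 2nd order. The gradient noise in `noisy_grad` is not corrected for, so it slightly inflates the sampled variance. This corresponds to the $\Delta V_l$ terms in the bounds.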

Experimental Results

Research Questions

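RQ1: How does the order K of the numerical integrator affect the finite-time bias and MSE of SG-MCMC algorithms?
RQ2: What convergence rates do fixed-step-size SG-MCMC algorithms with high-order integrators achieve, and how do they compare with the 1st-order Euler scheme?
RQ3: How do stochastic-gradient noise and the step-size schedule (fixed vs. decreasing) affect the asymptotic invariant distribution and the convergence guarantees?
RQ4: In practice, can high-order integrators (e.g., the 2nd-order symmetric splitting integrator) improve the performance of SGHMC/SGLD for large-scale Bayesian learning on real data?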

Key Findings

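For a K-th order integrator, the bias after $L$ iterations is $O\left(\frac{1}{Lh} + \frac{\sum_l \mathbb{E}\|\mathbb{E}\Delta V_l\|}{L} + h^K\right)$.
The MSE after $L$ iterations is $O\left(\frac{1}{L}\sum_l \frac{\mathbb{E}\|\Delta V_l\|^2}{L} + \frac{1}{Lh} + h^{2K}\right)$.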
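With the 2nd-order symmetric splitting integrator ($K=2$), SGHMC converges faster than Euler-based SGLD/SGHMC. It reaches an optimal bias of $L^{-2/3}$ (with $h \propto L^{-1/3}$) and an MSE of $L^{-4/5}$ (with $h \propto L^{-1/5}$). The Euler schemes attain $L^{-1/2}$ bias and $L^{-2/3}$ MSE.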
The invariant distribution of SG-MCMC converges to the true posterior, with distance $d(\tilde{\rho}_h, \rho) = O(h^K)$ for a K-th order integrator.
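Decreasing-step-size SG-MCMC is consistent. With $h_l \propto l^{-\alpha}$, the optimal $\alpha$ matches the fixed-step results: $\alpha = 1/(K+1)$ for the bias and $\alpha = 1/(2K+1)$ for the MSE.
Experiments on synthetic data and on large-scale tasks (LDA, SBN/MNIST) show that SGHMC with the symmetric splitting integrator (SGHMC-S) outperforms Euler-based methods. It also avoids the instability observed with large step sizes.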
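The rates listed above follow from balancing the $1/(Lh)$ term against the discretization term: $h^K$ for the bias and $h^{2K}$ for the MSE. The Python sketch below computes the resulting exponents in closed form. It then cross-checks them by brute-force minimization over a grid of step sizes. The helper names `optimal_rates` and `empirical_rate` are hypothetical. The sketch drops constants and the stochastic-gradient terms; that is its own simplifying assumption, not part of the stated bounds.

```python
from fractions import Fraction

import numpy as np


def optimal_rates(K):
    """Closed-form step-size and error exponents for a K-th order integrator.

    Keeps only the leading terms of the bounds (constants and the
    stochastic-gradient terms are ignored, an assumption of this sketch):
      bias ~ 1/(L h) + h^K     -> h ∝ L^(-1/(K+1)),  bias ∝ L^(-K/(K+1))
      MSE  ~ 1/(L h) + h^(2K)  -> h ∝ L^(-1/(2K+1)), MSE  ∝ L^(-2K/(2K+1))
    Returns {"bias": (a, r), "mse": (a, r)} with h ∝ L^(-a) and error ∝ L^(-r).
    """
    K = Fraction(K)
    a_bias = 1 / (K + 1)
    a_mse = 1 / (2 * K + 1)
    return {"bias": (a_bias, K * a_bias), "mse": (a_mse, 2 * K * a_mse)}


def empirical_rate(power, Ls=(1e4, 1e6, 1e8)):
    """Numerically minimize 1/(L h) + h^power over a log-spaced grid of h for
    several L, then return minus the log-log slope of the minimum versus L."""
    hs = np.logspace(-6, 0, 20001)
    minima = [np.min(1.0 / (L * hs) + hs**power) for L in Ls]
    slope = np.polyfit(np.log(Ls), np.log(minima), 1)[0]
    return -slope


for K in (1, 2):
    rates = optimal_rates(K)
    print(K, rates["bias"], rates["mse"])
```

For $K=1$ this reproduces the Euler rates (bias $L^{-1/2}$, MSE $L^{-2/3}$). For $K=2$ it gives the splitting rates ($L^{-2/3}$ and $L^{-4/5}$). `empirical_rate(2 * K)` should agree with the MSE exponent. The decreasing-step-size result carries the same exponents over: the optimal $\alpha$ in $h_l \propto l^{-\alpha}$ equals the step exponent $a$.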

This review was generated by AI and checked by a human editor.