QUICK REVIEW

[Paper Review] (Non-) asymptotic properties of Stochastic Gradient Langevin Dynamics

Sebastian J. Vollmer, Konstantinos C. Zygalakis|arXiv (Cornell University)|Jan 2, 2015

Markov Chains and Monte Carlo Methods|References: 1|Citations: 29

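One-Line Summary

This paper analyses the non-asymptotic bias and variance of stochastic gradient Langevin dynamics (SGLD) with a fixed step size, derives an explicit expansion of the asymptotic bias, and proposes a modified SGLD (mSGLD) that removes the first-order bias caused by the variance of the stochastic gradient. It establishes finite-time bounds on the bias, variance, and mean squared error (MSE), and shows that mSGLD outperforms standard SGLD in the high-accuracy regime and attains the same MSE decay rate as decreasing-step-size SGLD. The results are also verified analytically on a Gaussian toy model.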

ABSTRACT

Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally infeasible. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem in three ways: it generates proposed moves using only a subset of the data, it skips the Metropolis-Hastings accept-reject step, and it uses sequences of decreasing step sizes. In \cite{TehThierryVollmerSGLD2014}, we provided the mathematical foundations for the decreasing step size SGLD, including consistency and a central limit theorem. However, in practice the SGLD is run for a relatively small number of iterations, and its step size is not decreased to zero. The present article investigates the behaviour of the SGLD with fixed step size. In particular we characterise the asymptotic bias explicitly, along with its dependence on the step size and the variance of the stochastic gradient. On that basis a modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size is derived. Moreover, we are able to obtain bounds on the finite-time bias, variance and mean squared error (MSE). The theory is illustrated with a Gaussian toy model for which the bias and the MSE for the estimation of moments can be obtained explicitly. For this toy model we study the gain of the SGLD over the standard Euler method in the limit of large data sets.
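For orientation, the SGLD recursion described in the abstract can be written out as below. This is a sketch in the commonly used Welling-Teh convention, not a formula quoted from the paper: the symbols $h$ (fixed step size), $N$ (data set size), $n$ (minibatch size), $S_k$ (random subset drawn at iteration $k$) and $\xi_k$ (injected Gaussian noise) are assumed notation, and the paper's own scaling of the step size may differ.

```latex
\theta_{k+1}
  = \theta_k
  + \frac{h}{2}\Big( \nabla \log p(\theta_k)
      + \frac{N}{n} \sum_{i \in S_k} \nabla \log p(x_i \mid \theta_k) \Big)
  + \sqrt{h}\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I).
```

The minibatch sum replaces the full-data gradient, no accept-reject step follows the update, and in the fixed-step setting studied here $h$ stays constant rather than decreasing with $k$.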

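Motivation and Objectives

To understand the non-asymptotic behaviour of fixed-step-size SGLD, which is what is typically run in practice, even though the existing theoretical foundations rely on decreasing step sizes.
To characterise explicitly the asymptotic bias of SGLD as a function of the step size and the variance of the stochastic gradient estimator.
To derive a modified SGLD (mSGLD) that removes, up to first order in the step size, the bias caused by the gradient variance.
To establish finite-time upper bounds on the bias, variance, and mean squared error (MSE) of SGLD and mSGLD.
To validate the theoretical findings through analytical calculations on a Gaussian toy model and numerical simulations on logistic regression.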

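Proposed Method

Derive an asymptotic expansion of the SGLD bias up to first order in the step size, and identify how it depends on the variance of the stochastic gradient estimator.
Propose a modified SGLD (mSGLD) that corrects the first-order bias by adjusting the gradient estimator with a control-variate approach.
Establish finite-time upper bounds on the bias, variance, and MSE of SGLD and mSGLD using coupling and martingale techniques.
Analyse a one-dimensional Gaussian location model and compute exact expressions for the sample mean and its moments, which allows the bias and MSE to be verified analytically.
Run numerical simulations of Bayesian logistic regression with a fixed step size, comparing the MSE of SGLD and mSGLD across batch sizes and iteration counts.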
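The role of the gradient variance in the bias can be illustrated on a one-dimensional Gaussian location model. The following is a minimal sketch, not the paper's code: it assumes the model $x_i \sim \mathcal{N}(\theta, \sigma_x^2)$ with prior $\theta \sim \mathcal{N}(0, \sigma_\theta^2)$, uses the Welling-Teh step convention, and the function name `run_sgld_gaussian` and all parameter values are hypothetical choices made for illustration. It compares fixed-step SGLD on a small minibatch with the same recursion using the full gradient (the Euler method), against the exact conjugate posterior.

```python
import numpy as np

def run_sgld_gaussian(x, n, h, K, sigma_x=1.0, sigma_theta=1.0, burn_in=1000, seed=0):
    """Fixed-step SGLD for x_i ~ N(theta, sigma_x^2), theta ~ N(0, sigma_theta^2).

    Assumed toy setup for illustration; returns K samples after burn-in.
    """
    rng = np.random.default_rng(seed)
    N = len(x)
    theta = 0.0
    samples = np.empty(K)
    for k in range(K + burn_in):
        # Draw a minibatch of size n without replacement.
        batch = x[rng.choice(N, size=n, replace=False)]
        # Unbiased minibatch estimate of the log-posterior gradient.
        grad = -theta / sigma_theta**2 + (N / n) * np.sum(batch - theta) / sigma_x**2
        # Langevin update with constant step size h and no accept-reject step.
        theta = theta + 0.5 * h * grad + np.sqrt(h) * rng.standard_normal()
        if k >= burn_in:
            samples[k - burn_in] = theta
    return samples

data_rng = np.random.default_rng(1)
N = 1000
x = data_rng.normal(0.5, 1.0, size=N)

# Exact conjugate posterior (sigma_x = sigma_theta = 1) for comparison.
post_prec = 1.0 + N
post_mean = x.sum() / post_prec
post_var = 1.0 / post_prec

# n = 10 is minibatch SGLD; n = N recovers the full-gradient Euler scheme.
results = {n: run_sgld_gaussian(x, n=n, h=1e-4, K=20000) for n in (10, N)}
for n, s in results.items():
    print(f"n={n}: mean error={s.mean() - post_mean:+.4f}, "
          f"variance ratio={s.var() / post_var:.2f}")
```

Because the gradient is linear in $\theta$ in this model, the minibatch noise leaves the mean essentially unaffected and shows up in the second moment: one would expect the `n=10` run to report a variance ratio well above 1, while the full-gradient run shows only the small discretisation error. Shrinking `h` should reduce the inflation roughly in proportion, consistent with a first-order bias term driven by the gradient variance.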

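Experimental Results

Research Questions

RQ1: What is the explicit form of the asymptotic bias of SGLD with a fixed step size, and how does it depend on the step size and the gradient variance?
RQ2: Can a modified SGLD be constructed that removes the first-order bias caused by the variance of the stochastic gradient?
RQ3: What finite-time upper bounds hold for the bias, variance, and mean squared error (MSE) of SGLD and mSGLD?
RQ4: How does mSGLD compare with standard SGLD in terms of MSE, particularly in the high-accuracy regime and with small data batches?
RQ5: In the large-data limit, does mSGLD achieve the same MSE decay rate as decreasing-step-size SGLD?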

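Key Findings

The asymptotic bias of SGLD is proportional to the variance of the stochastic gradient estimator, with a coefficient that depends on the step size.
The proposed mSGLD removes the bias caused by gradient variance up to first order in the step size, improving accuracy in the high-accuracy regime.
Finite-time upper bounds on the bias, variance, and MSE are derived, and the MSE decay rate is shown to match the optimal $ K^{-1/3} $ rate of decreasing-step-size SGLD.
For the Gaussian toy model, exact expressions for the bias and MSE of the time-average estimator are derived, confirming the theoretical bias expansion and the MSE decay.
Numerical results for logistic regression show that mSGLD outperforms SGLD in MSE at a moderate batch size ($ n=150 $) but performs worse at very small batch sizes ($ n=10, 50 $), indicating a bias-variance trade-off.
In the large-data limit, the results suggest that SGLD can lower the computational complexity of estimating the second moment by one power, for the same rate at which the MSE vanishes, which is a computational advantage for large-scale Bayesian inference.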
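The origin of a $ K^{-1/3} $ exponent can be seen from a standard heuristic balance between bias and variance. The sketch below is not taken from the paper: it assumes that, for step size $h$ and $K$ iterations, the bias of the time average is approximately $C_1 h$ and its variance approximately $C_2 / (K h)$, where $C_1$ and $C_2$ are hypothetical problem-dependent constants.

```latex
\mathrm{MSE}(h, K) \approx C_1^2 h^2 + \frac{C_2}{K h},
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial h}
  = 2 C_1^2 h - \frac{C_2}{K h^2} = 0
\;\Longrightarrow\;
h^\ast = \Big( \frac{C_2}{2 C_1^2} \Big)^{1/3} K^{-1/3},
\qquad
\mathrm{MSE}(h^\ast, K) \propto K^{-2/3},
\quad
\sqrt{\mathrm{MSE}(h^\ast, K)} \propto K^{-1/3}.
```

Under these assumptions the best fixed step size shrinks like $K^{-1/3}$, and the $K^{-1/3}$ exponent corresponds to the error measured on the root-MSE scale.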

This review was generated by AI and checked by a human editor.