QUICK REVIEW

[Paper Review] Bayes Shrinkage at GWAS scale: Convergence and Approximation Theory of a Scalable MCMC Algorithm for the Horseshoe Prior

James E. Johndrow, Paulo Orenstein|arXiv (Cornell University)|May 2, 2017

Statistical Methods and Inference | References: 40 | Citations: 23

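One-Line Summary

This paper proposes scalable MCMC algorithms for the horseshoe prior in high-dimensional Bayesian regression. Blocked updates yield an exact sampler that is provably geometrically ergodic, and approximating an expensive matrix product gives speedups of orders of magnitude. The approach makes accurate posterior inference feasible at GWAS scale (N=2,267, p=98,385); compared with existing algorithms it converges better, attains lower mean squared error, and produces intervals with better coverage.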

ABSTRACT

The horseshoe prior is frequently employed in Bayesian analysis of high-dimensional models, and has been shown to achieve minimax optimal risk properties when the truth is sparse. While optimization-based algorithms for the extremely popular Lasso and elastic net procedures can scale to dimension in the hundreds of thousands, algorithms for the horseshoe that use Markov chain Monte Carlo (MCMC) for computation are limited to problems an order of magnitude smaller. This is due to high computational cost per step and growth of the variance of time-averaging estimators as a function of dimension. We propose two new MCMC algorithms for computation in these models that have improved performance compared to existing alternatives. One of the algorithms also approximates an expensive matrix product to give orders of magnitude speedup in high-dimensional applications. We prove that the exact algorithm is geometrically ergodic, and give guarantees for the accuracy of the approximate algorithm using perturbation theory. Versions of the approximation algorithm that gradually decrease the approximation error as the chain extends are shown to be exact. The scalability of the algorithm is illustrated in simulations with problem size as large as $N=5,000$ observations and $p=50,000$ predictors, and an application to a genome-wide association study with $N=2,267$ and $p=98,385$. The empirical results also show that the new algorithm yields estimates with lower mean squared error, intervals with better coverage, and elucidates features of the posterior that were often missed by previous algorithms in high dimensions, including bimodality of posterior marginals indicating uncertainty about which covariates belong in the model.

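Motivation and Objectives

Address the lack of scalable MCMC algorithms for the horseshoe prior in high-dimensional settings, particularly GWAS, where p ≫ N.
Overcome the computational bottlenecks of horseshoe MCMC samplers: expensive matrix operations and slow mixing.
Establish geometric ergodicity of the proposed exact algorithm, which guarantees convergence at a geometric rate and valid asymptotic inference.
Develop an approximate algorithm that lowers computational cost by approximating a matrix product while retaining theoretical accuracy guarantees.
Demonstrate the method's empirical advantages in high-dimensional simulations and on real GWAS data, including detection of bimodal posteriors and improved posterior interval coverage.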
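
For orientation, the horseshoe regression model behind these objectives can be written as below. This is a sketch under assumptions: the review does not state the hierarchy, so the block uses the standard horseshoe form with the symbols that appear later in this review, taking ξ as the inverse squared global scale and η_j as the inverse squared local scale. The prior on σ² is omitted because the review does not specify it.

```latex
\begin{aligned}
y \mid \beta, \sigma^2 &\sim \mathcal{N}\!\left(W\beta,\ \sigma^2 I_N\right), \\
\beta_j \mid \sigma^2, \xi, \eta_j &\sim \mathcal{N}\!\left(0,\ \sigma^2\, \xi^{-1} \eta_j^{-1}\right), \qquad j = 1, \dots, p, \\
\eta_j^{-1/2} &\sim \mathrm{Cauchy}_+(0, 1), \qquad \xi^{-1/2} \sim \mathrm{Cauchy}_+(0, 1).
\end{aligned}
```

Under this parametrization a large η_j shrinks β_j strongly toward zero, which is why most entries of η⁻¹ are expected to be tiny when the truth is sparse.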

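Proposed Method

Use blocked Gibbs sampling with joint updates among β, σ², ξ, and η to improve mixing and obtain geometric ergodicity.
To prove geometric ergodicity, construct a Lyapunov function whose terms keep βj²σ⁻² away from 0 and ∞, even in high-dimensional settings.
When η⁻¹ is approximately sparse, replace the expensive WDW′ matrix product with a fast sparse approximation to reduce computational cost.
Apply perturbation theory to bound the error between the invariant measures of the approximate and exact algorithms, which guarantees accuracy.
Propose a scheme in which the approximation error decreases gradually, and show that it is exact in the limit, so asymptotic validity is preserved.
The method was implemented and tested on both synthetic high-dimensional data (N=5,000, p=50,000) and a real GWAS data set (N=2,267, p=98,385).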
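
The sparse approximation of the WDW′ product described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes D = diag(1/η_j), that the matrix of interest is I + ξ⁻¹WDW′, and that columns are dropped when their weight 1/(ξη_j) falls at or below a threshold. The function names, the threshold rule, and the parameter `delta` are all hypothetical.

```python
import numpy as np

def exact_matrix(W, eta, xi):
    """Exact N x N matrix I + (1/xi) * W D W', assuming D = diag(1 / eta)."""
    n_obs = W.shape[0]
    d = 1.0 / eta
    # (W * d) scales column j of W by d[j]; cost is O(N^2 p).
    return np.eye(n_obs) + (W * d) @ W.T / xi

def approx_matrix(W, eta, xi, delta):
    """Thresholded version (illustrative rule): keep only columns of W
    whose weight 1 / (xi * eta_j) exceeds delta."""
    n_obs = W.shape[0]
    d = 1.0 / eta
    keep = d / xi > delta
    W_kept = W[:, keep]
    # Cost is O(N^2 s), where s is the number of retained columns.
    M = np.eye(n_obs) + (W_kept * d[keep]) @ W_kept.T / xi
    return M, int(keep.sum())
```

When only s of the p entries of η⁻¹ are non-negligible, the product touches s columns instead of p, which is where the saving comes from. Sending `delta` to zero recovers the exact matrix, in line with the gradually decreasing error scheme described above.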

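Experimental Results

Research Questions

RQ1: Can a scalable MCMC algorithm be designed for the horseshoe prior, and in particular can it remain geometrically ergodic in high-dimensional settings?
RQ2: How can the computational cost of the matrix operations in horseshoe posterior updates be reduced without sacrificing accuracy?
RQ3: What theoretical guarantees are available for the accuracy of an approximate MCMC algorithm in the context of the horseshoe prior?
RQ4: Does the proposed algorithm improve the accuracy of posterior estimates and posterior interval coverage in high-dimensional regression compared with existing methods?
RQ5: Can the algorithm detect complex posterior features, such as bimodality of marginal posteriors, that indicate uncertainty in variable selection?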

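Key Findings

The exact MCMC algorithm is proved to be geometrically ergodic, which guarantees convergence at a geometric rate and supports a central limit theorem for time-averaging estimators.
The approximate algorithm achieves speedups of orders of magnitude by approximating the WDW′ matrix product, particularly when η⁻¹ is sparse.
Using perturbation theory, the invariant measure of the approximate algorithm is shown to approach that of the exact algorithm as the approximation error shrinks, and error bounds are provided.
In the empirical results, especially in high-dimensional settings, mean squared error was lower and posterior interval coverage was better than with existing methods.
The method clearly identifies bimodal marginal posteriors that indicate uncertainty in variable selection, a feature that previous algorithms often missed.
The method scales to GWAS-sized problems: it accurately analyzed data with p=98,385 predictors and N=2,267 observations, outperforming existing methods in both accuracy and computational efficiency.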
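
The central limit theorem mentioned in the first finding is what justifies attaching Monte Carlo standard errors to time averages of the chain. As a hedged illustration, the sketch below uses batch means, a standard generic technique that is not taken from the paper; the function name and the default of 20 batches are assumptions made for this example.

```python
import numpy as np

def batch_means_se(draws, n_batches=20):
    """Monte Carlo standard error of the mean of a 1-D chain via batch means."""
    draws = np.asarray(draws, dtype=float)
    batch_len = len(draws) // n_batches
    if batch_len < 1:
        raise ValueError("chain is too short for the requested number of batches")
    # Drop the leftover tail so that all batches have equal length.
    trimmed = draws[: batch_len * n_batches]
    means = trimmed.reshape(n_batches, batch_len).mean(axis=1)
    # If batches are long enough to be roughly independent, the overall mean
    # is an average of n_batches batch means, so its variance is about
    # var(batch means) / n_batches.
    return float(np.sqrt(means.var(ddof=1) / n_batches))
```

For a slowly mixing chain this estimate is noticeably larger than the naive value std / sqrt(n), which is one practical way that poor mixing shows up in the variance of time-averaging estimators.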


This review was generated by AI and checked by a human editor.