QUICK REVIEW

[Paper Review] Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-Dimensional Settings

Minsuk Shin, Anirban Bhattacharya|arXiv (Cornell University)|Jul 25, 2015

Statistical Methods and Inference|References: 45|Citations: 27

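Summary in Brief

This paper proposes scalable Bayesian variable selection for the p ≫ n setting using nonlocal prior densities. Strong model selection consistency is established when the hyperparameter τ grows faster than log p. On precision-recall curves, the approach compares favorably with penalized likelihood methods such as the lasso and SCAD, attaining a lower false discovery rate, and its posterior concentrates faster than the one induced by the g-prior. A new algorithm, S5, makes the computation efficient.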

ABSTRACT

Bayesian model selection procedures based on nonlocal alternative prior densities are extended to ultrahigh dimensional settings and compared to other variable selection procedures using precision-recall curves. Variable selection procedures included in these comparisons include methods based on $g$-priors, reciprocal lasso, adaptive lasso, scad, and minimax concave penalty criteria. The use of precision-recall curves eliminates the sensitivity of our conclusions to the choice of tuning parameters. We find that Bayesian variable selection procedures based on nonlocal priors are competitive to all other procedures in a range of simulation scenarios, and we subsequently explain this favorable performance through a theoretical examination of their consistency properties. When certain regularity conditions apply, we demonstrate that the nonlocal procedures are consistent for linear models even when the number of covariates $p$ increases sub-exponentially with the sample size $n$. A model selection procedure based on Zellner's $g$-prior is also found to be competitive with penalized likelihood methods in identifying the true model, but the posterior distribution on the model space induced by this method is much more dispersed than the posterior distribution induced on the model space by the nonlocal prior methods. We investigate the asymptotic form of the marginal likelihood based on the nonlocal priors and show that it attains a unique term that cannot be derived from the other Bayesian model selection procedures. We also propose a scalable and efficient algorithm called Simplified Shotgun Stochastic Search with Screening (S5) to explore the enormous model space, and we show that S5 dramatically reduces the computing time without losing the capacity to search the interesting region in the model space. The S5 algorithm is available in an R package, BayesS5, on CRAN.

Motivation and Objectives

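Address the lack of theoretical and empirical understanding of nonlocal priors in the p ≫ n setting, where p is the number of predictors and n is the sample size.
Compare nonlocal priors with penalized likelihood methods (e.g., lasso, SCAD, adaptive lasso, MCP, rlasso) and with g-priors in terms of model selection accuracy and control of the false discovery rate.
Establish theoretical conditions under which nonlocal priors achieve strong model selection consistency in ultrahigh-dimensional settings.
Develop and implement an efficient, scalable model search algorithm (S5) that accelerates posterior exploration of high-dimensional model spaces.
Provide practical guidance on choosing hyperparameters (e.g., τ), and compare Bayesian and penalized likelihood methods in terms of computational cost and uncertainty quantification.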

Proposed Method

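Nonlocal prior densities are placed on the regression coefficients: the product exponential moment (peMoM) and product inverse moment (piMoM) priors, in which the tuning parameter τ controls how strongly the prior mass is pushed away from 0.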
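Because the marginal likelihood under these nonlocal priors is not available in closed form, a Laplace approximation is used to compute each model's marginal likelihood efficiently, which in turn makes posterior model probabilities computable.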
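The S5 algorithm (Simplified Shotgun Stochastic Search with Screening) is introduced: an efficient search procedure that combines shotgun stochastic search (SSS) with temperature control and screening to accelerate exploration of the model space.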
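Precision-recall curves are used as the main evaluation metric; because the true signal is sparse in ultrahigh-dimensional settings, they are preferred over ROC curves.
Asymptotic conditions for strong model selection consistency under nonlocal priors are derived: when τ grows faster than log p, consistency holds even when p grows sub-exponentially with n.
A connection between rlasso and nonlocal priors is established: the rlasso penalty function is equivalent to the negative log kernel of a nonlocal prior.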
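
To make the "nonlocal" property concrete, the following minimal Python sketch evaluates univariate peMoM and piMoM log densities and shows that both vanish at β = 0. The review does not state the density formulas, so the parameterizations here are assumptions based on commonly used forms (piMoM: τ^{r/2}/Γ(r/2) · |β|^{-(r+1)} · exp(-τ/β²); peMoM: C⁻¹ · exp(-β²/(2σ²τ) - τ/β²) with C = sqrt(2πσ²τ) · exp(-sqrt(2/σ²))). The function names, the shape parameter r, and the fixed σ² are illustrative and are not taken from the paper or from the BayesS5 package.

```python
import math

def pimom_logpdf(beta: float, tau: float, r: float = 1.0) -> float:
    """Log density of a univariate inverse-moment prior (assumed form):
    tau^(r/2) / Gamma(r/2) * |beta|^-(r+1) * exp(-tau / beta^2)."""
    b2 = beta * beta
    if b2 == 0.0:
        return -math.inf  # the density is exactly zero at beta = 0
    return (0.5 * r * math.log(tau) - math.lgamma(0.5 * r)
            - 0.5 * (r + 1.0) * math.log(b2) - tau / b2)

def pemom_logpdf(beta: float, tau: float, sigma2: float = 1.0) -> float:
    """Log density of a univariate exponential-moment prior (assumed form):
    exp(-beta^2 / (2 sigma2 tau) - tau / beta^2) / C."""
    b2 = beta * beta
    if b2 == 0.0:
        return -math.inf  # the density is exactly zero at beta = 0
    # log C, from the closed-form integral of exp(-a x^2 - b / x^2) over the real line
    log_c = 0.5 * math.log(2.0 * math.pi * sigma2 * tau) - math.sqrt(2.0 / sigma2)
    return -b2 / (2.0 * sigma2 * tau) - tau / b2 - log_c

def product_logprior(betas, logpdf, **kwargs) -> float:
    """Product prior over the coefficients of one model: sum of univariate log densities."""
    return sum(logpdf(b, **kwargs) for b in betas)

if __name__ == "__main__":
    for b in (0.0, 0.1, 0.5, 1.0, 2.0):
        print(f"beta={b:4.1f}  piMoM={math.exp(pimom_logpdf(b, tau=1.0)):.4g}"
              f"  peMoM={math.exp(pemom_logpdf(b, tau=1.0)):.4g}")
```

Larger τ widens the region around 0 in which the density is negligible, which is the sense in which τ controls how far prior mass is pushed away from 0.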

Experimental Results

Research Questions

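RQ1: Can nonlocal priors achieve strong model selection consistency in the p ≫ n setting, and how do the conditions depend on the hyperparameter τ?
RQ2: How do Bayesian variable selection procedures based on nonlocal priors differ from penalized likelihood methods such as lasso, SCAD, and rlasso in terms of false discovery rate and power?
RQ3: How do the optimal hyperparameters (τ for nonlocal priors, g for g-priors) behave as the number of predictors p increases?
RQ4: Can an efficient algorithm (S5) be developed that scales to high-dimensional settings without sacrificing accuracy?
RQ5: What computational and inferential advantages do Bayesian methods with nonlocal priors have over penalized likelihood methods in high-dimensional settings?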

Key Findings

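Nonlocal priors achieve strong model selection consistency in the p ≫ n setting when the hyperparameter τ grows faster than log p; the theoretical and empirical results agree.
The optimal hyperparameter τ for nonlocal priors increases very slowly with p (from 1.97 to 3.60 as p grows from 1,000 to 20,000), whereas the optimal g for the g-prior increases sharply (from 7.83×10⁸ to 4.29×10¹³).
On precision-recall curves, Bayesian procedures with nonlocal priors outperform penalized likelihood methods, achieving a lower false discovery rate while maintaining comparable power.
The posterior under nonlocal priors concentrates more tightly around the maximum a posteriori (MAP) model than the posterior under the g-prior, indicating faster posterior concentration.
The S5 algorithm substantially accelerates model search: it finds the same MAP model as shotgun stochastic search (SSS) in far less time and scales to high-dimensional settings.
The Bayesian framework provides posterior model probabilities and, through model averaging, uncertainty quantification, an advantage over the point estimates produced by penalized likelihood methods.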
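
As an illustration of how a precision-recall curve summarizes variable selection along a sequence of candidate models, here is a minimal Python sketch. The true support and the model path below are made-up toy values, not data from the paper, and the helper names are hypothetical. Precision is the fraction of selected variables that are truly active (one minus the false discovery proportion), and recall is the fraction of truly active variables that were selected (power).

```python
from typing import Iterable, List, Set, Tuple

def precision_recall(selected: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """Precision and recall of one selected model against the true support."""
    true_pos = len(selected & truth)
    # Convention (an assumption of this sketch): an empty selection has precision 1.
    precision = true_pos / len(selected) if selected else 1.0
    recall = true_pos / len(truth) if truth else 1.0
    return precision, recall

def pr_curve(path: Iterable[Iterable[int]], truth: Set[int]) -> List[Tuple[float, float]]:
    """One (precision, recall) point per model along a tuning-parameter path."""
    return [precision_recall(set(model), truth) for model in path]

if __name__ == "__main__":
    truth = {0, 1, 2}                                 # toy true support
    path = [{0}, {0, 1}, {0, 1, 7}, {0, 1, 2, 7, 9}]  # toy models, sparse to dense
    for model, (prec, rec) in zip(path, pr_curve(path, truth)):
        print(sorted(model), f"precision={prec:.2f}", f"recall={rec:.2f}")
```

Because the whole path is traced out, the comparison between methods does not hinge on one particular tuning-parameter value, which is the motivation the abstract gives for using these curves.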

This review was generated by AI and checked by a human editor.