QUICK REVIEW

[Paper Review] Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison

Tuomas Sivula, Måns Magnusson|arXiv (Cornell University)|Aug 24, 2020

Statistical Methods and Bayesian Inference | References: 24 | Cited by: 100

One-line summary

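This paper analyzes the uncertainty of Bayesian LOO-CV when it is used to compare two models. It shows that the standard error estimate can be unreliable in finite samples, most notably when the models make similar predictions, when they are misspecified, or when data are scarce. It compares the normal approximation with the Bayesian bootstrap for representing this uncertainty and offers practical guidance.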

ABSTRACT

It is useful to estimate the expected predictive performance of models planned to be used for prediction. We focus on leave-one-out cross-validation (LOO-CV), which has become a popular method for estimating predictive performance of Bayesian models. Given two models, we are interested in comparing the predictive performances and associated uncertainty, which can also be used to compute the probability of one model having better predictive performance than the other model. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty quantification for the predictive performance difference, and analyse when a normal approximation of this uncertainty is well calibrated and whether taking into account higher moments could improve the approximation. We provide new results of the properties both theoretically in the linear regression case and empirically for hierarchical linear, latent linear, and spline models and discuss the challenges. We show that problematic cases include: comparing models with similar predictions, misspecified models, and small data. In these cases, there is a weak connection between the distributions of the LOO-CV estimator and its error. We show that the problematic skewness of the error distribution for the difference, which occurs when the models make similar predictions, does not fade away when the data size grows to infinity in certain situations. Based on the results, we also provide some practical recommendations for the users of Bayesian LOO-CV for comparing predictive performance of models.
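
The abstract notes that the uncertainty estimate can be turned into a probability that one model predicts better than the other. A minimal sketch, assuming the unknown elpd difference is approximated by a normal distribution centred on the LOO-CV estimate with the estimated standard error (the function name prob_a_better is hypothetical, not from the paper):

```python
import math

def prob_a_better(elpd_diff_hat: float, se_hat: float) -> float:
    """P(elpd(M_a) - elpd(M_b) > 0) under the assumed normal approximation
    N(elpd_diff_hat, se_hat**2) for the unknown elpd difference."""
    if se_hat <= 0:
        raise ValueError("se_hat must be positive")
    z = elpd_diff_hat / se_hat
    # Standard normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(prob_a_better(0.0, 1.0), 3))  # 0.5
print(round(prob_a_better(2.0, 1.0), 3))  # 0.977
```

The paper's results caution that this probability inherits the weaknesses of the normal approximation: it can be poorly calibrated when the models make similar predictions, are misspecified, or are fit to small data.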

Motivation and Objectives

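Evaluate how the uncertainty of the elpd difference behaves when models are compared with Bayesian LOO-CV.
Identify situations in which standard uncertainty estimates are unreliable (e.g., models with similar predictions, misspecified models, small data).
Analyze the theoretical and empirical properties of LOO-CV uncertainty in normal linear regression and other models.
Provide practical recommendations for practitioners who use Bayesian LOO-CV.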

Approach

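Formulate the elpd used for model comparison and its LOO-CV estimator.
Analyze the uncertainty in the difference elpd(M_a, M_b | y) through the LOO-CV error err_LOO and its distribution p(err_LOO).
Evaluate the distribution of this uncertainty by comparing two approximations: a normal approximation and the Bayesian bootstrap (Dirichlet weights).
Derive analytical results for normal linear regression and verify them experimentally on several models.
Use the probability integral transform (PIT) to assess how well the approximate uncertainty distributions are calibrated against the oracle distribution.
Discuss asymptotic behavior and finite-sample problems, including skewness and misspecification.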
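
The quantities in the list above can be sketched from pointwise LOO log predictive densities. Assuming arrays of log p_a(y_i | y_{-i}) and log p_b(y_i | y_{-i}) are already available (the values below are made up), the elpd_LOO difference is the sum of the pointwise differences d_i, the usual standard-error estimate is sqrt(n/(n-1) * sum_i (d_i - mean(d))^2), and a Bayesian bootstrap draws weights w ~ Dirichlet(1, ..., 1) and forms n * sum_i w_i d_i. The Dirichlet(1, ..., 1) weights and the helper names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def elpd_loo_diff(lpd_a, lpd_b):
    """Pointwise LOO log predictive densities -> (elpd_LOO difference, SE estimate, d)."""
    d = np.asarray(lpd_a, dtype=float) - np.asarray(lpd_b, dtype=float)
    n = d.size
    diff_hat = d.sum()
    # sqrt(n/(n-1) * sum (d_i - mean)^2) == sqrt(n * sample variance with ddof=1)
    se_hat = np.sqrt(n * d.var(ddof=1))
    return diff_hat, se_hat, d

def bayesian_bootstrap_diff(d, n_draws=4000, rng=None):
    """Hypothetical helper: Bayesian bootstrap draws n * sum_i w_i d_i, w ~ Dirichlet(1,...,1)."""
    rng = np.random.default_rng(rng)
    d = np.asarray(d, dtype=float)
    n = d.size
    w = rng.dirichlet(np.ones(n), size=n_draws)  # shape (n_draws, n); each row sums to 1
    return n * (w @ d)

# Made-up pointwise values log p_k(y_i | y_{-i}) for two models
lpd_a = np.array([-1.2, -0.9, -1.5, -1.1, -0.7, -2.3])
lpd_b = np.array([-1.3, -1.0, -1.4, -1.6, -0.8, -2.0])
diff_hat, se_hat, d = elpd_loo_diff(lpd_a, lpd_b)
draws = bayesian_bootstrap_diff(d, rng=1)
print(f"normal approx: {diff_hat:.3f} +/- {se_hat:.3f}")
print(f"Bayesian bootstrap: mean {draws.mean():.3f}, sd {draws.std():.3f}")
```

The normal approximation summarizes the uncertainty with diff_hat and se_hat alone, while the bootstrap draws can also reflect skewness in the pointwise differences; per the findings, neither is uniformly better.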

Experimental Results

Research Questions

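RQ1: How reliable are the standard uncertainty estimates for the difference in predictive performance when models are compared with Bayesian LOO-CV?
RQ2: In which situations do the normal approximation or the Bayesian bootstrap fail or become poorly calibrated?
RQ3: How do skewness, misspecification, and small sample size affect the uncertainty of LOO-CV?
RQ4: Do the results generalize beyond normal linear regression to other models such as hierarchical models, Poisson GLMs, and splines?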

Key Findings

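In finite samples, the standard uncertainty estimate for the LOO-CV difference can be too small or too large, most notably when the models make similar predictions, are misspecified, or are fit to limited data.
The error distribution of the LOO-CV estimator can be highly skewed, in which case the normal approximation is unreliable.
Misspecification and outliers can bias the LOO-CV estimator and inflate its variance, which can change the conclusions of a model comparison.
In certain situations, the skewness that arises when models make similar predictions persists even as the data size grows, so it remains hard to determine which model is better.
In practice, the Bayesian bootstrap does not universally outperform the normal approximation for the uncertainty of the elpd difference.
The results for normal linear regression extend qualitatively to other models, and similar behavior is observed for Bayesian K-fold CV.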
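
The PIT-based calibration check and the skewness of the error can be explored with a small simulation. This is a hypothetical toy setup, not one of the paper's experiments: data y_i ~ N(theta, 1); model a is N(mu, 1) with a flat prior on mu, so its LOO predictive is N(mean of y_{-i}, 1 + 1/(n-1)); model b is a fixed N(0, 1). With theta = 0 the two models predict similarly. For each replicate the sketch computes the LOO estimate, its SE, the true elpd difference (available in closed form here), the error err_LOO = estimate minus truth, and the PIT value Phi((truth - estimate) / SE) of the true difference under the normal approximation. If that approximation were well calibrated, the PIT values would be roughly uniform on [0, 1].

```python
import math
import numpy as np

def norm_logpdf(x, m, s2):
    # log N(x | m, s2), vectorised over x
    return -0.5 * np.log(2 * np.pi * s2) - 0.5 * (x - m) ** 2 / s2

def expected_logpdf(theta, m, s2):
    # E[log N(y_new | m, s2)] when y_new ~ N(theta, 1)
    return -0.5 * np.log(2 * np.pi * s2) - 0.5 * (1.0 + (theta - m) ** 2) / s2

def sample_skewness(x):
    c = x - x.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

def simulate(theta, n, reps=2000, seed=0):
    """Hypothetical toy setup: returns PIT values and err_LOO across replicates."""
    rng = np.random.default_rng(seed)
    pit = np.empty(reps)
    err = np.empty(reps)
    for r in range(reps):
        y = rng.normal(theta, 1.0, size=n)
        ybar = y.mean()
        ybar_loo = (n * ybar - y) / (n - 1)  # mean of y with y_i left out
        d = norm_logpdf(y, ybar_loo, 1.0 + 1.0 / (n - 1)) - norm_logpdf(y, 0.0, 1.0)
        diff_hat = d.sum()
        se_hat = math.sqrt(n * d.var(ddof=1))
        # True elpd difference for n new draws from N(theta, 1), given the posterior from y
        true_diff = n * (expected_logpdf(theta, ybar, 1.0 + 1.0 / n)
                         - expected_logpdf(theta, 0.0, 1.0))
        err[r] = diff_hat - true_diff
        # PIT of the true difference under N(diff_hat, se_hat^2)
        pit[r] = 0.5 * (1.0 + math.erf((true_diff - diff_hat) / (se_hat * math.sqrt(2.0))))
    return pit, err

for theta in (0.0, 0.5):
    pit, err = simulate(theta, n=20)
    counts, _ = np.histogram(pit, bins=5, range=(0.0, 1.0))
    print(f"theta={theta}: PIT counts per fifth {counts.tolist()}, "
          f"skewness of err_LOO {sample_skewness(err):.2f}")
```

Comparing the PIT counts and the skewness across the two theta values is one way to probe the claim that calibration degrades when models make similar predictions; the exact numbers depend on n, theta, and the seed.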

This review was generated by AI and checked by a human editor.