QUICK REVIEW

[Paper Digest] Matching Bayesian and frequentist coverage probabilities when using an approximate data covariance matrix

Will J. Percival, Oliver Friedrich | arXiv (Cornell University) | Aug 23, 2021

Climate variability and models | References: 27 | Cited by: 78

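ONE-SENTENCE SUMMARY

The paper proposes a Bayesian prior which, when the data covariance matrix is estimated from simulations, ensures that credible intervals from the posterior match frequentist coverage probabilities. By matching the posterior covariance to the frequentist sampling distribution of the parameter estimates, the method lets Bayesian intervals be read as confidence intervals and corrects the bias caused by a finite number of simulations without ad hoc factors such as the Hartlap correction.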

ABSTRACT

Observational astrophysics consists of making inferences about the Universe by comparing data and models. The credible intervals placed on model parameters are often as important as the maximum a posteriori probability values, as the intervals indicate concordance or discordance between models and with measurements from other data. Intermediate statistics (e.g. the power spectrum) are usually measured and inferences made by fitting models to these rather than the raw data, assuming that the likelihood for these statistics has multivariate Gaussian form. The covariance matrix used to calculate the likelihood is often estimated from simulations, such that it is itself a random variable. This is a standard problem in Bayesian statistics, which requires a prior to be placed on the true model parameters and covariance matrix, influencing the joint posterior distribution. As an alternative to the commonly-used Independence-Jeffreys prior, we introduce a prior that leads to a posterior that has approximately frequentist matching coverage. This is achieved by matching the covariance of the posterior to that of the distribution of true values of the parameters around the maximum likelihood values in repeated trials, under certain assumptions. Using this prior, credible intervals derived from a Bayesian analysis can be interpreted approximately as confidence intervals, containing the truth a certain proportion of the time for repeated trials. Linking frequentist and Bayesian approaches that have previously appeared in the astronomical literature, this offers a consistent and conservative approach for credible intervals quoted on model parameters for problems where the covariance matrix is itself an estimate.

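MOTIVATION AND OBJECTIVES

Resolve the mismatch between Bayesian credible intervals and frequentist confidence intervals that arises when the data covariance matrix is estimated from a finite number of simulations.
Develop a prior that gives Bayesian credible intervals approximately correct frequentist coverage in repeated trials.
Provide a consistent and conservative way to quantify uncertainty in cosmological and astrophysical parameter inference when the covariance matrix is itself a random variable.
Correct the bias in parameter error estimates caused by using a sample covariance matrix, without relying on ad hoc factors such as the Hartlap correction.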
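The bias referred to above can be illustrated with a short simulation. This is a minimal sketch, not the paper's code: it assumes a true covariance equal to the identity, Gaussian simulations, and illustrative values `n_d = 5` and `n_s = 20`. It shows that the inverse of an unbiased sample covariance is biased high by the well-known factor (n_s - 1)/(n_s - n_d - 2), which the Hartlap factor (n_s - n_d - 2)/(n_s - 1) removes on average.

```python
import numpy as np

def mean_inverse_sample_cov(n_d=5, n_s=20, n_trials=4000, seed=0):
    """Average S^{-1} over many simulated covariance estimates.

    Assumption for illustration: the true covariance is the identity.
    """
    rng = np.random.default_rng(seed)
    acc = np.zeros((n_d, n_d))
    for _ in range(n_trials):
        sims = rng.standard_normal((n_s, n_d))  # n_s mock data vectors
        S = np.cov(sims, rowvar=False)          # unbiased sample covariance
        acc += np.linalg.inv(S)                 # biased estimate of the precision
    return acc / n_trials

n_d, n_s = 5, 20
mean_inv = mean_inverse_sample_cov(n_d, n_s)

# Average diagonal of <S^{-1}>; theory predicts (n_s - 1) / (n_s - n_d - 2).
raw_bias = np.trace(mean_inv) / n_d
expected_bias = (n_s - 1) / (n_s - n_d - 2)

# Multiplying by the Hartlap factor should bring the average back near 1.
hartlap = (n_s - n_d - 2) / (n_s - 1)
corrected = hartlap * raw_bias

print(f"raw: {raw_bias:.3f}, theory: {expected_bias:.3f}, corrected: {corrected:.3f}")
```

The correction debiases the precision matrix only on average; it does not by itself account for the extra scatter that a noisy covariance estimate adds to the fitted parameters, which is the gap the digest's coverage argument addresses.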

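PROPOSED METHOD

Derive a prior on the true covariance matrix such that the posterior covariance matches the frequentist sampling distribution of the maximum likelihood estimates under repeated sampling.
Use a power-law prior on the determinant of the true covariance matrix, of the form |Σ|^{-(n_s + n_d + 1)/2}, so that the match holds at the level of the parameter covariance.
Show that this prior leads to a multivariate t-distribution posterior, with better tail behavior than the Gaussian approximation.
Argue that including the Hartlap factor in the posterior is incorrect from a Bayesian perspective, because it introduces a double bias correction.
Show that when the number of parameters equals the data dimension (n_θ = n_d), the posterior covariance matches the expectation of the frequentist sampling covariance.
Validate the method through theoretical derivation and Monte Carlo simulation, showing that the coverage probabilities agree with the nominal levels.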
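The step from a power-law prior to a multivariate t posterior can be sketched as follows. This is a generic outline under stated assumptions, not the paper's derivation: the prior exponent p is a placeholder symbol, the prior on θ is taken to be flat, the sample covariance S is assumed to follow a Wishart distribution with n_s - 1 degrees of freedom, and Δ = x - μ(θ) is the residual between data and model.

```latex
\begin{aligned}
f(\theta \mid x, S)
 &\propto \int \mathrm{d}\Sigma \;
   |\Sigma|^{-p/2}\,
   \underbrace{|\Sigma|^{-1/2}
     \exp\!\left[-\tfrac{1}{2}\,\Delta^{T}\Sigma^{-1}\Delta\right]}_{\text{Gaussian likelihood of } x}\,
   \underbrace{|\Sigma|^{-(n_s-1)/2}
     \exp\!\left[-\tfrac{n_s-1}{2}\,\operatorname{tr}\!\left(\Sigma^{-1}S\right)\right]}_{\text{Wishart likelihood of } S} \\
 &\propto \left|\,(n_s-1)\,S + \Delta\Delta^{T}\right|^{-(n_s+p-n_d-1)/2} \\
 &\propto \left[\,1 + \frac{\chi^{2}(\theta)}{n_s-1}\right]^{-m/2},
 \qquad \chi^{2}(\theta) = \Delta^{T}S^{-1}\Delta,
 \qquad m = n_s + p - n_d - 1 .
\end{aligned}
```

The second line uses the inverse-Wishart normalization integral and the third uses the matrix determinant lemma. For the Independence-Jeffreys prior, p = n_d + 1, the exponent reduces to m = n_s. Changing the prior exponent therefore only changes the power m of the t-distribution, which is the freedom a matching prior can exploit.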

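RESULTS

Research Questions

RQ1: Can a Bayesian prior be constructed such that credible intervals have approximately correct frequentist coverage when the covariance matrix is estimated from simulations?
RQ2: What form of prior on the covariance matrix ensures that the posterior covariance matches the frequentist sampling distribution of the parameter estimates?
RQ3: Why is the standard Hartlap factor correction inappropriate in a Bayesian posterior, and what is the correct way to correct the bias in the covariance estimate?
RQ4: How does the proposed prior compare with standard priors such as the Jeffreys prior in terms of coverage probability and interpretability?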

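Key Findings

The proposed prior |Σ|^{-(n_s + n_d + 1)/2} ensures that, under repeated sampling, the posterior covariance matches the frequentist sampling covariance of the parameter estimates.
When n_θ = n_d, the posterior covariance reduces to the sample covariance matrix S, whose expectation is Σ, in agreement with the true covariance, so no Hartlap factor is needed in the posterior.
Including the Hartlap factor in the posterior overcorrects and biases the error estimates, because it applies a correction to both the inverse covariance and the posterior covariance.
The resulting posterior is a multivariate t-distribution; its tails are heavier than those of the Gaussian approximation, which makes the inference more robust to discrepant data.
The method allows Bayesian credible intervals to be interpreted as frequentist confidence intervals with approximately correct coverage, even with a finite number of simulations.
The method offers a unified, conservative, and theoretically grounded alternative to ad hoc corrections in cosmological and astrophysical parameter inference.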
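The coverage problem these findings address can be reproduced with a toy Monte Carlo. The sketch below is illustrative only and does not implement the paper's prior: it assumes a single parameter θ (the common mean of `n_d` data points), a true covariance equal to the identity, and illustrative values `n_d = 10` and `n_s = 20`. It measures how often a naive Gaussian 1-sigma interval built from the inverse sample covariance contains the true value.

```python
import numpy as np

def naive_coverage(n_d=10, n_s=20, n_trials=3000, seed=1):
    """Fraction of trials whose naive 1-sigma interval contains the truth.

    Toy setup (assumed for illustration): model mu = A * theta with A a
    vector of ones, true theta = 0, true covariance = identity.
    """
    rng = np.random.default_rng(seed)
    A = np.ones(n_d)
    hits = 0
    for _ in range(n_trials):
        sims = rng.standard_normal((n_s, n_d))        # simulations for the covariance
        S_inv = np.linalg.inv(np.cov(sims, rowvar=False))
        x = rng.standard_normal(n_d)                  # observed data, true theta = 0
        fisher = A @ S_inv @ A                        # naive inverse variance of theta
        theta_hat = (A @ S_inv @ x) / fisher          # generalized least-squares estimate
        sigma = 1.0 / np.sqrt(fisher)                 # naive Gaussian 1-sigma error
        hits += int(abs(theta_hat) <= sigma)          # does the interval cover theta = 0?
    return hits / n_trials

coverage = naive_coverage()
nominal = 0.6827  # Gaussian 1-sigma coverage
print(f"naive coverage: {coverage:.3f} (nominal {nominal:.3f})")
```

With so few simulations relative to the data dimension, the naive interval covers the truth noticeably less often than the nominal level. Rerunning with a larger `n_s` should move the measured coverage back toward nominal, which is the behavior a coverage-matching posterior is meant to deliver at finite `n_s`.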


This digest was generated by AI and reviewed by human editors.