QUICK REVIEW

[Paper Review] Matching Bayesian and frequentist coverage probabilities when using an approximate data covariance matrix

Will J. Percival, Oliver Friedrich|arXiv (Cornell University)|2021. 08. 23.

Climate variability and models | References: 27 | Citations: 78

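One-line summary

This paper proposes a Bayesian prior that makes credible intervals derived from the posterior match frequentist coverage probabilities when the data covariance matrix is estimated from simulations. By matching the posterior covariance to the sampling distribution of the frequentist parameter estimates, the method corrects the bias that arises from a finite number of simulations and allows Bayesian credible intervals to be interpreted approximately as confidence intervals, without ad hoc factors such as the Hartlap correction.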

ABSTRACT

Observational astrophysics consists of making inferences about the Universe by comparing data and models. The credible intervals placed on model parameters are often as important as the maximum a posteriori probability values, as the intervals indicate concordance or discordance between models and with measurements from other data. Intermediate statistics (e.g. the power spectrum) are usually measured and inferences made by fitting models to these rather than the raw data, assuming that the likelihood for these statistics has multivariate Gaussian form. The covariance matrix used to calculate the likelihood is often estimated from simulations, such that it is itself a random variable. This is a standard problem in Bayesian statistics, which requires a prior to be placed on the true model parameters and covariance matrix, influencing the joint posterior distribution. As an alternative to the commonly-used Independence-Jeffreys prior, we introduce a prior that leads to a posterior that has approximately frequentist matching coverage. This is achieved by matching the covariance of the posterior to that of the distribution of true values of the parameters around the maximum likelihood values in repeated trials, under certain assumptions. Using this prior, credible intervals derived from a Bayesian analysis can be interpreted approximately as confidence intervals, containing the truth a certain proportion of the time for repeated trials. Linking frequentist and Bayesian approaches that have previously appeared in the astronomical literature, this offers a consistent and conservative approach for credible intervals quoted on model parameters for problems where the covariance matrix is itself an estimate.

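Motivation and objectives

To resolve the mismatch between Bayesian credible intervals and frequentist confidence intervals that arises when the data covariance matrix is estimated from a finite number of simulations.
To develop a prior under which credible intervals constructed in repeated trials have approximately correct frequentist coverage probability.
To provide a consistent and conservative approach to uncertainty quantification in astrophysical parameter estimation, where the covariance matrix is itself a random variable.
To avoid relying on ad hoc factors such as the Hartlap correction to fix the bias in parameter error estimates caused by using a sample covariance matrix.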

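Proposed method

Derives a prior on the true covariance matrix such that the posterior covariance matches the frequentist sampling distribution of the maximum likelihood estimates under repeated sampling.
To achieve this match at the level of the parameter covariance, uses a power-law prior in the determinant of the true covariance matrix, specifically of the form |Σ|^{-(n_s + n_d + 1)/2}, where n_s is the number of simulations and n_d is the dimension of the data vector.
Shows that this prior leads to a multivariate t-distribution posterior, which captures tail behavior better than a Gaussian approximation.
Argues that including the Hartlap factor in the posterior is incorrect from a Bayesian perspective, because it amounts to a double bias correction.
Shows that when the number of parameters equals the data dimension (n_θ = n_d), the posterior covariance matches the expected frequentist sampling covariance.
Validates the method through theoretical derivation and Monte Carlo simulations, in which the coverage probabilities are found to match the stated levels.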
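The kind of Monte Carlo coverage check mentioned above can be illustrated in a textbook one-dimensional setting. The sketch below is not the paper's prior or its simulations; it assumes a single Gaussian datum with true mean 0 and true variance 1, whose standard deviation s is estimated from n_s independent simulations. In that toy case the ratio (x - mean)/s follows a Student-t distribution with n_s - 1 degrees of freedom, so an interval x ± s built as if s were exact covers the truth less often than the Gaussian 68.3%. The function names and the values n_s = 10, k = 1 are illustrative assumptions.

```python
import math
import numpy as np

def empirical_coverage(n_s, k=1.0, n_trials=200_000, seed=1):
    """Fraction of trials in which x +/- k*s contains the true mean (0)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_trials)            # one observed datum per trial
    sims = rng.standard_normal((n_trials, n_s))  # n_s simulations per trial
    s = sims.std(axis=1, ddof=1)                 # estimated standard deviation
    return float(np.mean(np.abs(x) < k * s))

def student_t_coverage(nu, k=1.0, half_width=200.0, n_grid=400_001):
    """P(|t| < k) for a Student-t with nu degrees of freedom (Riemann sum)."""
    t = np.linspace(-half_width, half_width, n_grid)
    pdf = (1.0 + t**2 / nu) ** (-(nu + 1) / 2)   # unnormalized density
    return float(pdf[np.abs(t) < k].sum() / pdf.sum())

n_s = 10
emp = empirical_coverage(n_s)                    # Monte Carlo coverage of x +/- s
t_cov = student_t_coverage(n_s - 1)              # heavy-tailed prediction
gauss = math.erf(1.0 / math.sqrt(2.0))           # nominal Gaussian coverage

print(f"empirical: {emp:.4f}  Student-t: {t_cov:.4f}  Gaussian: {gauss:.4f}")
```

In this toy setting the empirical coverage should track the Student-t prediction rather than the Gaussian one, which is the one-dimensional analogue of why a heavier-tailed posterior is needed when the covariance is itself an estimate.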

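Experimental results

Research questions

RQ1: When the data covariance matrix is estimated from simulations, can a Bayesian prior be constructed that ensures posterior credible intervals have approximately correct frequentist coverage probability?
RQ2: What form of prior on the covariance matrix ensures that the posterior covariance matches the frequentist sampling distribution of the parameter estimates?
RQ3: Why is the standard Hartlap correction factor inappropriate in a Bayesian posterior, and what is the correct way to account for the bias from covariance estimation?
RQ4: How does the proposed prior differ from the Jeffreys prior in terms of coverage and interpretability?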

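Key results

The proposed prior |Σ|^{-(n_s + n_d + 1)/2} ensures that, under repeated sampling, the posterior covariance matches the frequentist sampling covariance of the parameter estimates.
When n_θ = n_d, the posterior covariance reduces to the sample covariance matrix S; because the expectation of S is Σ, this matches the true covariance on average, and no Hartlap factor is needed in the posterior.
Including the Hartlap factor in the posterior overcorrects and biases the error estimates, because the correction is then applied to both the inverse covariance and the posterior covariance.
The resulting posterior is a multivariate t-distribution, which captures heavy-tailed behavior better than a Gaussian approximation and improves robustness to tensions in the data.
The method allows Bayesian posterior credible intervals to be interpreted as approximately correct frequentist confidence intervals, even with a finite number of simulations.
The method provides a consistent, conservative, and theoretically grounded alternative to ad hoc corrections in astrophysical parameter estimation.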
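For context on the Hartlap factor discussed above: for Gaussian simulations, the inverse of the unbiased sample covariance S is a biased estimate of the true inverse covariance, with E[S^{-1}] = (n_s - 1)/(n_s - n_d - 2) Σ^{-1}. The minimal sketch below checks this numerically. It assumes an identity true covariance and the illustrative values n_s = 30, n_d = 5; the function name is hypothetical and not taken from the paper.

```python
import numpy as np

def mean_inverse_sample_covariance(n_s, n_d, n_trials=2000, seed=0):
    """Average S^{-1} over many independent sets of n_s simulations."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((n_d, n_d))
    for _ in range(n_trials):
        sims = rng.standard_normal((n_s, n_d))  # true covariance = identity
        S = np.cov(sims, rowvar=False)          # unbiased estimate (n_s - 1 denominator)
        acc += np.linalg.inv(S)
    return acc / n_trials

n_s, n_d = 30, 5
mean_inv = mean_inverse_sample_covariance(n_s, n_d)

# With an identity true covariance, the mean diagonal of E[S^{-1}] is the inflation factor.
inflation = float(np.trace(mean_inv) / n_d)
expected = (n_s - 1) / (n_s - n_d - 2)          # inverse of the Hartlap factor

print(f"measured inflation: {inflation:.3f}  expected: {expected:.3f}")
```

Multiplying S^{-1} by the Hartlap factor (n_s - n_d - 2)/(n_s - 1) removes this bias in the inverse covariance; the review's point is that applying that factor inside a posterior that already accounts for the randomness of S corrects twice.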


This review was generated by AI and reviewed by a human editor.