QUICK REVIEW

[Paper Review] Uncertainty-aware and Data-efficient Cosmological Emulation using Gaussian Processes and PCA

Sven Günther|arXiv (Cornell University)|Jul 3, 2023
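Gaussian Processes and Bayesian Inference | Cited by 11
One-Sentence Summary

This paper proposes an uncertainty-aware emulator that combines Gaussian Processes with Principal Component Analysis and pairs it with online active learning to accelerate Bayesian cosmological inference. It substantially reduces the number of theory-code evaluations and the computational cost while preserving the accuracy of the posterior.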

ABSTRACT

Bayesian parameter inference is one of the key elements for model selection in cosmological research. However, the available inference tools require a large number of calls to simulation codes which can lead to high and sometimes even infeasible computational costs. In this work we propose a new way of emulating simulation codes for Bayesian parameter inference. In particular, this novel approach emphasizes the uncertainty-awareness of the emulator, which allows to state the emulation accuracy and ensures reliable performance. With a focus on data efficiency, we implement an active learning algorithm based on a combination of Gaussian Processes and Principal Component Analysis. We find that for an MCMC analysis of Planck and BAO data on the $\Lambda$CDM model (6 model and 21 nuisance parameters) we can reduce the number of simulation calls by a factor of $\sim$500 and save about $96\%$ of the computational costs.

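Research Motivation and Objectives

  • Motivation: Einstein-Boltzmann solvers are expensive to run, so cosmological parameter inference needs to be made faster.
  • Propose an uncertainty-aware emulator that combines Gaussian Processes with PCA for data compression.
  • Implement an online active learning strategy that trains the emulator during inference.
  • Demonstrate a significant speed-up with controlled bias on a $\Lambda$CDM test case using Planck+BAO data.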

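Proposed Method

  • Use Gaussian Processes with an anisotropic RBF kernel to emulate low-dimensional quantities, and use Principal Component Analysis (PCA) to compress high-dimensional observables such as CMB spectra.
  • Transform the high-dimensional data into PCA space so that each component can be emulated by its own fast, independent GP.
  • Quantify the emulator uncertainty from the PCA information loss and the sparsity of the GP training samples, and propagate it back to data space.
  • Use online active learning with a likelihood-based uncertainty criterion to decide when to generate new theory evaluations and retrain the emulator.
  • Integrate the emulator into a modified cobaya Bayesian sampler so that MCMC runs with fewer theory evaluations.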
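
The compression and emulation steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `PcaGpEmulator`, the toy "theory code" `toy_theory`, the hand-picked length scales, and the choice to share one set of kernel hyperparameters across all PCA components are assumptions made only for illustration. The PCA truncation error is estimated here from the training residuals, which is one simple option among several.

```python
import numpy as np

def rbf_ard(A, B, lengths, amp):
    """Anisotropic RBF kernel: one length scale per input dimension."""
    d = (A[:, None, :] - B[None, :, :]) / lengths
    return amp * np.exp(-0.5 * np.sum(d**2, axis=-1))

class PcaGpEmulator:
    """Hypothetical sketch: PCA compression plus one GP per PCA weight."""

    def __init__(self, n_comp, lengths, amp=1.0, jitter=1e-8):
        self.n_comp = n_comp
        self.lengths = np.asarray(lengths, dtype=float)
        self.amp = amp
        self.jitter = jitter

    def fit(self, X, Y):
        # X: (n, d) parameter points, Y: (n, m) high-dimensional outputs
        self.X = np.asarray(X, dtype=float)
        self.mean = Y.mean(axis=0)
        R = Y - self.mean
        _, _, Vt = np.linalg.svd(R, full_matrices=False)
        self.basis = Vt[: self.n_comp]            # (k, m) principal directions
        W = R @ self.basis.T                      # (n, k) PCA weights
        self.scale = W.std(axis=0) + 1e-12        # normalise each component
        # Per-bin variance lost by truncating the PCA basis (training estimate)
        self.pca_var = np.mean((R - W @ self.basis) ** 2, axis=0)
        K = rbf_ard(self.X, self.X, self.lengths, self.amp)
        self.L = np.linalg.cholesky(K + self.jitter * np.eye(len(self.X)))
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, W / self.scale))
        return self

    def predict(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        k = rbf_ard(x, self.X, self.lengths, self.amp)      # (1, n)
        w_mean = (k @ self.alpha) * self.scale              # (1, k)
        v = np.linalg.solve(self.L, k.T)                    # (n, 1)
        gp_var = np.maximum(self.amp - np.sum(v**2, axis=0), 0.0)
        w_var = gp_var[:, None] * self.scale**2             # (1, k)
        # Independent components: variances add with squared basis weights
        y_mean = self.mean + w_mean @ self.basis
        y_var = w_var @ self.basis**2 + self.pca_var
        return y_mean[0], np.sqrt(y_var[0])

# Toy stand-in for an expensive theory code (hypothetical, not CLASS)
ell = np.linspace(0.0, np.pi, 50)

def toy_theory(p):
    a, b = p
    return a * np.sin(ell) + b**2 * np.cos(2 * ell)

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(40, 2))
Y_train = np.array([toy_theory(p) for p in X_train])

emu = PcaGpEmulator(n_comp=2, lengths=[0.5, 0.5]).fit(X_train, Y_train)
y_near, sigma_near = emu.predict([0.4, 0.6])   # inside the training region
y_far, sigma_far = emu.predict([3.0, 3.0])     # far outside it
print("max error near:", np.max(np.abs(y_near - toy_theory([0.4, 0.6]))))
print("max sigma near / far:", sigma_near.max(), sigma_far.max())
```

The predicted standard deviation grows when the query point moves away from the training set, which is the behaviour an uncertainty-aware emulator relies on to flag unreliable predictions.
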
Figure 1: Uncertainty qualification of the CMB TT spectrum emulator for a training scenario similar to the one outlined in section III . We compare the performance of the emulator with the full calculation obtained by CLASS . (Top) $D_{\ell}$ spectrum. (Center) Residuals with the uncertainty estimat
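
The online active-learning decision described in the method list can be sketched as follows. The summary only says that the criterion is likelihood-based, so everything specific here is an assumption: the function names `loglike_spread` and `needs_theory_call`, the Gaussian likelihood with independent bins, the Monte Carlo estimate of the log-likelihood spread, and the tolerance `tol=0.1` are all hypothetical.

```python
import numpy as np

def loglike_spread(y_mean, y_sigma, data, data_sigma, n_draws=200, seed=0):
    """Std of a Gaussian log-likelihood over draws from the emulator prediction.

    Assumption: bins are treated as independent, which ignores the
    correlations a real emulator would induce between them.
    """
    rng = np.random.default_rng(seed)
    draws = y_mean + y_sigma * rng.standard_normal((n_draws, y_mean.size))
    loglikes = -0.5 * np.sum(((draws - data) / data_sigma) ** 2, axis=1)
    return float(np.std(loglikes))

def needs_theory_call(y_mean, y_sigma, data, data_sigma, tol=0.1):
    """True if the emulator is too uncertain and the theory code should run."""
    return loglike_spread(y_mean, y_sigma, data, data_sigma) > tol

# Toy data vector with unit errors (hypothetical)
data = np.zeros(20)
data_sigma = np.ones(20)
prediction = np.full(20, 0.1)

confident = needs_theory_call(prediction, np.full(20, 1e-4), data, data_sigma)
uncertain = needs_theory_call(prediction, np.full(20, 0.5), data, data_sigma)
print("call theory code (small emulator error):", confident)
print("call theory code (large emulator error):", uncertain)
```

In a sampler, a `True` result would trigger a call to the expensive theory code, and the new evaluation would be appended to the training set before the emulator is refitted.
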

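Experimental Results

Research Questions

  • RQ1: How can GP-based emulation be made uncertainty-aware for cosmological inference?
  • RQ2: Can PCA reduce the dimensionality of cosmological observables without degrading the accuracy of the inference?
  • RQ3: For $\Lambda$CDM with Planck+BAO data, how large a reduction in theory evaluations and computational cost can be achieved with online active learning?
  • RQ4: Compared with the full theory calculation, how does the uncertainty introduced by the emulator propagate into the posterior estimates?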

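Key Findings

  • The emulator is data-efficient, delivers a significant speed-up, and provides an uncertainty estimate for its predictions.
  • For $\Lambda$CDM with Planck and BAO data (6 cosmological parameters plus 21 nuisance parameters), the number of theory-code calls is reduced by a factor of about 500, to 126 calls, saving about 96% of the computational cost.
  • With the emulator, the bias in the posterior means of the six cosmological parameters and the Planck calibration parameter is small (at most 5%).
  • The uncertainties from the PCA information loss and from the sparsity of the GP training samples are propagated to data space and combined into a total emulator uncertainty.
  • The method gives reliable inference with far fewer expensive theory computations and remains consistent with the full-theory posterior.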
Figure 2: Predicted posterior estimate for the $\Lambda\mathrm{CDM}$ model tested on BAO and Planck TT,TE and EE data. (Blue) posterior estimate using the emulator. It was trained with 126 calls of the theory code. (Red) Comparative MCMC without the use of the emulator. The contours were generated w
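
The headline numbers can be cross-checked with a few lines of arithmetic. The inputs are taken from the abstract and the Figure 2 caption. The implied size of the baseline run is an inference that assumes the factor of about 500 is measured against the 126 training calls; the summary does not state that count directly.

```python
emulator_calls = 126      # theory-code calls used to train the emulator (Figure 2)
reduction_factor = 500    # approximate reduction in theory calls (abstract)
cost_saving = 0.96        # fraction of computational cost saved (abstract)

# Implied number of theory calls in the comparison MCMC (an inference)
implied_baseline_calls = emulator_calls * reduction_factor

# A 96% saving means the emulated run costs 4% of the original
overall_speedup = 1.0 / (1.0 - cost_saving)

print("implied baseline theory calls:", implied_baseline_calls)
print(f"overall cost speed-up: about {overall_speedup:.0f}x")
```

The overall speed-up (about 25x) is much smaller than the reduction in theory calls (about 500x), which suggests that the theory code is not the only cost in the emulated run; the summary does not break down the remaining cost.
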


This review was generated by AI and reviewed by a human editor.