QUICK REVIEW

[Paper Review] Ensemble-Based Experimental Design for Targeting Data Acquisition to Inform Climate Models

Oliver R. A. Dunbar, Michael F. Howland|arXiv (Cornell University)|Jan 12, 2022

Climate variability and models | References: 101 | Cited by: 12

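One-Sentence Summary

This paper proposes an ensemble-based Bayesian experimental design algorithm that targets data acquisition to the regions and times that maximize information gain, and thereby reduce parametric uncertainty, in climate models. Built on the calibrate-emulate-sample (CES) framework, the method identifies the optimal spatial and temporal locations for high-resolution simulations or observations, often near the intertropical convergence zone (ITCZ), and achieves efficient uncertainty quantification with only a small number of model evaluations.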

ABSTRACT

Data required to calibrate uncertain GCM parameterizations are often only available in limited regions or time periods, for example, observational data from field campaigns, or data generated in local high-resolution simulations. This raises the question of where and when to acquire additional data to be maximally informative about parameterizations in a GCM. Here we construct a new ensemble-based parallel algorithm to automatically target data acquisition to regions and times that maximize the uncertainty reduction, or information gain, about GCM parameters. The algorithm uses a Bayesian framework that exploits a quantified distribution of GCM parameters as a measure of uncertainty. This distribution is informed by time-averaged climate statistics restricted to local regions and times. The algorithm is embedded in the recently developed calibrate-emulate-sample (CES) framework, which performs efficient model calibration and uncertainty quantification with only $\mathcal{O}(10^2)$ model evaluations, compared with $\mathcal{O}(10^5)$ evaluations typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of local data. In this perfect-model setting, we calibrate parameters and quantify uncertainties in a quasi-equilibrium convection scheme in the GCM. We consider targeted data that are (i) localized in space for statistically stationary simulations, and (ii) localized in space and time for seasonally varying simulations. In these proof-of-concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain typically, but not always, results from regions near the intertropical convergence zone (ITCZ).

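Motivation and Objectives

Address the challenge of efficiently targeting data acquisition to calibrate uncertain climate model parameterizations when data are limited in space and time.
Reduce parametric uncertainty in general circulation models (GCMs) by identifying the most informative regions and time periods.
Develop a computationally efficient algorithm that reuses the output of the existing CES framework and so avoids additional GCM evaluations.
Demonstrate the method in an idealized GCM setting, covering both statistically stationary and seasonally varying simulations.
Enable practical application to high-resolution simulations and observational campaigns by providing a scalable, automated targeting strategy.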

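Proposed Method

Formulate parameter calibration as a Bayesian inverse problem, using time-averaged climate statistics as data.
Quantify information gain with a utility function based on the entropy loss between the prior and posterior distributions.
Integrate Bayesian experimental design into the calibrate-emulate-sample (CES) framework, which achieves efficient uncertainty quantification with only about O(10²) GCM evaluations.
Use a Gaussian process emulator trained on the CES calibration samples to approximate the model output and compute posterior distributions without rerunning the GCM.
Automatically identify the optimal design points (in space and time) by maximizing the information-gain utility over the whole domain.
Apply the method to an idealized moist GCM with a quasi-equilibrium convection scheme, using synthetic local data as a stand-in for high-resolution simulations or observations.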
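The idea of scoring candidate designs by entropy loss and picking the maximizer can be illustrated with a minimal sketch. It assumes a linear-Gaussian toy problem, where the posterior covariance has a closed form; it is not the paper's GCM, emulator, or exact utility. The region names, Jacobians, and covariances are hypothetical values chosen only for illustration.

```python
import numpy as np


def posterior_covariance(prior_cov, jac, noise_cov):
    """Gaussian posterior covariance for a linear forward map y = jac @ theta + noise."""
    prior_prec = np.linalg.inv(prior_cov)
    data_prec = jac.T @ np.linalg.solve(noise_cov, jac)
    return np.linalg.inv(prior_prec + data_prec)


def information_gain(prior_cov, post_cov):
    """Entropy reduction (in nats) between a Gaussian prior and a Gaussian posterior."""
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    return 0.5 * (logdet_prior - logdet_post)


# Hypothetical setup: two uncertain parameters, two observed statistics per region.
prior_cov = np.eye(2)
noise_cov = 0.1 * np.eye(2)

# Hypothetical sensitivities of the local statistics to the parameters.
designs = {
    "tropics": np.array([[2.0, 0.5], [0.3, 1.5]]),
    "subtropics": np.array([[0.8, 0.1], [0.1, 0.4]]),
    "extratropics": np.array([[0.1, 0.0], [0.0, 0.1]]),
}

# Score every candidate design, then keep the one with the largest gain.
gains = {
    name: information_gain(prior_cov, posterior_covariance(prior_cov, jac, noise_cov))
    for name, jac in designs.items()
}
best_design = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: information gain = {gain:.3f} nats")
print("selected design:", best_design)
```

In this toy setup the region whose statistics are most sensitive to the parameters receives the highest score. In the paper's setting the posterior would instead come from sampling with the emulator, so the closed-form covariance here is only a stand-in.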

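Experimental Results

Research Questions

RQ1: Where and when should additional data be acquired to reduce uncertainty in GCM parameterizations most effectively?
RQ2: How can Bayesian experimental design be integrated efficiently into an existing uncertainty quantification framework (such as CES) without additional computational cost?
RQ3: What role does localizing data in space and time play in maximizing the information gain for parameter calibration?
RQ4: How does the algorithm perform under different climate regimes (for example, statistically stationary versus seasonally varying conditions)?
RQ5: How accurate and robust is the algorithm in identifying the most informative sites (such as the ITCZ) when data are noisy or sparse?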

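Key Findings

The optimal data acquisition locations for maximal information gain typically lie near the intertropical convergence zone (ITCZ), particularly in the seasonally varying simulations.
For narrow spatial designs (ℓ=1) in the statistically stationary case, the subtropical precipitation minimum is identified as the optimal location, even though the ITCZ is the main region of interest.
In the seasonal case, the algorithm correctly identifies the ITCZ as the primary target, with a secondary maximum in the summer subtropics, consistent with the physical importance of convection in that region.
By building on the CES framework, the algorithm achieves substantial uncertainty reduction with very few model evaluations, avoiding the O(10⁵) evaluations typically required by traditional MCMC approaches.
When the data are noisy, or when the averaging timescale and stencil size are reduced, the algorithm predicts information content less accurately because of sampling variability and model-error inflation.
The posterior distribution of the parameters is accessible and can be diagnosed, which supports iterative refinement by using the current posterior as the prior for the next design cycle.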
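The iterative refinement described in the last finding, where the current posterior becomes the prior of the next design cycle, can be sketched as a simple loop. This is a hedged illustration under a linear-Gaussian assumption, not the paper's implementation; the site names, sensitivities, and noise levels are hypothetical. The same sketch also shows that larger observational noise lowers the computed gain.

```python
import numpy as np


def gaussian_update(prior_cov, jac, noise_cov):
    """Posterior covariance after observing y = jac @ theta + noise (linear-Gaussian)."""
    prior_prec = np.linalg.inv(prior_cov)
    return np.linalg.inv(prior_prec + jac.T @ np.linalg.solve(noise_cov, jac))


def entropy_reduction(prior_cov, post_cov):
    """Entropy loss (in nats) from a Gaussian prior to a Gaussian posterior."""
    return 0.5 * (np.linalg.slogdet(prior_cov)[1] - np.linalg.slogdet(post_cov)[1])


def run_design_cycles(prior_cov, candidates, noise_cov, n_cycles):
    """Greedy sequential design: each cycle's posterior is the next cycle's prior."""
    cov = prior_cov
    history = []
    for _ in range(n_cycles):
        scores = {
            name: entropy_reduction(cov, gaussian_update(cov, jac, noise_cov))
            for name, jac in candidates.items()
        }
        best = max(scores, key=scores.get)
        cov = gaussian_update(cov, candidates[best], noise_cov)
        history.append((best, scores[best]))
    return cov, history


# Hypothetical candidates: each site constrains one of two parameters.
prior_cov = np.eye(2)
candidates = {
    "site_a": np.array([[3.0, 0.0]]),
    "site_b": np.array([[0.0, 2.0]]),
}

final_cov, history = run_design_cycles(prior_cov, candidates, np.array([[0.5]]), n_cycles=2)
_, noisy_history = run_design_cycles(prior_cov, candidates, np.array([[5.0]]), n_cycles=2)

for cycle, (name, gain) in enumerate(history, start=1):
    print(f"cycle {cycle}: chose {name}, gain = {gain:.3f} nats")
print("first-cycle gain with 10x noisier data:", round(noisy_history[0][1], 3))
```

In this toy setup the loop selects site_a first and then switches to site_b, because a site that has already been sampled offers less additional information once its parameter is constrained.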

This review was generated by AI and reviewed by human editors.