QUICK REVIEW

[Paper Review] SEEDS: Emulation of Weather Forecast Ensembles with Diffusion Models

Lizao Li, Robert W. Carver|arXiv (Cornell University)|Jun 24, 2023

Meteorological Phenomena and Simulations | Cited by 9

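ONE-SENTENCE SUMMARY

SEEDS uses diffusion models, conditioned on a small number of seed forecasts, to generate large ensembles of realistic weather states whose predictive skill is comparable to or better than that of physics-based ensembles, at a much lower computational cost.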

ABSTRACT

Uncertainty quantification is crucial to decision-making. A prominent example is probabilistic forecasting in numerical weather prediction. The dominant approach to representing uncertainty in weather forecasting is to generate an ensemble of forecasts. This is done by running many physics-based simulations under different conditions, which is a computationally costly process. We propose to amortize the computational cost by emulating these forecasts with deep generative diffusion models learned from historical data. The learned models are highly scalable with respect to high-performance computing accelerators and can sample hundreds to tens of thousands of realistic weather forecasts at low cost. When designed to emulate operational ensemble forecasts, the generated ones are similar to physics-based ensembles in important statistical properties and predictive skill. When designed to correct biases present in the operational forecasting system, the generated ensembles show improved probabilistic forecast metrics. They are more reliable and forecast probabilities of extreme weather events more accurately. While this work demonstrates the utility of the methodology by focusing on weather forecasting, the generative artificial intelligence methodology can be extended for uncertainty quantification in climate modeling, where we believe the generation of very large ensembles of climate projections will play an increasingly important role in climate risk assessment.

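RESEARCH MOTIVATION AND OBJECTIVES

Quantify forecast uncertainty in numerical weather prediction through large ensembles.
Develop a scalable generative method that emulates the distribution of physics-based ensembles from a limited number of seed forecasts.
Enable post-processing that debiases ensembles by blending them with an alternative data source.
Show that generated ensembles match or exceed physics-based ensembles on key skill metrics.
Assess the reliability of generated ensembles and their representation of extreme events.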

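PROPOSED METHOD

Train a diffusion-based generative model, conditioned on K seed forecasts, to produce N > K samples of weather states.
Represent atmospheric data as standardized anomalies on a cubed-sphere grid, and use a ViT-inspired score network with axial attention.
Two learning tasks: generative ensemble emulation (emulating p(v) from the seeds) and generative post-processing (approximating the mixture α p(v) + (1−α) p′(v)).
Train and evaluate on 20 years of GEFS reforecasts and ERA5 reanalysis data.
Evaluate with rank histograms, RMSE, ACC, CRPS, and skill on extreme events (±2σ), using ERA5 HRES as ground truth.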
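Two of the probabilistic metrics listed above, CRPS and the rank histogram, can be sketched for a single scalar variable as follows. This is a generic illustration in plain Python, not the paper's evaluation code: the function names are hypothetical, the CRPS uses the standard empirical ensemble estimator, and any area weighting or averaging over grid points, lead times, and dates is left out.

```python
from typing import Iterable, List, Sequence, Tuple


def ensemble_crps(members: Sequence[float], truth: float) -> float:
    """Empirical CRPS for one scalar forecast: E|X - y| - 0.5 * E|X - X'|."""
    n = len(members)
    # Mean absolute error of the members against the verifying value.
    skill = sum(abs(x - truth) for x in members) / n
    # Mean absolute difference over all ordered member pairs (spread term).
    spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return skill - 0.5 * spread


def rank_of_truth(members: Sequence[float], truth: float) -> int:
    """Number of members below the verifying value (0 .. len(members))."""
    return sum(1 for x in members if x < truth)


def rank_histogram(
    cases: Iterable[Tuple[Sequence[float], float]]
) -> List[int]:
    """Count truth ranks over many (members, truth) cases of equal size."""
    cases = list(cases)
    n_members = len(cases[0][0])
    counts = [0] * (n_members + 1)
    for members, truth in cases:
        counts[rank_of_truth(members, truth)] += 1
    return counts
```

A flat rank histogram indicates a reliable ensemble, while a U-shaped one indicates that the ensemble is under-dispersed and the verifying value falls outside its range too often.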

Figure 1: Illustration of the target distributions of generative ensemble emulation (gefs-full) and post-processing (Mixture). Shown are the histograms (bars: frequencies with 12 shared bins, curves: Gaussian kernel density estimators fit to the bars), i.e., the empirical distributions of the su
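The mixture target α p(v) + (1−α) p′(v) used for generative post-processing can be illustrated by drawing from two pools of samples. The sketch below is a minimal illustration under stated assumptions, not the authors' data pipeline: `sample_mixture` and its parameters are hypothetical names, `p_samples` stands in for members of the physics-based forecast ensemble, and `p_prime_samples` stands in for samples from the alternative data source.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_mixture(
    p_samples: Sequence[T],
    p_prime_samples: Sequence[T],
    alpha: float,
    n: int,
    seed: int = 0,
) -> List[T]:
    """Draw n samples from alpha * p + (1 - alpha) * p'."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    out: List[T] = []
    for _ in range(n):
        # Choose the mixture component first, then a sample from its pool.
        pool = p_samples if rng.random() < alpha else p_prime_samples
        out.append(rng.choice(pool))
    return out
```

Setting `alpha = 1` recovers the pure emulation target p(v), and smaller values shift the target toward p′(v).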

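EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a diffusion-based emulator generate a large ensemble from a few seed forecasts and reproduce the statistical properties of the physics-based GEFS ensemble?
RQ2: Compared with the physics-based ensemble alone, does generative post-processing (blending with ERA5 data) improve reliability and extreme-event forecasts?
RQ3: How do the generated ensembles compare with ground truth in spatial coherence, multivariate correlations, and spectral characteristics?
RQ4: How much computational efficiency is gained when SEEDS is used to generate hundreds to thousands of ensemble members?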

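Key Findings

The generated ensembles (seeds-gee and seeds-gpp) are close to gefs-full and ERA5 in covariance structure and energy spectra, and they contain realistic spatial patterns.
seeds-gee is comparable to gefs-full in RMSE, ACC, and CRPS, while seeds-gpp often outperforms gefs-full on near-surface temperature and extreme-event coverage.
Generative post-processing (seeds-gpp) provides the highest reliability (lower rank-histogram unreliability δ) and better extreme-event classification (Brier score), outperforming the full physics-based ensemble.
Using only 2 seed forecasts, the model can generate 512 members at almost no additional computational cost (about 3 minutes per batch on TPUv3), which makes very large ensembles feasible.
The generated ensembles capture tail events better and extend coverage, which supports uncertainty quantification for rare events.
At lead times of up to 16 days, the generated ensembles remain strongly correlated with the full GEFS ensemble, suggesting that they learn dynamics beyond the simple climatological mean.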
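The extreme-event results above rest on turning an ensemble into an event probability and scoring it with the Brier score. The sketch below is a generic illustration, not the paper's evaluation code: it assumes values are already standardized anomalies, so a ±2σ event is a threshold at ±2.0, and all function names are hypothetical.

```python
from typing import Sequence


def event_occurs(anomaly: float, k_sigma: float) -> bool:
    """Event test: above +k sigma for k >= 0, below k sigma for k < 0."""
    return anomaly > k_sigma if k_sigma >= 0 else anomaly < k_sigma


def event_probability(members: Sequence[float], k_sigma: float) -> float:
    """Fraction of ensemble members in which the event occurs."""
    hits = sum(1 for x in members if event_occurs(x, k_sigma))
    return hits / len(members)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    total = sum((p - o) ** 2 for p, o in zip(probs, outcomes))
    return total / len(probs)
```

With only 2 members, `event_probability` can return just 0, 0.5, or 1, whereas 512 members resolve probabilities in steps of 1/512. That finer resolution is one reason large ensembles matter for rare events.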

Figure 2: Maps of total column vertically-integrated water vapor ($kg/m^{2}$) for 2022/07/14, as captured by (top left) the ERA5 reanalysis, (top right and middle row) 5 members of the gefs-full forecast issued with a 7-day lead time, and (bottom) 3 samples from seeds-gee. The top 2 GEFS forecast


This review was generated by AI and checked by a human editor.