QUICK REVIEW

[Paper Review] DiffDA: a Diffusion Model for Weather-scale Data Assimilation

Langwen Huang, Lukas Gianinazzi|arXiv (Cornell University)|Jan 11, 2024

Meteorological Phenomena and Simulations | Cited by 10

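One-sentence summary

DiffDA uses a denoising diffusion model built on GraphCast to assimilate high-resolution atmospheric data from predicted states and sparse observations, yielding near-real-time reanalysis and forecast-ready initial conditions with a loss of lead time of at most 24 hours.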

ABSTRACT

The generation of initial conditions via accurate data assimilation is crucial for weather forecasting and climate modeling. We propose DiffDA as a denoising diffusion model capable of assimilating atmospheric variables using predicted states and sparse observations. Acknowledging the similarity between a weather forecast model and a denoising diffusion model dedicated to weather applications, we adapt the pretrained GraphCast neural network as the backbone of the diffusion model. Through experiments based on simulated observations from the ERA5 reanalysis dataset, our method can produce assimilated global atmospheric data consistent with observations at 0.25 deg (~30km) resolution globally. This marks the highest resolution achieved by ML data assimilation models. The experiments also show that the initial conditions assimilated from sparse observations (less than 0.96% of gridded data) and 48-hour forecast can be used for forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5. This enables the application of the method to real-world applications, such as creating reanalysis datasets with autoregressive data assimilation.
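
To make the sparsity figure in the abstract concrete, the following back-of-the-envelope sketch converts "less than 0.96% of gridded data" into a number of grid columns. The 721 x 1440 grid shape is an assumption (a common layout for a 0.25-degree global latitude-longitude grid that includes both poles); the abstract itself does not state it.

```python
# Assumed grid layout for a 0.25-degree global lat-lon grid (not stated in the abstract).
N_LAT, N_LON = 721, 1440
EQUATOR_KM = 40075.0  # approximate equatorial circumference of the Earth

n_columns = N_LAT * N_LON                # total number of grid columns
max_observed = int(0.0096 * n_columns)   # upper bound implied by "less than 0.96%"
spacing_km = EQUATOR_KM / N_LON          # east-west grid spacing at the equator

print(n_columns, max_observed, round(spacing_km, 1))
```

Under this assumption, the observations cover on the order of ten thousand columns out of roughly one million, and the equatorial spacing is consistent with the "~30km" quoted in the abstract.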

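Motivation and objectives

Demonstrate a machine-learning-based data assimilation method that can handle high-resolution atmospheric data.
Integrate a pretrained weather forecast model as the backbone of a diffusion-based assimilator.
Condition on the predicted state during both training and inference, and on sparse observations at inference time.
Enable autoregressive data assimilation to generate reanalysis data that is, in principle, compatible with ERA5.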

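Proposed method

Adapt GraphCast into a denoising diffusion model for data assimilation.
Condition the diffusion model on the predicted state x̂ during both training and inference.
At inference time, condition the diffusion process on sparse observations using soft masking and interpolation.
Use this two-stage conditioning so that the model still works as a post-processor when no observations are available.
Train with a diffusion objective to learn p(x^0 | x̂), and sample through reverse diffusion steps.
Offer plug-in flexibility for alternative forecast backbones beyond GraphCast.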
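
The conditioning steps listed above can be illustrated with a minimal sketch on a 1-D grid. It is not the paper's implementation: `toy_denoiser` is a hypothetical stand-in for the GraphCast-based network (it simply assumes the clean state equals the forecast), and the linear noise schedule, the Gaussian-shaped `soft_mask`, and linear interpolation of observations are all assumptions chosen for brevity. What it shows is the general inpainting-style idea: at every reverse diffusion step, a noised copy of the interpolated observations is blended into the sample with weights that decay away from the observed points.

```python
import numpy as np

def soft_mask(n_grid, obs_idx, width=2.0):
    """Weights in [0, 1]: 1 at observed points, decaying with distance (assumed Gaussian shape)."""
    grid = np.arange(n_grid)[:, None]
    dist = np.abs(grid - np.asarray(obs_idx)[None, :]).min(axis=1)
    return np.exp(-0.5 * (dist / width) ** 2)

def toy_denoiser(x_t, abar_t, forecast):
    """Hypothetical noise predictor: assumes the clean state equals the forecast."""
    return (x_t - np.sqrt(abar_t) * forecast) / np.sqrt(1.0 - abar_t)

def assimilate(forecast, obs_idx, obs_val, n_steps=50, seed=0):
    """Reverse diffusion conditioned on a forecast (via the denoiser) and on
    sparse observations (via soft-mask blending at every step).
    obs_idx must be sorted in increasing order for np.interp."""
    rng = np.random.default_rng(seed)
    n = forecast.size
    betas = np.linspace(1e-4, 0.2, n_steps)   # assumed linear noise schedule
    alphas = 1.0 - betas
    abars = np.cumprod(alphas)

    mask = soft_mask(n, obs_idx)
    # Interpolate the sparse observations onto the full grid.
    obs_field = np.interp(np.arange(n), obs_idx, obs_val)

    x = rng.standard_normal(n)                # start from pure noise
    for t in range(n_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, abars[t], forecast)
        mean = (x - betas[t] / np.sqrt(1.0 - abars[t]) * eps_hat) / np.sqrt(alphas[t])
        abar_prev = abars[t - 1] if t > 0 else 1.0
        sigma = np.sqrt(betas[t]) if t > 0 else 0.0
        # Sample for the unobserved part, from the learned reverse step.
        unknown = mean + sigma * rng.standard_normal(n)
        # Observed part: interpolated observations noised to the same level.
        known = (np.sqrt(abar_prev) * obs_field
                 + np.sqrt(1.0 - abar_prev) * rng.standard_normal(n))
        x = mask * known + (1.0 - mask) * unknown
    return x

forecast = np.zeros(64)
analysis = assimilate(forecast, obs_idx=[10, 40], obs_val=[1.0, -1.0])
```

In this toy setting the result matches the observations at the observed grid points and falls back to the forecast far away from them. A hard mask would correspond to replacing `soft_mask` with a 0/1 indicator of the observed points.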

Figure 1: Diagram of numerical weather forecast pipeline. It consists of data assimilation, forecast and post-processing. Data assimilation produces gridded values from sparse observations and predicted gridded values from previous time steps. Forecast takes in gridded values and produces prediction

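Experimental results

Research questions

RQ1: Can a diffusion-based model assimilate high-resolution atmospheric fields while conditioning on both the predicted state and sparse observations?
RQ2: Does conditioning on the predicted state during training and inference bring the assimilated data closer to the ground truth?
RQ3: Can the method produce reanalysis-like data that remains forecast-ready with an acceptable loss of lead time?
RQ4: How does the method perform under autoregressive data assimilation and with different numbers of observations?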
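
The autoregressive setting in RQ4 can be pictured as a loop in which each analysis is advanced by a forecast model and then corrected with new sparse observations. The sketch below only illustrates that control flow. `toy_forecast` and `toy_assimilate` are hypothetical placeholders (a grid shift and a point-wise overwrite), not GraphCast or the diffusion sampler, and `truth_at` is an assumed callable that returns the reference state at cycle `k`.

```python
import numpy as np

def toy_forecast(state, shift=1):
    """Hypothetical forecast model: advect the field by `shift` grid points."""
    return np.roll(state, shift)

def toy_assimilate(forecast, obs_idx, obs_val):
    """Hypothetical assimilation step: overwrite the forecast at observed points."""
    analysis = forecast.copy()
    analysis[obs_idx] = obs_val
    return analysis

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def autoregressive_cycle(initial_analysis, truth_at, n_cycles, n_obs, seed=0):
    """Alternate forecast and assimilation; each analysis seeds the next forecast."""
    rng = np.random.default_rng(seed)
    analysis = initial_analysis
    analyses = []
    for k in range(1, n_cycles + 1):
        forecast = toy_forecast(analysis)          # predicted state for cycle k
        truth = truth_at(k)
        # Simulated sparse observations: sample n_obs grid points of the truth.
        obs_idx = rng.choice(truth.size, size=n_obs, replace=False)
        analysis = toy_assimilate(forecast, obs_idx, truth[obs_idx])
        analyses.append(analysis)
    return analyses

n = 32
base = np.cos(2 * np.pi * np.arange(n) / n) + 2.0
truth_at = lambda k: np.roll(base, k)              # truth follows the same dynamics
analyses = autoregressive_cycle(np.zeros(n), truth_at, n_cycles=5, n_obs=8)
errors = [rmse(a, truth_at(k)) for k, a in enumerate(analyses, start=1)]
```

Because the toy forecast matches the toy truth dynamics exactly, the error here can only stay flat or shrink from cycle to cycle. With a real forecast model, forecast error would compete with the information added by the observations.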

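Key findings

The assimilated data moves closer to the ground truth as more observations are used.
When used as input to a forecast model, initial conditions assimilated from sparse observations and a 48-hour forecast incur a loss of lead time of at most 24 hours relative to ERA5 initial conditions.
The method can generate reanalysis-like data through an autoregressive assimilation cycle.
Conditioning on the predicted state during training and inference provides post-processing capability even when no observations are available at inference time.
Soft masking of observations conditions the model more effectively than hard masking.
The method scales to 0.25-degree resolution with 13 vertical levels, using GraphCast as the backbone.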
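
One plausible way to read the "loss of lead time" in the findings above: a forecast started from assimilated data at lead time τ is as accurate as a forecast started from reference initial conditions at lead time τ + Δ, and Δ is the loss. The sketch below computes Δ by interpolating the reference error curve. The function name `lead_time_loss` and all numbers are illustrative assumptions, and the paper's exact procedure may differ.

```python
import numpy as np

def lead_time_loss(lead_hours, err_ref, err_da):
    """For each lead time, find the reference lead time with equal error and
    return the difference in hours.

    Assumes err_ref increases strictly with lead time. Entries whose error
    falls outside the reference curve are returned as NaN.
    """
    lead_hours = np.asarray(lead_hours, dtype=float)
    equivalent_lead = np.interp(err_da, err_ref, lead_hours,
                                left=np.nan, right=np.nan)
    return equivalent_lead - lead_hours

# Synthetic error curves, for illustration only (not results from the paper).
lead = [0, 24, 48, 72, 96, 120]
err_ref = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # forecast from reference initial conditions
err_da = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]    # forecast from assimilated initial conditions
loss = lead_time_loss(lead, err_ref, err_da)
```

With these synthetic curves the loss is 12 hours at every lead time that the reference curve covers, and NaN at the last one, where the assimilated-data error exceeds anything on the reference curve.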

Figure 2: Architecture of the diffusion based data assimilation method. We take advantage of the input and output shape of the pretrained GraphCast model which takes the state of the atmosphere at two time steps as input. In each iteration of the denoising diffusion process in our method, the adapte


This review was generated by AI and reviewed by human editors.