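[Paper Review] PreDiff: Precipitation Nowcasting with Latent Diffusion Models
PreDiff performs probabilistic precipitation nowcasting with a conditional latent diffusion model and introduces a knowledge alignment mechanism that enforces domain-specific physical constraints at sampling time.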
Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge alignment mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
Research Motivation and Objectives
- Address uncertainty and multimodal forecasting in precipitation nowcasting.
- Develop a conditional latent diffusion model (PreDiff) for probabilistic forecasts.
- Incorporate domain knowledge via a knowledge alignment mechanism to enforce physical constraints during sampling.
- Showcase performance on synthetic N-body MNIST and real SEVIR precipitation datasets.
Proposed Method
- Train a frame-wise VAE to map pixel space to a latent space.
- Use a conditional latent diffusion model in latent space (Earthformer-UNet backbone) to denoise and predict future latents.
- Model the reverse (denoising) transition from z_t to z_{t-1}, conditioned on the encoded context z_cond, as p_theta(z_{t-1} | z_t, z_cond).
- Introduce a knowledge alignment network U_phi that estimates the value of the constraint function F at each denoising step, and modify the denoising transition to p_theta,phi(...) with an energy-guidance term (Eq. 5) that penalizes deviation from the imposed constraint.
- Ground the alignment in physical priors like energy conservation (N-body MNIST) or anticipated precipitation intensity (SEVIR).
- Train in two stages: first the LDM in latent space, then a lightweight alignment network, without retraining the LDM.
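The knowledge-aligned sampling step described in the list above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the toy estimator `u_phi` (the latent mean), the target value `f0`, the guidance scale `lam`, and all function names are hypothetical, and the mean shift uses the generic classifier-guidance form, which is assumed here to stand in for the energy-guidance term of Eq. 5.

```python
import random

def u_phi(z):
    """Hypothetical stand-in for the alignment network U_phi: estimates a
    scalar constraint value from a latent (here, simply the latent mean)."""
    return sum(z) / len(z)

def penalty_grad(z, f0):
    """Gradient of (u_phi(z) - f0)**2 with respect to each element of z.
    For the toy mean estimator, d u_phi / d z_i = 1 / len(z)."""
    g = 2.0 * (u_phi(z) - f0) / len(z)
    return [g] * len(z)

def guided_mean(mu, var, f0, lam):
    """Shift the denoiser's predicted mean against the penalty gradient,
    scaled by the step variance (generic classifier-guidance form)."""
    grad = penalty_grad(mu, f0)
    return [m - lam * var * g for m, g in zip(mu, grad)]

def guided_step(mu, var, f0, lam, rng):
    """Sample z_{t-1} from a Gaussian centred on the guided mean."""
    centre = guided_mean(mu, var, f0, lam)
    return [c + var ** 0.5 * rng.gauss(0.0, 1.0) for c in centre]

rng = random.Random(0)
mu = [rng.gauss(0.0, 1.0) for _ in range(256)]  # pretend denoiser output
f0 = 0.5                                        # assumed target constraint value

shifted = guided_mean(mu, var=1.0, f0=f0, lam=64.0)
dev_before = abs(u_phi(mu) - f0)       # constraint deviation without guidance
dev_after = abs(u_phi(shifted) - f0)   # deviation after the guided mean shift
z_prev = guided_step(mu, var=1.0, f0=f0, lam=64.0, rng=rng)
```

With these toy settings the guided mean sits closer to the target constraint value than the unguided one. The denoiser itself is never modified, which mirrors the idea that the alignment network can be trained and applied without retraining the LDM.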
Experimental Results
Research Questions
- RQ1: Can a latent diffusion model capture multiple plausible future weather scenarios for near-term nowcasting?
- RQ2: Does injecting domain-specific prior knowledge during sampling improve physical plausibility and operational utility of forecasts?
- RQ3: How does PreDiff compare to state-of-the-art deterministic and probabilistic spatiotemporal forecasting baselines on synthetic and real datasets?
- RQ4: What is the impact of knowledge alignment on forecast quality and constraint adherence across tasks like energy conservation and precipitation intensity?
- RQ5: Is the knowledge alignment plug-in generalizable across different domains without retraining the core model?
Key Findings
| Model | #Params (M) | MSE | MAE | SSIM | FVD | E.MSE | E.MAE |
|---|---|---|---|---|---|---|---|
| Target | - | 0.000 | 0.000 | 1.0000 | 0.000 | - | - |
| Persistence | - | 104.9 | 139.0 | 0.7270 | 168.3 | - | - |
| UNet [55] | 16.6 | 38.90 | 94.29 | 0.8260 | 142.3 | - | - |
| ConvLSTM [47] | 14.0 | 32.15 | 72.64 | 0.8886 | 86.31 | - | - |
| PredRNN [61] | 23.8 | 21.76 | 54.32 | 0.9288 | 20.65 | - | - |
| PhyDNet [11] | 3.1 | 28.97 | 78.66 | 0.8206 | 178.0 | - | - |
| E3D-LSTM [60] | 12.9 | 22.98 | 62.52 | 0.9131 | 22.28 | - | - |
| Rainformer [1] | 19.2 | 38.89 | 96.47 | 0.8036 | 163.5 | - | - |
| Earthformer [8] | 7.6 | 14.82 | 39.93 | 0.9538 | 6.798 | - | - |
| VideoGPT [65] | 92.2 | 53.68 | 77.42 | 0.8468 | 39.28 | 0.0228 | 0.1092 |
| LDM [42] | 410.3 | 46.29 | 72.19 | 0.8773 | 3.432 | 0.0243 | 0.1172 |
| PreDiff | 120.7 | 9.492 | 25.01 | 0.9716 | 0.987 | 0.0226 | 0.1083 |
| PreDiff-KA | 129.4 | 21.90 | 43.57 | 0.9303 | 4.063 | 0.0039 | 0.0443 |
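The trade-off between PreDiff and PreDiff-KA in the table above can be quantified with a short calculation. The sketch below copies the two rows from the table and computes the signed relative change for each metric where lower is better; the dictionary layout and the helper name `relative_change` are illustrative choices, not part of the paper.

```python
# Values copied from the PreDiff and PreDiff-KA rows of the table above.
prediff = {"MSE": 9.492, "MAE": 25.01, "FVD": 0.987, "E.MSE": 0.0226, "E.MAE": 0.1083}
prediff_ka = {"MSE": 21.90, "MAE": 43.57, "FVD": 4.063, "E.MSE": 0.0039, "E.MAE": 0.0443}

def relative_change(before, after):
    """Signed relative change in percent; negative means the metric decreased."""
    return 100.0 * (after - before) / before

# Relative change of every metric when knowledge alignment is switched on.
changes = {name: relative_change(prediff[name], prediff_ka[name]) for name in prediff}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On these numbers, knowledge alignment lowers the energy MSE by roughly 83% and the energy MAE by roughly 59%, while the pixel-level MSE more than doubles and FVD roughly quadruples.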
- On N-body MNIST, PreDiff achieves the best MSE, MAE, SSIM, and FVD among the compared models (MSE 9.492, FVD 0.987), and its energy errors are slightly lower than those of VideoGPT and LDM.
- On SEVIR, PreDiff attains strong perceptual quality (FVD) and competitive CSI-based metrics, with PreDiff-KA improving constraint alignment.
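- Knowledge alignment (PreDiff-KA) substantially improves adherence to the energy-conservation constraint, reducing E.MSE from 0.0226 to 0.0039 and E.MAE from 0.1083 to 0.0443, at a cost in fidelity: MSE rises from 9.492 to 21.90 and FVD from 0.987 to 4.063.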
- The latent Earthformer-UNet backbone provides stable and effective spatiotemporal modeling in the diffusion process.
- Knowledge alignment can be trained separately and plugged into inference without retraining the core model.
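The CSI-based metrics mentioned in the findings above refer to the Critical Success Index, which is conventionally defined as hits / (hits + misses + false alarms) after thresholding forecast and observation at a given intensity. The sketch below is a minimal illustration of that definition: the function name `csi`, the toy pixel values, and the threshold of 74 are assumptions chosen for the example, and returning NaN when no event occurs is one possible convention, not necessarily the one used in the paper.

```python
def csi(pred, target, threshold):
    """Critical Success Index at one intensity threshold:
    hits / (hits + misses + false alarms), computed over all pixels."""
    hits = misses = false_alarms = 0
    for p, t in zip(pred, target):
        p_event, t_event = p >= threshold, t >= threshold
        if p_event and t_event:
            hits += 1            # event forecast and observed
        elif t_event:
            misses += 1          # event observed but not forecast
        elif p_event:
            false_alarms += 1    # event forecast but not observed
    denom = hits + misses + false_alarms
    # Assumed convention: undefined (NaN) when no event is forecast or observed.
    return hits / denom if denom else float("nan")

# Toy flattened frames with made-up intensities (not SEVIR data).
target = [0, 20, 80, 150, 10, 90]
pred = [5, 30, 60, 140, 80, 95]
score = csi(pred, target, threshold=74)
```

In this toy case there are two hits, one miss, and one false alarm, so the score is 0.5. Correct negatives do not enter the formula, which is why CSI is suited to rare, intense precipitation events.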
This review was generated by AI and checked by a human editor.