QUICK REVIEW

[Paper Review] Score-based Data Assimilation

François Rozet, Gilles Louppe | arXiv (Cornell University) | Jun 18, 2023

Gaussian Processes and Bayesian Inference | Cited by 13

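One-Sentence Summary

The paper introduces score-based data assimilation (SDA), which learns a local score model on short trajectory segments, infers complete state trajectories non-autoregressively, and supports zero-shot observation guidance at inference time.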

ABSTRACT

Data assimilation, in its most comprehensive form, addresses the Bayesian inverse problem of identifying plausible state trajectories that explain noisy or incomplete observations of stochastic dynamical systems. Various approaches have been proposed to solve this problem, including particle-based and variational methods. However, most algorithms depend on the transition dynamics for inference, which becomes intractable for long time horizons or for high-dimensional systems with complex dynamics, such as oceans or atmospheres. In this work, we introduce score-based data assimilation for trajectory inference. We learn a score-based generative model of state trajectories based on the key insight that the score of an arbitrarily long trajectory can be decomposed into a series of scores over short segments. After training, inference is carried out using the score model, in a non-autoregressive manner by generating all states simultaneously. Quite distinctively, we decouple the observation model from the training procedure and use it only at inference to guide the generative process, which enables a wide range of zero-shot observation scenarios. We present theoretical and empirical evidence supporting the effectiveness of our method.

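Research Motivation and Objectives

Address Bayesian trajectory inference from noisy, incomplete observations in high-dimensional, long-horizon dynamical systems.
Develop SDA, which exploits the Markov structure of the dynamics to learn a score-based generative model from short trajectory segments.
Generate entire state trajectories non-autoregressively, and decouple the observation model from training so that it is used only to guide inference.
Provide theoretical and empirical validation on chaotic systems, and demonstrate advantages over variational methods that return point estimates.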

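Proposed Method

Train a local score network on short trajectory segments x_{i-k:i+k}(t) to approximate the score of p(x_{i-k:i+k}(t)).
Compose the local scores across segments to approximate the score of the full-trajectory prior, ∇_{x_{1:L}(t)} log p(x_{1:L}(t)).
Decouple the observation model from training: p(y|x) is used only at inference, where a likelihood score derived without any retraining guides sampling (zero-shot).
Approximate the perturbed likelihood p(y|x(t)) with a Gaussian surrogate built on Tweedie's formula, which yields a stable likelihood score.
Generate trajectories from the posterior p(x_{1:L}(t)|y) by reverse-SDE sampling with a predictor-corrector scheme (Langevin corrections).
Justify the local scores with a pseudo-blanket argument, use FCNN- or UNet-style architectures depending on segment size, and train with denoising score matching.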
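The score-composition step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stitching rule (centre entry of each window, plus the leading and trailing entries of the first and last windows), and the toy Gaussian AR(1) process are all assumptions chosen for illustration. For this unperturbed first-order Markov chain the composed score matches the exact full-trajectory score; for noise-perturbed states x(t) the composition would only be approximate, which is where the pseudo-blanket argument comes in.

```python
import numpy as np


def compose_local_scores(x, local_score, k):
    """Stitch window-level scores into a full-trajectory score.

    x           : array of shape (L, D), one state per row.
    local_score : hypothetical callable mapping a (2k+1, D) window to its score.
    k           : half-width of the window (an assumed stitching convention).
    """
    L = x.shape[0]
    width = 2 * k + 1
    if L < width:
        raise ValueError("trajectory is shorter than one window")
    full = np.empty_like(x, dtype=float)
    for start in range(L - width + 1):
        s = local_score(x[start:start + width])
        center = start + k
        full[center] = s[k]              # interior states: centre of the window
        if start == 0:
            full[:k] = s[:k]             # first window also supplies the head
        if start == L - width:
            full[center + 1:] = s[k + 1:]  # last window also supplies the tail
    return full


def ar1_covariance(n, a):
    """Covariance of a stationary, unit-variance AR(1) chain of length n."""
    idx = np.arange(n)
    return a ** np.abs(idx[:, None] - idx[None, :])


def gaussian_score(x, cov):
    """Score of a zero-mean Gaussian: grad_x log N(x | 0, cov) = -cov^{-1} x."""
    return -np.linalg.solve(cov, x)


# Toy check: windows of size 3 on a length-8 chain.
a, L, k = 0.8, 8, 1
rng = np.random.default_rng(0)
x = rng.normal(size=(L, 1))
composed = compose_local_scores(
    x, lambda w: gaussian_score(w, ar1_covariance(2 * k + 1, a)), k
)
exact = gaussian_score(x, ar1_covariance(L, a))
```

In this toy setting `composed` and `exact` agree because each state's Markov blanket lies inside its window. In a real application `local_score` would be a trained network evaluated on noisy segments, and the windows could be batched rather than looped over.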

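Experimental Results

Research Questions

RQ1: In a data assimilation setting, can a score-based model trained on short trajectory segments recover the posterior over complete, longer trajectories?
RQ2: Does decoupling the observation model from training enable zero-shot handling of diverse observation scenarios at inference time?
RQ3: How closely does SDA approximate the full-trajectory posterior compared with the ground truth or with traditional data assimilation methods?
RQ4: What trade-offs exist in SDA between the local segment size (k), the number of sampling corrections (C), and computational cost?
RQ5: When observations are ambiguous, can SDA recover multiple plausible posterior modes?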

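Key Findings

SDA reproduces the ground-truth posterior in chaotic dynamical systems and reveals multiple plausible posterior modes under ambiguous observations.
Combining the local score decomposition with a stable likelihood-score approximation yields accurate posterior inference without differentiating through the physical model.
Posterior accuracy improves with a larger segment window (k) and more Langevin corrections (C), with diminishing returns beyond moderate values.
SDA generates entire state trajectories non-autoregressively and in parallel, which improves scalability to long time horizons.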
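To make the likelihood-score approximation more concrete, here is a small sketch under stated assumptions. It assumes a linear observation y = A x + noise with covariance `cov_y`, a perturbation x(t) = mu * x + sigma * eps, and a surrogate of the form N(y | A x_hat, cov_y + (sigma^2 / mu^2) * gamma * I), where x_hat comes from Tweedie's formula and `gamma` is a hypothetical scalar hyperparameter. The exact covariance form, all names, and the toy standard-normal prior are assumptions, not details confirmed by this summary. Finite differences stand in for automatic differentiation purely to keep the example dependency-free.

```python
import numpy as np


def tweedie_mean(x_t, score, mu, sigma):
    """Posterior mean E[x | x_t] for x_t = mu * x + sigma * eps (Tweedie's formula)."""
    return (x_t + sigma**2 * score(x_t)) / mu


def surrogate_log_likelihood(x_t, y, A, cov_y, score, mu, sigma, gamma):
    """Log of the assumed Gaussian surrogate for p(y | x_t)."""
    x_hat = tweedie_mean(x_t, score, mu, sigma)
    cov = cov_y + (sigma**2 / mu**2) * gamma * np.eye(len(y))
    r = y - A @ x_hat
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + len(y) * np.log(2 * np.pi))


def likelihood_score_fd(x_t, *args, eps=1e-5):
    """Central finite-difference gradient of the surrogate log-likelihood w.r.t. x_t."""
    grad = np.zeros_like(x_t)
    for i in range(len(x_t)):
        step = np.zeros_like(x_t)
        step[i] = eps
        grad[i] = (surrogate_log_likelihood(x_t + step, *args)
                   - surrogate_log_likelihood(x_t - step, *args)) / (2 * eps)
    return grad


# Toy setting: prior x ~ N(0, I), so the perturbed prior score is known in closed form.
mu, sigma, gamma = 0.9, 0.5, 1.0
prior_score = lambda z: -z / (mu**2 + sigma**2)

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])          # observe the first two of three components
cov_y = 0.01 * np.eye(2)
y = np.array([1.0, -1.0])
x_t = np.array([0.3, -0.2, 0.5])

fd_score = likelihood_score_fd(x_t, y, A, cov_y, prior_score, mu, sigma, gamma)

# Closed-form check: x_hat = c * x_t here, so the chain rule gives the gradient directly.
c = mu / (mu**2 + sigma**2)
cov = cov_y + (sigma**2 / mu**2) * gamma * np.eye(2)
analytic_score = c * A.T @ np.linalg.solve(cov, y - A @ (c * x_t))
```

The guidance term `fd_score` would be added to the prior score during sampling to target the posterior. Only the observation operator and the score model are differentiated here; no dynamics model appears, which is consistent with the finding that inference does not require differentiating through the physical model.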

This review was generated by AI and reviewed by human editors.