QUICK REVIEW

[Paper Review] Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

Zhong Wan, Ricardo Baptista|arXiv (Cornell University)|May 24, 2023
Advanced Mathematical Modeling in Engineering | Cited by 11
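One-Sentence Summary

This paper proposes a two-stage probabilistic framework for statistical downscaling with unpaired data: it first debiases the low-resolution data with an optimal transport map, then upsamples with a conditional diffusion model, producing high-resolution outputs that match the target statistics.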

ABSTRACT

We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs do not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion.

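Motivation and Goals

  • Address the lack of paired data in statistical downscaling in order to obtain high-fidelity, high-resolution outputs.
  • Develop a factorized map C = T^{-1} ∘ C' that separates debiasing from upsampling.
  • Debias the low-resolution data with optimal transport, and sample the high-resolution data conditionally with a diffusion model.
  • Ensure that the method preserves physical statistics even when the low-resolution and high-resolution spectra differ.
  • Demonstrate applicability on one- and two-dimensional fluid flow problems, with 8x and 16x upsampling.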

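Proposed Method

  • Factorize the map C into a linear downsampling map C' and an invertible debiasing map T such that C = T^{-1} ∘ C' (equivalently, C'_{#}μ_X = T_{#}μ_Y).
  • Debiasing: solve an entropy-regularized optimal transport problem to obtain a transport map T that pushes μ_Y forward to μ_{Y'}, so that it agrees with C'_{#}μ_X.
  • Upsampling: train a probabilistic diffusion model to learn the high-resolution prior p(x), then sample from the posterior p(x | E'_{ȳ'}), where E'_{ȳ'} = {x : C'x = ȳ'}, using a denoiser that is modified a posteriori for conditional sampling.
  • A posteriori conditioning: modify the diffusion denoiser through the pseudoinverse of C' to enforce conditioning on the intermediate low-resolution space.
  • Implementation details include score-based diffusion with a VP schedule, Tweedie's formula for the score, and the modified denoiser (Eq. 7).
  • Debiasing via OT uses an entropy-regularized Sinkhorn computation, which yields the barycentric projection T_{γ}(y) that pushes μ_Y forward to μ_{Y'}.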
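The debiasing step described above can be illustrated with a minimal NumPy sketch of Sinkhorn iterations followed by a barycentric projection. This is not the authors' implementation: the function names `sinkhorn_plan` and `barycentric_map`, the uniform sample weights, the squared Euclidean cost, the cost normalization, the value of `eps`, and the toy Gaussian data are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(y, y_prime, eps=0.05, n_iters=1000):
    """Entropy-regularized OT plan between two empirical samples (uniform weights)."""
    n, m = len(y), len(y_prime)
    # Squared Euclidean cost between every biased sample and every target sample.
    cost = ((y[:, None, :] - y_prime[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()  # assumption: normalize so eps is relative to the max cost
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)  # source marginal (uniform)
    b = np.full(m, 1.0 / m)  # target marginal (uniform)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternate scaling updates so the plan matches each marginal in turn.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def barycentric_map(plan, y_prime):
    """Barycentric projection: send each y_i to the plan-weighted mean of y_prime."""
    return (plan @ y_prime) / plan.sum(axis=1, keepdims=True)


# Toy unpaired data (hypothetical): biased samples y and target samples y_prime.
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=(200, 2))
y_prime = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

plan = sinkhorn_plan(y, y_prime)
y_debiased = barycentric_map(plan, y_prime)
```

In this sketch a smaller `eps` gives a sharper map but needs more iterations and can underflow in the exponential; a log-domain Sinkhorn variant is the usual remedy and is omitted here for brevity.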
Figure 1: (a) Upsampling (super-resolution) as frequency extrapolation in the Fourier domain. The model extrapolates low-frequency content to higher-frequencies (dashed blue). The debiasing map corrects the biased low-frequency content (solid red). (b) Illustration of the proposed framework where $\
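The idea of conditioning through the pseudoinverse of C' can be sketched as an orthogonal projection of a denoised estimate onto the constraint set {x : C'x = ȳ'}. The sketch below is only a simplified stand-in for the paper's modified denoiser (Eq. 7), whose exact form is not reproduced here. The average-pooling choice for C', the helper names `avg_pool_matrix` and `project_onto_constraint`, and the random vector standing in for a denoiser output are assumptions.

```python
import numpy as np


def avg_pool_matrix(n_hr, factor):
    """Linear downsampling matrix C' that averages non-overlapping blocks (assumed form)."""
    n_lr = n_hr // factor
    c = np.zeros((n_lr, n_hr))
    for i in range(n_lr):
        c[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return c


def project_onto_constraint(x_hat, c, y_bar):
    """Orthogonally project x_hat onto the affine set {x : c @ x = y_bar}."""
    c_pinv = np.linalg.pinv(c)
    # Remove the residual in the row space of c; the null-space part of x_hat is kept.
    return x_hat - c_pinv @ (c @ x_hat - y_bar)


rng = np.random.default_rng(0)
c = avg_pool_matrix(n_hr=64, factor=8)  # 8x downsampling of a 1D field
x_hat = rng.normal(size=64)             # stand-in for a denoiser output (hypothetical)
y_bar = rng.normal(size=8)              # stand-in for a debiased low-resolution input

x_tilde = project_onto_constraint(x_hat, c, y_bar)
```

In a sampler, a projection of this kind would be applied to the denoiser output at each step, so that the low-resolution content is fixed by ȳ' while the diffusion prior fills in the remaining high-frequency content.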

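Experimental Results

Research Questions

  • RQ1: Can unpaired low-resolution and high-resolution data be used to downscale to high-resolution outputs that faithfully reproduce the target statistics?
  • RQ2: Does debiasing via optimal transport effectively align the low-frequency statistics with the high-resolution distribution before upsampling?
  • RQ3: Can a conditional diffusion model reliably sample from p(x | C'x = y') to generate realistic high-resolution fields while preserving physical statistics?
  • RQ4: In unpaired downscaling tasks, what are the benefits of separating debiasing from upsampling compared with end-to-end approaches?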

Key Findings

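Metric    | KS 8x (LFLR) | KS 8x (OT-corrected) | NS 8x (LFLR) | NS 8x (OT-corrected) | NS 16x (LFLR) | NS 16x (OT-corrected)
----------|--------------|----------------------|--------------|----------------------|---------------|----------------------
covRMSE ↓ | 0.343        | 0.081                | 0.458        | 0.083                | 0.477         | 0.079
MELRu ↓   | 0.201        | 0.020                | 1.254        | 0.013                | 0.600         | 0.016
MELRw ↓   | 0.144        | 0.020                | 0.196        | 0.026                | 0.200         | 0.025
KLD ↓     | 1.464        | 0.018                | 29.30        | 0.033                | 12.26         | 0.017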
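  • OT debiasing markedly improves statistical alignment across several metrics on both the Kuramoto-Sivashinsky (KS) and Navier-Stokes (NS) test cases, which in turn improves the subsequent diffusion-based upsampling.
  • Conditioning the diffusion model on the OT-corrected intermediate data yields samples whose energy spectra are close to the target and whose divergence from the target is smaller than that of unconditional or non-debiased baselines.
  • On NS 8x and 16x downscaling, the proposed method outperforms baselines (such as BCSD, cycleGAN, ClimAlign, and ViT-based upsampling) on several metrics (covRMSE, MELR, KLD, Wasserstein, MMD).
  • The framework produces realistic small-scale features with substantial variability, and its probabilistic sampling provides uncertainty quantification.
  • OT correction is essential: without it, conditioning on biased inputs contaminates the diffusion trajectory and degrades the statistics.
  • Qualitative results show that the method produces sharper, physically plausible vorticity fields than several baselines.
  • The method is effective at both 8x and 16x downscaling, which indicates robustness to larger resolution gaps.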
Figure 2: (a) KS samples generated with diffusion model conditioned on LR information with and without OT correction applied, (b) empirical probability density function for relevant LR and HR samples in KS and (c) mode-wise log energy ratios with respect to the true samples (Eq. ( 13 ) without weigh
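For readers unfamiliar with the MELR metric reported above, the following sketch shows one plausible way to compute a mean energy log ratio for 1D fields. The paper's Eq. (13) is not reproduced in this review, so the exact definition is an assumption: per-mode energy is taken as the sample mean of the squared Fourier magnitude, the unweighted variant averages the absolute log ratio over modes, and the weighted variant weights each mode by its share of the reference energy. The function names `energy_spectrum` and `melr` are hypothetical.

```python
import numpy as np


def energy_spectrum(samples):
    """Per-mode energy: mean over samples of |FFT|^2 along the grid axis."""
    u_hat = np.fft.rfft(samples, axis=-1)
    return (np.abs(u_hat) ** 2).mean(axis=0)


def melr(pred, ref, weighted=False):
    """Mean absolute log ratio of predicted to reference per-mode energy (assumed form)."""
    e_pred = energy_spectrum(pred)
    e_ref = energy_spectrum(ref)
    abs_log_ratio = np.abs(np.log(e_pred / e_ref))
    if weighted:
        # Assumed weighting: each mode counts in proportion to its reference energy.
        weights = e_ref / e_ref.sum()
    else:
        weights = np.full(e_ref.shape, 1.0 / e_ref.size)
    return float((weights * abs_log_ratio).sum())


rng = np.random.default_rng(0)
ref = rng.normal(size=(32, 64))  # toy reference samples: 32 fields on a 64-point grid
pred = 2.0 * ref                 # toy prediction with 4x the energy in every mode

melr_unweighted = melr(pred, ref)
melr_weighted = melr(pred, ref, weighted=True)
```

Because the toy prediction scales every mode's energy by the same factor of 4, both variants reduce to log(4) here; they differ only when the spectral mismatch varies across modes.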

This review was generated by AI and reviewed by human editors.