QUICK REVIEW

[Paper Digest] Physics-Informed Video Diffusion For Shallow Water Equations

Yang Bai, George Eskandar|arXiv (Cornell University)|Feb 24, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

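One-Sentence Summary

This paper proposes a physics-informed video diffusion framework that jointly generates water-surface video frames and the corresponding shallow water equation states, offering fast inference and physical plausibility without a separate rendering step.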

ABSTRACT

Traditional fluid dynamics simulation pipelines combine numerical solvers with rendering, producing highly realistic results but at considerable computational cost. Diffusion-based generative video models offer a faster alternative, yet often ignore physical laws and thus fail to capture consistent dynamics. We propose a physics-informed video diffusion framework that jointly generates visual outputs and physical states. Unlike prior two-stage approaches that first simulate the physical variables and then render, we directly integrate physics constraints into the generative process, enabling simultaneous prediction of physical states and realistic videos without a separate rendering step. Built on the two-dimensional shallow water equations with terrain topography, our method produces temporally coherent water flow while maintaining physical plausibility. Experiments show that it outperforms purely data-driven video diffusion baselines in both realism and physical fidelity, while generating videos significantly faster than traditional simulation-plus-rendering pipelines.

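Motivation and Goals

Enable faster, physically consistent visualization of fluid dynamics by combining diffusion-based video generation with the grid-based shallow water equations.
Embed initial conditions and terrain into the diffusion model to jointly predict video frames and physical states.
Achieve a significant runtime speedup over traditional simulation plus rendering while retaining most of the simulation accuracy.
Provide a framework whose outputs remain temporally consistent and physically interpretable.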

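Proposed Method

A multimodal, image-conditioned latent diffusion model outputs both video frames and shallow water equation (SWE) states.
A physics embedding layer embeds the initial SWE conditions and the terrain at the same latent resolution as the video, and independent diffusion processes are applied to the video and physics latents.
The physics and boundary-condition embeddings are concatenated with the video latents and prompt embeddings, and a diffusion transformer performs spatiotemporal denoising.
Separate projection heads map the denoised representation to video latents and physics latents, enabling joint generation.
Training uses a joint loss (video reconstruction plus physical-state reconstruction) to strengthen physical consistency.
The physical quantities come from a finite-volume discretization of the 2D shallow water equations, with Roe fluxes and TVD Runge–Kutta time stepping; terrain enters through the bed-slope source term.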
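
The solver ingredients listed above (finite-volume discretization, TVD Runge–Kutta time stepping, bed-slope source term) can be illustrated with a minimal sketch. To stay short it makes several assumptions that are not taken from the paper: it is 1D rather than 2D, uses periodic boundaries, swaps the Roe flux for the simpler Rusanov (local Lax-Friedrichs) flux, and uses the second-order SSP (TVD) Runge–Kutta scheme. All function names are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def physical_flux(h, hu):
    """Flux of the 1D shallow water equations for depth h and momentum hu."""
    u = hu / h
    return np.stack([hu, hu * u + 0.5 * G * h**2])


def rhs(state, bed, dx):
    """Finite-volume right-hand side on a periodic grid.

    state has shape (2, N): row 0 is depth h, row 1 is momentum hu.
    bed has shape (N,) and holds the bed elevation b.
    """
    h, hu = state
    # Left/right states at interface i+1/2 (first-order reconstruction).
    h_l, hu_l = h, hu
    h_r, hu_r = np.roll(h, -1), np.roll(hu, -1)
    # Rusanov flux (assumption: used here instead of the Roe flux):
    # central average plus dissipation scaled by the largest wave speed.
    speed = np.maximum(np.abs(hu_l / h_l) + np.sqrt(G * h_l),
                       np.abs(hu_r / h_r) + np.sqrt(G * h_r))
    flux = (0.5 * (physical_flux(h_l, hu_l) + physical_flux(h_r, hu_r))
            - 0.5 * speed * np.stack([h_r - h_l, hu_r - hu_l]))
    # Flux difference across each cell: F_{i+1/2} - F_{i-1/2}.
    div = (flux - np.roll(flux, 1, axis=1)) / dx
    # Bed-slope source term -g * h * db/dx acts on momentum only.
    dbdx = (np.roll(bed, -1) - np.roll(bed, 1)) / (2.0 * dx)
    source = np.stack([np.zeros_like(h), -G * h * dbdx])
    return -div + source


def step_ssp_rk2(state, bed, dx, dt):
    """One step of the second-order SSP (TVD) Runge-Kutta scheme."""
    s1 = state + dt * rhs(state, bed, dx)
    return 0.5 * state + 0.5 * (s1 + dt * rhs(s1, bed, dx))
```

With periodic boundaries the flux-form update conserves total water mass up to round-off, which is a quick sanity check for any finite-volume implementation of this kind.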

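Experimental Results

Research Questions

RQ1: Can a diffusion-based video model be guided by the shallow water equations and terrain to produce physically plausible water dynamics?
RQ2: Does jointly generating video and physical states improve physical fidelity and temporal consistency over purely data-driven baselines?
RQ3: What runtime improvement over the traditional simulation-plus-rendering pipeline is achievable while preserving fidelity?
RQ4: Which physics embedding strategy (linear interpolation, CNN-based, or MLP-based) best preserves video quality under shallow water equation (SWE) conditioning?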
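
As a rough illustration of the simplest of the three embedding strategies, the sketch below resamples physical fields to a latent grid by bilinear interpolation and stacks them as conditioning channels. The digest does not describe the paper's actual embedding in detail, so the channel layout (depth, two momentum components, bed elevation), the corner-aligned sampling, and both function names are assumptions made for this example.

```python
import numpy as np


def resize_bilinear(field, out_h, out_w):
    """Resample a 2D field to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = field.shape
    # Sample positions in input index space, with corners aligned.
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    # Fractional offsets act as interpolation weights.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = field[y0][:, x0] * (1 - wx) + field[y0][:, x1] * wx
    bot = field[y1][:, x0] * (1 - wx) + field[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy


def embed_physics_li(h, hu, hv, bed, latent_hw):
    """Stack resized fields into a (4, H_lat, W_lat) conditioning array.

    Hypothetical helper: the channel choice is an assumption, not the
    paper's documented layout.
    """
    return np.stack([resize_bilinear(f, *latent_hw) for f in (h, hu, hv, bed)])
```

A learned CNN or MLP embedding would replace `resize_bilinear` with trainable layers that produce the same output shape.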

Key Findings

Table 1
Method	LPIPS ↓	SSIM ↑	PSNR ↑	FVD ↓
CogVideoX-Fun	0.2262	0.7994	18.63	189.53
CogVideoX (I2V)-LoRA	0.2241	0.8036	18.89	178.37
Naive without Physics	0.2411	0.7862	18.28	192.64
LI. with Physics	0.1588	0.8355	22.19	137.20
MLP with Physics	0.1366	0.8423	24.91	128.69
CNN with Physics	0.1341	0.8519	25.86	125.13
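
For readers unfamiliar with the PSNR column, the following minimal sketch shows the standard definition for a single pair of images. It assumes pixel values scaled to [0, 1]; the paper's exact evaluation protocol (value range, per-frame averaging) is not stated in this digest, and the function name is hypothetical.

```python
import numpy as np


def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    ref = np.asarray(reference, dtype=float)
    gen = np.asarray(generated, dtype=float)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(max_value**2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gap of about 7 dB between two methods on the same image pair corresponds to roughly a fivefold difference in MSE.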

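The physics-informed models outperform the purely data-driven video diffusion baselines on all visual-realism metrics (LPIPS, SSIM, PSNR, FVD); for example, CNN with Physics reaches an FVD of 125.13 versus 178.37 for CogVideoX (I2V)-LoRA.
In the ablation, the CNN-based physics embedding gives the best video quality, followed by the MLP-based embedding and then linear interpolation (LI).
Inference time stays nearly constant across grid resolutions, whereas the runtime of the traditional pipeline grows with resolution.
The method is more than an order of magnitude faster than the traditional pipeline while retaining 67%-90% of the simulation accuracy.
The jointly generated videos and shallow water equation states are more temporally stable and physically plausible than the baselines.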

This digest was generated by AI and reviewed by a human editor.