QUICK REVIEW

[Paper Review] Resolving gradient pathology in physics-informed epidemiological models

Nickson Golooba, Woldegebriel Assefa Woldegerima | arXiv (Cornell University) | Mar 25, 2026
Model Reduction and Neural Networks | Cited by 0
One-Sentence Summary

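The paper proposes Conflict-Gated Gradient Scaling (CGGS) to resolve gradient conflicts in physics-informed neural networks for SEIR epidemic modelling. The method keeps training stable and attains the standard convergence rate. Compared with magnitude-based baselines, it is more robust and produces curriculum-like training dynamics.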

ABSTRACT

Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable due to competing optimization objectives. As established in recent literature on "gradient pathology," the gradient vectors derived from the data loss and the physical residual often point in conflicting directions, leading to slow convergence or optimization deadlock. Existing methods attempt to resolve this by balancing gradient magnitudes or projecting conflicting vectors. We instead propose a novel and computationally efficient method, conflict-gated gradient scaling (CGGS), that addresses gradient conflicts in physics-informed neural networks for epidemiological modelling and ensures stable, efficient training. This method uses the cosine similarity between the data and physics gradients to dynamically modulate the penalty weight. Unlike standard annealing schemes that only normalize scales, CGGS acts as a geometric gate: it suppresses the physical constraint when directional conflict is high, allowing the optimizer to prioritize data fidelity, and restores the constraint when gradients align. We prove that this gating mechanism preserves the standard $O(1/T)$ convergence rate for smooth non-convex objectives, a guarantee that fails under fixed-weight or magnitude-balanced training when gradients conflict. We demonstrate that this mechanism autonomously induces a curriculum learning effect, improving parameter estimation in stiff epidemiological systems compared to magnitude-based baselines. Our empirical results show improved peak recovery and convergence over magnitude-based methods.
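The following minimal sketch makes the "physical residual" concrete. It implements the standard SEIR vector field and a mean-squared ODE residual. It assumes frequency-dependent transmission with rates `beta`, `sigma` and `gamma`; these are conventional names, not necessarily the paper's notation. Time derivatives are approximated with central finite differences on a sampled trajectory, whereas an actual PINN would differentiate its network output with automatic differentiation. The helper `euler_trajectory` is hypothetical and exists only to generate example data.

```python
# Sketch only: central finite differences stand in for the automatic
# differentiation a real PINN would use. Parameter names are assumptions.

def seir_rhs(s, e, i, r, beta, sigma, gamma):
    """SEIR vector field with frequency-dependent transmission, N = S + E + I + R."""
    n = s + e + i + r
    infection = beta * s * i / n
    return (-infection,
            infection - sigma * e,
            sigma * e - gamma * i,
            gamma * i)


def physics_residual_loss(ts, traj, beta, sigma, gamma):
    """Mean squared ODE residual over interior time points (central differences).

    ts   -- increasing sample times
    traj -- one (S, E, I, R) tuple per entry of ts
    """
    total, count = 0.0, 0
    for k in range(1, len(ts) - 1):
        span = ts[k + 1] - ts[k - 1]
        rhs = seir_rhs(*traj[k], beta, sigma, gamma)
        for c in range(4):
            deriv = (traj[k + 1][c] - traj[k - 1][c]) / span
            total += (deriv - rhs[c]) ** 2
            count += 1
    return total / count


def euler_trajectory(state0, beta, sigma, gamma, dt, steps):
    """Forward-Euler SEIR simulation, used here only to generate example data."""
    ts, traj = [0.0], [tuple(state0)]
    for k in range(steps):
        rhs = seir_rhs(*traj[-1], beta, sigma, gamma)
        traj.append(tuple(x + dt * d for x, d in zip(traj[-1], rhs)))
        ts.append((k + 1) * dt)
    return ts, traj


ts, traj = euler_trajectory((990.0, 0.0, 10.0, 0.0), 0.5, 0.2, 0.1, 0.01, 5000)
print(physics_residual_loss(ts, traj, 0.5, 0.2, 0.1))  # parameters match the data
print(physics_residual_loss(ts, traj, 1.0, 0.2, 0.1))  # wrong transmission rate
```

With matching parameters, only the Euler discretization error remains, so the residual is small. A wrong transmission rate inflates it. This residual term is the one that pulls against the data loss during training.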

Motivation and Objectives

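  • Motivate and analyze the gradient pathology of PINNs applied to SEIR epidemic models.
  • Develop a dynamic, geometry-based gradient-gating mechanism that resolves conflicting training signals.
  • Prove convergence guarantees for the proposed CGGS method on smooth non-convex objectives.
  • Demonstrate empirical improvements in convergence, peak recovery, and robustness to noisy data.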

Proposed Method

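  • Model SEIR dynamics with a loss that combines a data term, an ODE residual term, and a non-negativity constraint.
  • Introduce Conflict-Gated Gradient Scaling (CGGS), which gates the physics gradient using its cosine similarity with the data gradient together with a magnitude-balancing term.
  • The update rule for the adaptive physics weight applies a sigmoid gate to the cosine similarity and the gradient norms (Eq. (5)).
  • Keep the data term anchored with lambda_data = 1 while adapting lambda_phy; the weight of the non-negativity constraint is fixed to preserve biological validity.
  • Prove that CGGS converges to a first-order stationary point of the data loss at rate O(1/T) for smooth non-convex objectives (Theorem 4.7).
  • Offer a curriculum-learning interpretation: CGGS produces a relaxation phase while gradients conflict and moves into a refinement phase as they align.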
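The gating idea described in the bullets above can be sketched as follows. This review does not reproduce Eq. (5), so the exact form is an assumption. The sketch computes the physics weight as a sigmoid of the scaled cosine similarity, using a hypothetical sharpness parameter `kappa`, and multiplies it by the norm ratio `||g_data|| / ||g_phy||`. Gradients are plain Python lists standing in for flattened parameter gradients. `combined_gradient` keeps lambda_data = 1 and a fixed constraint weight. It mirrors the bullet points and is not the paper's code.

```python
import math

# Hypothetical CGGS-style weight. The sigmoid-of-cosine gate times a norm
# ratio, and the parameter `kappa`, are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v, eps=1e-12):
    return dot(u, v) / (norm(u) * norm(v) + eps)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cggs_physics_weight(g_data, g_phy, kappa=10.0, eps=1e-12):
    """lambda_phy = sigmoid(kappa * cos) * ||g_data|| / ||g_phy||.

    Conflict (cos < 0) drives the gate toward 0 and alignment drives it
    toward 1; the norm ratio plays the magnitude-balancing role.
    """
    gate = sigmoid(kappa * cosine(g_data, g_phy))
    balance = norm(g_data) / (norm(g_phy) + eps)
    return gate * balance


def combined_gradient(g_data, g_phy, g_con, lam_phy, lam_con=1.0):
    """Total gradient with lambda_data fixed at 1 and a fixed constraint weight lam_con."""
    return [d + lam_phy * p + lam_con * c for d, p, c in zip(g_data, g_phy, g_con)]
```

Take `g_data = [1.0, 0.0]` as an example. An aligned physics gradient `[2.0, 0.0]` gets a weight of about 0.5 (a gate near 1 times the norm ratio 0.5). An opposing gradient `[-2.0, 0.0]` gets a weight of roughly 2e-5, so the combined update follows the data gradient.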
Figure 1: Conceptual visualization of CGGS. (Left) The data and physics gradients conflict (opposing directions). (Center) Standard magnitude balancing (LRA) equalizes the lengths but ignores the angle. The resultant update vector (black) shrinks toward zero, leading to optimization stagnation (“Deadlock”
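The deadlock in Figure 1 can be reproduced with two toy 2-D gradients. The magnitude-balancing rule here, which rescales the physics gradient to the data gradient's length, is a simplified stand-in for LRA-style balancing. The gated rule reuses a hypothetical sigmoid-of-cosine gate. Both are illustrations, not the paper's exact formulas.

```python
import math

# Toy illustration of Figure 1's "deadlock". Both update rules are
# simplified, hypothetical stand-ins for the methods being compared.

def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def magnitude_balanced_update(g_data, g_phy):
    lam = norm(g_data) / norm(g_phy)          # equalize lengths, ignore angle
    return [d + lam * p for d, p in zip(g_data, g_phy)]


def gated_update(g_data, g_phy, kappa=10.0):
    gate = 1.0 / (1.0 + math.exp(-kappa * cosine(g_data, g_phy)))
    lam = gate * norm(g_data) / norm(g_phy)   # angle-aware: shrinks when cos < 0
    return [d + lam * p for d, p in zip(g_data, g_phy)]


g_data = [1.0, 0.2]
g_phy = [-5.0, -0.9]   # nearly opposite and about five times longer

print(norm(magnitude_balanced_update(g_data, g_phy)))  # about 0.02: near-cancellation
print(norm(gated_update(g_data, g_phy)))               # about 1.02: data step survives
```

Equalizing lengths without checking the angle leaves almost nothing of either gradient. The angle-aware gate instead suppresses the conflicting physics term and keeps a usable data step.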

Experimental Results

Research Questions

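  • RQ1: Does the gradient conflict between data fidelity and the physics residual cause Pareto deadlock in PINNs for SEIR models?
  • RQ2: Can a conflict-aware gradient-gating mechanism guarantee stable, convergent training and better parameter recovery with noisy, sparse data?
  • RQ3: How does CGGS compare with magnitude-based balancing and gradient-projection methods in convergence and computational cost?
  • RQ4: Do the training dynamics exhibit an adaptive curriculum that improves the handling of stiffness in epidemic systems?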

Key Findings

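  • CGGS avoids Pareto deadlock by suppressing the physics term when gradients conflict (negative cosine similarity).
  • For smooth non-convex objectives, CGGS converges on the data loss at rate O(1/T), matching standard gradient methods.
  • The method exhibits a curriculum effect: a relaxation phase while gradients conflict, followed by a refinement phase as they align.
  • Empirically, CGGS robustly recovers SEIR trajectories from sparse, noisy data and outperforms magnitude-based baselines.
  • CGGS keeps the data anchor fixed while adaptively gating the physics constraint, which improves stability and convergence.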
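For reference, an O(1/T) rate for smooth non-convex objectives is usually measured against the textbook bound for plain gradient descent. That bound assumes an $L$-smooth objective $f$, bounded below by $f^\star$, and a step size $0 < \eta \le 1/L$. It is shown here only to make the rate concrete. It is not the statement or proof of Theorem 4.7, which concerns the data loss under the CGGS update.

```latex
% Descent lemma for theta_{t+1} = theta_t - eta * grad f(theta_t), then telescoping over t = 0..T-1
f(\theta_{t+1}) \le f(\theta_t) - \eta\Bigl(1 - \tfrac{L\eta}{2}\Bigr)\|\nabla f(\theta_t)\|^2
               \le f(\theta_t) - \tfrac{\eta}{2}\|\nabla f(\theta_t)\|^2,
\qquad
\min_{0 \le t < T} \|\nabla f(\theta_t)\|^2
  \le \frac{1}{T}\sum_{t=0}^{T-1} \|\nabla f(\theta_t)\|^2
  \le \frac{2\,\bigl(f(\theta_0) - f^\star\bigr)}{\eta\, T} = O(1/T).
```

A guarantee of this type concerns the smallest (or average) squared gradient norm along the trajectory. It does not promise a global minimum.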
Figure 2: Baseline analysis of a standard PINN trained on noisy SEIR data. (Left) The model overfits the noise (blue solid curve) and fails to capture the true dynamics well. (Right) The cosine similarity between data and physics gradients frequently drops below zero (dashed line), indicating dest
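The diagnostic in the right panel of Figure 2 amounts to logging the cosine similarity between the data and physics gradients at each iteration. The minimal sketch below assumes the gradients are available as flat lists. `conflict_summary` is a hypothetical helper that reports how often the similarity stays below zero and the longest consecutive stretch for which it does.

```python
import math

# Hypothetical conflict monitor: cosine similarity per iteration, then
# the fraction of conflicting iterations and the longest conflict streak.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def conflict_summary(cos_history):
    """Return (fraction of iterations with cos < 0, longest run of consecutive conflicts)."""
    negatives = [c < 0.0 for c in cos_history]
    longest = run = 0
    for neg in negatives:
        run = run + 1 if neg else 0
        longest = max(longest, run)
    return sum(negatives) / len(cos_history), longest


g_data = [1.0, 0.0]
physics_grads = [[1.0, 1.0], [-1.0, 0.5], [-1.0, -2.0], [0.0, 1.0], [-3.0, 0.0]]
history = [cosine(g_data, g) for g in physics_grads]
print(conflict_summary(history))  # (0.6, 2)
```

A long run of negative values marks a conflict phase, during which CGGS would suppress the physics term according to the key findings.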

This review was generated by AI and reviewed by human editors.