QUICK REVIEW

[Paper Review] DISCOVER: A Physics-Informed, GPU-Accelerated Symbolic Regression Framework

Udaykumar Gajera, Mohsen Sotoudeh | arXiv (Cornell University) | Jan 27, 2026
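Machine Learning in Materials Science | Cited by 0
One-Sentence Summary

DISCOVER is an open-source, Python-native symbolic regression framework that combines physics-informed constraints with optional GPU acceleration to enable scalable, interpretable model discovery in physics, chemistry, and materials science.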

ABSTRACT

Symbolic Regression (SR) enables the discovery of interpretable mathematical relationships from experimental and simulation data. These relationships are often termed descriptors, where a descriptor is defined as a fundamental materials property that is directly correlated with a desired or undesired functional property of the material. Although established approaches such as Sure Independence Screening and Sparsifying Operator (SISSO) have successfully identified low-dimensional descriptors within large feature spaces, many existing SR tools integrate poorly with modern Python workflows, offer limited control over the symbolic search space, or struggle with the computational demands of large-scale studies. This paper introduces DISCOVER (Data-Informed Symbolic Combination of Operators for Variable Equation Regression), an open-source symbolic regression package developed to address these challenges through a modular, physics-motivated design. DISCOVER allows users to guide the symbolic search using domain knowledge, constrain the feature space explicitly, and take advantage of optional GPU acceleration to improve computational efficiency in data-intensive workflows, enabling reproducible and scalable SR workflows. The software is intended for applications in computational physics, computational chemistry, and materials science, where interpretability, physical consistency, and execution time are especially important, and it complements general-purpose SR frameworks by emphasizing the discovery of physically meaningful models.

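Motivation and Objectives

  • Enable guided discovery of interpretable symbolic expressions from data in scientific domains.
  • Incorporate domain knowledge through physics-informed constraints and dimensional analysis.
  • Provide a modular, Python-native design with optional GPU acceleration for large-scale studies.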

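Proposed Method

  • Generate candidate symbolic expressions from user-supplied features and an operator library.
  • Evaluate the expressions against the target data to identify sparse, parsimonious models.
  • Implement several sparse search strategies (e.g., orthogonal matching pursuit (OMP), mixed-integer quadratic programming (MIQP), and simulated annealing).
  • Enforce physics-informed constraints through a configuration-based interface and dimensional analysis with the pint library.
  • Support GPU acceleration on NVIDIA CUDA and Apple Metal, with CPU execution as a fallback.
  • Frame the search as an L0-regularized least-squares problem to find a sparse descriptor vector.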
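
Written out, an L0-regularized least-squares objective of this kind typically takes the form below. The notation is an assumption made for illustration and is not taken from the paper: y is the target vector, D is the matrix whose columns are the candidate expressions evaluated on the data, c is the coefficient vector, the L0 "norm" counts the nonzero entries of c, and lambda >= 0 trades goodness of fit against sparsity.

```latex
\hat{\mathbf{c}}
  = \arg\min_{\mathbf{c}}
    \; \lVert \mathbf{y} - \mathbf{D}\,\mathbf{c} \rVert_2^2
    + \lambda \, \lVert \mathbf{c} \rVert_0
```

The L0 term makes the problem combinatorial, so exact solution is practical only for modest candidate sets. That is presumably why greedy, exact, and stochastic strategies are offered side by side.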
Figure 1: Overview of the DISCOVER workflow, illustrating iterative feature generation, physics-informed screening, and sparse model selection.
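
The generation and selection stages of this workflow can be sketched in pure Python. This is a minimal illustration under stated assumptions, not DISCOVER's API: the function names (`generate_features`, `omp`), the toy operator library, and the data are all hypothetical, and orthogonal matching pursuit stands in as one example of a greedy sparse search.

```python
import itertools
import math


def generate_features(primary, unary_ops, binary_ops):
    """Expand primary features into candidate columns using an operator library."""
    feats = dict(primary)  # name -> list of sample values
    for op_name, op in unary_ops.items():
        for name, col in primary.items():
            feats[f"{op_name}({name})"] = [op(v) for v in col]
    for op_name, op in binary_ops.items():
        for (a, ca), (b, cb) in itertools.combinations(primary.items(), 2):
            feats[f"({a}{op_name}{b})"] = [op(x, y) for x, y in zip(ca, cb)]
    return feats


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def omp(features, y, n_terms):
    """Greedy orthogonal matching pursuit: returns {feature name: coefficient}."""
    selected, coefs, residual = [], [], list(y)
    for _ in range(n_terms):
        def score(name):
            # Absolute correlation with the residual, normalized by column length.
            col = features[name]
            norm = math.sqrt(dot(col, col))
            return abs(dot(col, residual)) / norm if norm > 0 else 0.0

        best = max((n for n in features if n not in selected), key=score)
        selected.append(best)
        # Refit all selected columns jointly via the normal equations.
        cols = [features[n] for n in selected]
        gram = [[dot(ci, cj) for cj in cols] for ci in cols]
        coefs = solve(gram, [dot(ci, y) for ci in cols])
        fitted = [sum(c * col[k] for c, col in zip(coefs, cols)) for k in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    return dict(zip(selected, coefs))


# Toy data (hypothetical): the target is exactly y = 3 * a * b.
primary = {"a": [1, 2, 3, 4], "b": [2, 1, 4, 3]}
features = generate_features(
    primary,
    unary_ops={"sq": lambda v: v * v},
    binary_ops={"*": lambda x, y: x * y, "/": lambda x, y: x / y},
)
y = [3 * x * z for x, z in zip(primary["a"], primary["b"])]
model = omp(features, y, n_terms=1)
print(model)  # {'(a*b)': 3.0}
```

A single-term target is used here because greedy selection is then guaranteed to pick the matching column. With several correlated candidates OMP can select a wrong first term, which is one motivation for exact formulations such as MIQP.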

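Experimental Results

Research Questions

  • RQ1: How do user-defined physical constraints and dimensional consistency guide the symbolic regression search toward physically meaningful models?
  • RQ2: How does hardware acceleration affect the efficiency of constrained symbolic regression in large feature spaces?
  • RQ3: Can DISCOVER balance predictive accuracy against model interpretability through configurable sparsity and operator constraints?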

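Key Findings

  • Provides a Python-native SR framework that supports physics-informed constraints and hardware-accelerated computation.
  • Offers modular search strategies, spanning heuristic, optimization-based, and stochastic methods, for sparse model discovery.
  • Uses dimensional-consistency checks via the pint library to discard physically invalid expressions early in the search.
  • Demonstrates scalable symbolic regression workflows on both CPU and GPU, suited to data-intensive, science-oriented applications.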
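
To make the dimensional-consistency screening concrete, here is a minimal sketch of the idea. It deliberately does not use pint: dimensions are hand-rolled exponent tuples so the example runs without dependencies, and the expression encoding, the `dim_of` helper, and the feature names are hypothetical rather than DISCOVER's actual interface.

```python
from fractions import Fraction

# A dimension is a tuple of exponents over (length, mass, time).
# This is an assumed stand-in for the unit objects a library like pint provides.
DIMLESS = (0, 0, 0)


def dim_of(expr, base_dims):
    """Return the dimension of expr, or None if it is dimensionally invalid.

    expr is either a feature name or a nested tuple such as
    ("add", left, right), ("mul", left, right), ("div", left, right),
    ("sqrt", arg), ("exp", arg), ("log", arg).
    """
    if isinstance(expr, str):
        return base_dims[expr]
    op, *args = expr
    dims = [dim_of(a, base_dims) for a in args]
    if any(d is None for d in dims):
        return None  # an invalid sub-expression invalidates the whole tree
    if op in ("add", "sub"):
        # Sums and differences need identical dimensions on both sides.
        return dims[0] if dims[0] == dims[1] else None
    if op == "mul":
        return tuple(x + y for x, y in zip(*dims))
    if op == "div":
        return tuple(x - y for x, y in zip(*dims))
    if op == "sqrt":
        return tuple(Fraction(x) / 2 for x in dims[0])
    if op in ("exp", "log"):
        # Transcendental functions only accept dimensionless arguments.
        return DIMLESS if dims[0] == DIMLESS else None
    raise ValueError(f"unknown operator: {op}")


# Hypothetical primary features: mass m, velocity v, length r.
base_dims = {"m": (0, 1, 0), "v": (1, 0, -1), "r": (1, 0, 0)}

kinetic = ("mul", "m", ("mul", "v", "v"))  # m * v^2, an energy
bad_sum = ("add", "m", "v")                # mass + velocity
ok_exp = ("exp", ("div", "r", "r"))        # exp of a dimensionless ratio
bad_exp = ("exp", "v")                     # exp of a velocity

candidates = [kinetic, bad_sum, ok_exp, bad_exp]
valid = [e for e in candidates if dim_of(e, base_dims) is not None]
print(len(valid), "of", len(candidates), "candidates survive screening")
```

Because the check needs only the expression tree and the units of the primary features, it can run before any candidate is evaluated on data, which is what makes early rejection cheap.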

This review was generated by AI and reviewed by a human editor.