QUICK REVIEW

[Paper Review] UMA: A Family of Universal Models for Atoms

Brandon M. Wood, Misko Dzamba|ArXiv.org|Jun 30, 2025

Machine Learning in Materials Science | Cited by 20

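One-Sentence Summary

UMA is a family of universal machine-learning interatomic potentials trained on roughly half a billion atomic structures spanning molecules, materials, and catalysts. The models use a Mixture of Linear Experts (MoLE) to scale capacity efficiently, and they reach competitive or state-of-the-art performance without task-specific fine-tuning.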

ABSTRACT

The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g. molecules, materials, and catalysts. We develop empirical scaling laws to help understand how to increase model capacity alongside dataset size to achieve the best accuracy. The UMA small and medium models utilize a novel architectural design we refer to as mixture of linear experts that enables increasing model capacity without sacrificing speed. For example, UMA-medium has 1.4B parameters but only ~50M active parameters per atomic structure. We evaluate UMA models on a diverse set of applications across multiple domains and find that, remarkably, a single model without any fine-tuning can perform similarly or better than specialized models. We are releasing the UMA code, weights, and associated data to accelerate computational workflows and enable the community to continue to build increasingly capable AI models.

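Research Motivation and Objectives

Motivate the need for fast, accurate DFT surrogates across diverse chemical domains (materials, molecules, catalysts).
Demonstrate that a single large-scale model can generalize across tasks without fine-tuning.
Develop a scalable architecture (MoLE) that increases capacity without sacrificing inference speed.
Propose a two-stage training procedure that balances training speed against energy-conserving accuracy.
Release code, weights, and data so the community can use and validate the models broadly.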

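Proposed Method

Adopts an equivariant graph neural network architecture based on eSEN, extended with total charge, spin, and DFT task inputs.
Introduces Mixture of Linear Experts (MoLE), whose output is a dense combination of linear experts, which preserves a smooth energy surface and rotational equivariance.
Computes the expert weights α from a system-level embedding with a small multilayer perceptron, and uses the precomputed merged weight W* = Σ_k α_k W_k to keep inference fast.
Trains with a two-stage schedule: direct force prediction first, then fine-tuning with automatic differentiation to obtain energy-conserving forces and stresses.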
Pretrains in BF16 and switches to FP32 for fine-tuning to preserve precision; uses memory/graph parallelism to scale large MoLE configurations.
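Trains on roughly 500 million atomic structures drawn from diverse datasets (materials, molecules, catalysts), with an energy referencing scheme that enables multi-task learning across different DFT settings.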
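The MoLE weight merging described in the method list can be illustrated with a minimal sketch. Because every expert is linear, mixing the expert outputs gives the same result as mixing the expert weights once and applying the merged matrix. The softmax routing, the random stand-in for the routing MLP output, the shapes, and all function names here are assumptions made for illustration, not details taken from the UMA code.

```python
import math
import random


def softmax(scores):
    """Normalize raw routing scores into expert weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def merge_experts(alphas, experts):
    """Precompute W* = sum_k alpha_k * W_k once per system."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * W[i][j] for a, W in zip(alphas, experts)) for j in range(cols)]
        for i in range(rows)
    ]


def mixture_of_outputs(alphas, experts, x):
    """Reference path: run every expert, then mix the outputs."""
    outs = [matvec(W, x) for W in experts]
    return [sum(a * out[i] for a, out in zip(alphas, outs))
            for i in range(len(outs[0]))]


rng = random.Random(0)
K, D_OUT, D_IN = 4, 3, 5  # hypothetical sizes

# Hypothetical expert weight matrices W_1..W_K.
experts = [
    [[rng.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_OUT)]
    for _ in range(K)
]

# Stand-in for the routing MLP applied to a system-level embedding.
router_scores = [rng.gauss(0, 1) for _ in range(K)]
alphas = softmax(router_scores)

W_star = merge_experts(alphas, experts)  # done once per atomic system
x = [rng.gauss(0, 1) for _ in range(D_IN)]

y_merged = matvec(W_star, x)                      # one matmul per call
y_mixed = mixture_of_outputs(alphas, experts, x)  # K matmuls per call
max_diff = max(abs(a - b) for a, b in zip(y_merged, y_mixed))
```

Since α depends only on system-level inputs, the merge can happen once per system rather than once per atom or per layer call, which is how the per-call cost stays at that of a single linear layer regardless of the number of experts.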

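Experimental Results

Research Questions

RQ1: Can a single model, without fine-tuning, reach competitive accuracy across diverse DFT tasks covering materials, molecules, and catalysis?
RQ2: How do model size, data volume, and compute cost interact to determine the best UMA configuration?
RQ3: What advantages does the MoLE architecture offer over dense models for multi-task MLIPs, particularly for long-running simulations such as MD?
RQ4: Can a unified model maintain energy conservation and a smooth potential energy surface across diverse tasks and datasets?

Key Findings

UMA achieves competitive or state-of-the-art performance on materials, catalysis, molecules, molecular crystals, and MOFs without task-specific fine-tuning.
MoLE delivers substantial efficiency gains: it reaches a similar loss to a dense model with roughly 2.5x fewer active parameters (e.g., UMA-M).
UMA-S, UMA-M, and UMA-L perform strongly on Matbench Discovery and adsorption energy benchmarks, including a 25% improvement in AdsorbML success rate for catalysis.
A single model handles long MD rollouts with good inference speed and memory usage, enabling simulations of 100k+ atoms on a single 80GB GPU, with potential for multi-GPU scaling.
Two-stage training (BF16 pretraining followed by FP32 fine-tuning) learns efficiently while preserving energy conservation.
Empirical scaling laws show that performance improves log-linearly with model size and data, providing guidance for compute-optimal and inference-optimal training strategies.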
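The energy conservation noted in the findings follows from taking forces as the negative gradient of a single scalar energy, F = -∂E/∂R, rather than predicting them with a separate output head. The sketch below illustrates that idea only: the harmonic pair energy is a hypothetical toy model, and central finite differences stand in for the automatic differentiation a real model would use.

```python
import math


def pair_energy(positions, k=1.0, r0=1.0):
    """Toy energy: a harmonic spring between every pair of atoms (hypothetical)."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            energy += 0.5 * k * (r - r0) ** 2
    return energy


def conservative_forces(positions, energy_fn, h=1e-5):
    """Forces as F = -dE/dR, via central differences (stand-in for autograd)."""
    forces = []
    for i in range(len(positions)):
        force_on_atom = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][axis] += h
            minus[i][axis] -= h
            grad = (energy_fn(plus) - energy_fn(minus)) / (2 * h)
            force_on_atom.append(-grad)
        forces.append(force_on_atom)
    return forces


# Three atoms at arbitrary, non-equilibrium positions.
positions = [[0.0, 0.0, 0.0], [1.3, 0.0, 0.0], [0.0, 0.8, 0.4]]
forces = conservative_forces(positions, pair_energy)

# The energy depends only on interatomic distances, so gradient-derived
# forces must sum to zero over the whole system.
net_force = [sum(f[axis] for f in forces) for axis in range(3)]
```

A directly predicted force field carries no such guarantee, which is one reason a gradient-based fine-tuning stage matters for long MD runs.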

This review was generated by AI and checked by a human editor.