QUICK REVIEW

[Paper Review] Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja|ArXiv.org|Jan 15, 2025

Model Reduction and Neural Networks | Cited by 5

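One-Sentence Summary

This paper proposes Hessian-based knowledge distillation to create fast, specialized MLFFs from foundation models, achieving significant speedups in MD simulations while preserving accuracy and energy conservation.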

ABSTRACT

The foundation model (FM) paradigm is transforming Machine Learning Force Fields (MLFFs), leveraging general-purpose representations and scalable training to perform a variety of computational chemistry tasks. Although MLFF FMs have begun to close the accuracy gap relative to first-principles methods, there is still a strong need for faster inference speed. Additionally, while research is increasingly focused on general-purpose models which transfer across chemical space, practitioners typically only study a small subset of systems at a given time. This underscores the need for fast, specialized MLFFs relevant to specific downstream applications, which preserve test-time physical soundness while maintaining train-time scalability. In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. We formulate our approach as a knowledge distillation procedure, where the smaller "student" MLFF is trained to match the Hessians of the energy predictions of the "teacher" foundation model. Our specialized MLFFs can be up to 20$\times$ faster than the original foundation model, while retaining, and in some cases exceeding, its performance and that of undistilled models. We also show that distilling from a teacher model with a direct force parameterization into a student model trained with conservative forces (i.e., computed as derivatives of the potential energy) successfully leverages the representations from the large-scale teacher for improved accuracy, while maintaining energy conservation during test-time molecular dynamics simulations. More broadly, our work suggests a new paradigm for MLFF development, in which foundation models are released along with smaller, specialized simulation "engines" for common chemical subsets.

Motivation and Objectives

Motivate the need for fast, specialized MLFFs that preserve physical soundness in downstream tasks.
Propose a KD framework that distills energy Hessians from a foundation MLFF into smaller, faster MLFFs.
Demonstrate the approach across multiple foundation models, datasets, and downstream chemical spaces.
Show that specialized MLFFs can outperform or match their teachers while achieving large inference speedups.

Proposed Method

Precompute energy Hessians of the foundation model on a specialized data subset.
Train a smaller student MLFF to minimize a joint loss: energy/force matching plus Hessian alignment to the teacher.
Supervise on only a subsample of Hessian rows per structure, rather than the full Hessian, to reduce the cost of Hessian supervision.
Leverage vector-Jacobian products to extract Hessian rows efficiently without forming full Hessians.
Optionally include a gradient-based energy-consistency term to improve direct-force models.
Compare Hessian KD to baselines (undistilled, n2n, a2a) across multiple datasets and tasks.
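
The steps above can be sketched in a few lines. The following is a minimal, hedged illustration and not the authors' implementation: the "teacher" and "student" are hypothetical 1D spring-chain toy models, the loss weights and the names `make_chain_model`, `hessian_row`, and `distillation_loss` are assumptions, and rows are drawn uniformly at random as one possible subsampling scheme. Hessian rows are obtained here by central finite differences of the forces, swapped in for the vector-Jacobian products mentioned above so that the example needs no autodiff library. The teacher rows are also computed on the fly, whereas the method above precomputes them.

```python
import random

def make_chain_model(k, r0):
    """Toy stand-in for an MLFF: a 1D chain of harmonic springs (hypothetical)."""
    def energy(x):
        return sum(0.5 * k * (x[i + 1] - x[i] - r0) ** 2 for i in range(len(x) - 1))

    def forces(x):
        f = [0.0] * len(x)
        for i in range(len(x) - 1):
            stretch = x[i + 1] - x[i] - r0
            f[i] += k * stretch        # -dE/dx_i
            f[i + 1] -= k * stretch    # -dE/dx_{i+1}
        return f

    return energy, forces

def hessian_row(forces, x, j, h=1e-3):
    """Row j of the energy Hessian without building the full matrix.

    Uses H[j, :] = -dF/dx_j (the Hessian is symmetric), estimated by
    central differences. An autodiff VJP would replace this in practice.
    """
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    fp, fm = forces(xp), forces(xm)
    return [-(a - b) / (2.0 * h) for a, b in zip(fp, fm)]

def distillation_loss(student, teacher, x, e_ref, f_ref, s=1, lam=1.0, rng=random):
    """Energy/force matching plus Hessian alignment on s sampled rows.

    The equal weighting of terms and the squared-error form are assumptions.
    """
    s_energy, s_forces = student
    _, t_forces = teacher
    n = len(x)
    loss_e = (s_energy(x) - e_ref) ** 2
    loss_f = sum((a - b) ** 2 for a, b in zip(s_forces(x), f_ref)) / n
    loss_h = 0.0
    for j in rng.sample(range(n), s):  # subsample s of the n Hessian rows
        row_t = hessian_row(t_forces, x, j)
        row_s = hessian_row(s_forces, x, j)
        loss_h += sum((a - b) ** 2 for a, b in zip(row_t, row_s)) / n
    return loss_e + loss_f + lam * loss_h / s
```

With `s=1` each training step touches a single Hessian row, which costs two extra force evaluations per model in this finite-difference variant, independent of system size.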

Figure 1: Proposed Hessian distillation schematic. In our proposed distillation approach, we start with a machine learning force field (MLFF) foundation model (FM) that has been trained on a large quantity of diverse data. We precompute energy Hessians of the FM over a specialized data subset. We th

Experimental Results

Research Questions

RQ1: Can Hessian-based distillation yield fast, specialized MLFFs without sacrificing physical soundness?
RQ2: How does Hessian KD compare to node-feature distillation (n2n) and other baselines in accuracy and MD stability?
RQ3: Does subsampling Hessian supervision maintain performance while reducing training cost?
RQ4: Can distilled models outperform the original foundation models on specialized downstream tasks while offering substantial speedups?

Key Findings

Specialized MLFFs distilled from foundation models achieve up to 20x faster inference than the original FMs.
Distilled models often match or exceed FM performance on the specialized tasks and can outperform undistilled baselines.
Hessian distillation improves energy/force MAE, MD stability, energy conservation, and geometry optimization compared to baselines.
Hessian subsampling (even s=1) preserves accuracy while substantially reducing training cost.
Distillation from larger JMP-L FMs yields better energy conservation in NVE MD simulations than undistilled FMs.
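
To make the energy-conservation finding concrete, here is a toy sketch of how energy drift can be measured in an NVE-style run. It is an illustration under stated assumptions, not the paper's experiment: the system is a unit-mass particle in a 2D harmonic well, `conservative_force` is the exact negative gradient of that energy, and `direct_force` adds a small hypothetical rotational term (strength `eps`) that is not the gradient of any potential, mimicking a directly predicted force field. All function names are made up for this example.

```python
def velocity_verlet(force, pos, vel, dt, steps):
    """Integrate unit-mass 2D dynamics with velocity Verlet; return final state."""
    x, y = pos
    vx, vy = vel
    fx, fy = force((x, y))
    for _ in range(steps):
        vx += 0.5 * dt * fx            # half-step velocity update
        vy += 0.5 * dt * fy
        x += dt * vx                   # full-step position update
        y += dt * vy
        fx, fy = force((x, y))         # forces at the new position
        vx += 0.5 * dt * fx            # second half-step velocity update
        vy += 0.5 * dt * fy
    return (x, y), (vx, vy)

def total_energy(pos, vel):
    """Kinetic plus harmonic potential energy, E = |v|^2 / 2 + |r|^2 / 2."""
    return 0.5 * (vel[0] ** 2 + vel[1] ** 2) + 0.5 * (pos[0] ** 2 + pos[1] ** 2)

def conservative_force(pos):
    """Negative gradient of the harmonic potential."""
    return (-pos[0], -pos[1])

def direct_force(pos, eps=0.05):
    """Harmonic force plus a curl term that no potential can produce (assumed)."""
    return (-pos[0] - eps * pos[1], -pos[1] + eps * pos[0])

def energy_drift(force, dt=0.01, steps=5000):
    """Absolute change in total energy between the start and end of the run."""
    pos0, vel0 = (1.0, 0.0), (0.0, 1.0)
    pos, vel = velocity_verlet(force, pos0, vel0, dt, steps)
    return abs(total_energy(pos, vel) - total_energy(pos0, vel0))
```

Calling `energy_drift(conservative_force)` and `energy_drift(direct_force)` shows the contrast: the gradient-derived force keeps the energy nearly constant, while the non-conservative term steadily pumps energy into the orbit.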

Figure 2: Energy Conservation in NVE MD Simulations of Buckyball Catcher. We plot the change in the model predicted energy over the trajectory for 5 independent initial conditions. Some simulations become unstable before 100 ps (denoted by $\times$ ). (a) Hessian distillation improves the energy con


This review was generated by AI and reviewed by a human editor.