QUICK REVIEW

[Paper Review] Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja | ArXiv.org | Jan 15, 2025
Model Reduction and Neural Networks · 5 citations
TL;DR

The paper introduces Hessian-based knowledge distillation to create fast, specialized MLFFs from foundation models, achieving significant speedups while preserving accuracy and energy conservation in MD simulations.

ABSTRACT

The foundation model (FM) paradigm is transforming Machine Learning Force Fields (MLFFs), leveraging general-purpose representations and scalable training to perform a variety of computational chemistry tasks. Although MLFF FMs have begun to close the accuracy gap relative to first-principles methods, there is still a strong need for faster inference speed. Additionally, while research is increasingly focused on general-purpose models which transfer across chemical space, practitioners typically only study a small subset of systems at a given time. This underscores the need for fast, specialized MLFFs relevant to specific downstream applications, which preserve test-time physical soundness while maintaining train-time scalability. In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. We formulate our approach as a knowledge distillation procedure, where the smaller "student" MLFF is trained to match the Hessians of the energy predictions of the "teacher" foundation model. Our specialized MLFFs can be up to $20\times$ faster than the original foundation model, while retaining, and in some cases exceeding, its performance and that of undistilled models. We also show that distilling from a teacher model with a direct force parameterization into a student model trained with conservative forces (i.e., computed as derivatives of the potential energy) successfully leverages the representations from the large-scale teacher for improved accuracy, while maintaining energy conservation during test-time molecular dynamics simulations. More broadly, our work suggests a new paradigm for MLFF development, in which foundation models are released along with smaller, specialized simulation "engines" for common chemical subsets.
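The abstract's point about conservative forces can be illustrated with a toy NVE simulation. The sketch below is a minimal illustration under stated assumptions, not the paper's models: it uses a hypothetical 2-D anharmonic energy, a conservative force computed as its exact negative gradient, and a hypothetical "direct" force that adds a small rotational error which is not the gradient of any scalar energy. Both force fields are integrated with velocity Verlet, and the change in total energy is reported.

```python
import numpy as np


def energy(x: np.ndarray) -> float:
    """Toy anharmonic well (hypothetical): E(x) = 0.5*|x|^2 + 0.25*|x|^4."""
    r2 = float(x @ x)
    return 0.5 * r2 + 0.25 * r2 ** 2


def conservative_force(x: np.ndarray) -> np.ndarray:
    """F = -dE/dx, computed exactly from `energy`."""
    r2 = float(x @ x)
    return -(1.0 + r2) * x


def direct_force(x: np.ndarray, eps: float = 0.05) -> np.ndarray:
    """Hypothetical direct-force head: the conservative force plus a small
    rotational error that is not the gradient of any scalar energy."""
    swirl = np.array([-x[1], x[0]])
    return conservative_force(x) + eps * swirl


def nve_energy_drift(force_fn, steps: int = 10_000, dt: float = 2e-3,
                     mass: float = 1.0) -> float:
    """Integrate with velocity Verlet; return E_total(end) - E_total(start)."""
    x = np.array([1.0, 0.0])
    v = np.array([0.0, 1.0])

    def total_energy(pos: np.ndarray, vel: np.ndarray) -> float:
        return energy(pos) + 0.5 * mass * float(vel @ vel)

    e_start = total_energy(x, v)
    f = force_fn(x)
    for _ in range(steps):
        v = v + 0.5 * dt * f / mass  # half kick
        x = x + dt * v               # drift
        f = force_fn(x)
        v = v + 0.5 * dt * f / mass  # half kick
    return total_energy(x, v) - e_start


print(f"conservative drift: {nve_energy_drift(conservative_force):+.2e}")
print(f"direct drift:       {nve_energy_drift(direct_force):+.2e}")
```

With these settings the conservative run should report a drift close to zero (only the bounded integration error), while the direct-force run gains energy steadily, because the rotational term does positive work along an orbit with nonzero angular momentum.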

Motivation & Objective

  • Motivate the need for fast, specialized MLFFs that preserve physical soundness in downstream tasks.
  • Propose a knowledge distillation (KD) framework that distills energy Hessians from a foundation MLFF into smaller, faster MLFFs.
  • Demonstrate the approach across multiple foundation models, datasets, and downstream chemical spaces.
  • Show that specialized MLFFs can outperform or match their teachers while achieving large inference speedups.

Proposed method

  • Precompute energy Hessians of the foundation model on a specialized data subset.
  • Train a smaller student MLFF to minimize a joint loss: energy/force matching plus Hessian alignment to the teacher.
  • Supervise only a randomly subsampled set of Hessian rows to reduce the cost of Hessian supervision.
  • Leverage vector-Jacobian products to extract Hessian rows efficiently without forming full Hessians.
  • Optionally include a gradient-based energy-consistency term to improve direct-force models.
  • Compare Hessian KD to baselines (undistilled, n2n node-to-node feature distillation, a2a) across multiple datasets and tasks.
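The vector-Jacobian trick in the list above rests on a simple identity: row $i$ of the energy Hessian $H = \partial^2 E / \partial x\,\partial x^\top$ is $e_i^\top H$, the derivative of the gradient $\nabla E$ along the unit vector $e_i$ (row and column coincide because $H$ is symmetric). Each supervised row therefore costs one extra backward pass instead of requiring all $3N$ rows. The sketch below keeps NumPy as its only dependency, so it swaps in a central finite difference of an analytic gradient in place of an autodiff VJP; the toy energy $E(x) = \tfrac12 x^\top A x + \sum_i \cos x_i$ and every name in it are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DOF = 6  # 3N Cartesian coordinates of a hypothetical 2-atom system
_B = rng.normal(size=(N_DOF, N_DOF))
A = _B @ _B.T  # symmetric coupling matrix for the toy energy


def grad_energy(x: np.ndarray) -> np.ndarray:
    """Gradient of the toy energy E(x) = 0.5 x^T A x + sum(cos(x))."""
    return A @ x - np.sin(x)


def exact_hessian(x: np.ndarray) -> np.ndarray:
    """Full 3N x 3N Hessian; built here only to check the row estimates."""
    return A - np.diag(np.cos(x))


def hessian_row(grad_fn, x: np.ndarray, i: int, h: float = 1e-5) -> np.ndarray:
    """Estimate row i of the Hessian from two gradient evaluations.

    Central differences give the directional derivative H @ e_i (column i),
    which equals row i because the Hessian is symmetric. An autodiff
    framework would instead obtain e_i^T H exactly from one VJP of grad_fn.
    """
    e_i = np.zeros_like(x)
    e_i[i] = 1.0
    return (grad_fn(x + h * e_i) - grad_fn(x - h * e_i)) / (2.0 * h)


# Usage: "precompute" two teacher rows without ever forming the full Hessian.
x0 = np.linspace(-1.0, 1.0, N_DOF)
teacher_rows = {i: hessian_row(grad_energy, x0, i) for i in (0, 3)}
```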
Figure 1: Proposed Hessian distillation schematic. In our proposed distillation approach, we start with a machine learning force field (MLFF) foundation model (FM) that has been trained on a large quantity of diverse data. We precompute energy Hessians of the FM over a specialized data subset. We th

Experimental results

Research questions

  • RQ1: Can Hessian-based distillation yield fast, specialized MLFFs without sacrificing physical soundness?
  • RQ2: How does Hessian KD compare to node-feature distillation (n2n) and other baselines in accuracy and MD stability?
  • RQ3: Does subsampling Hessian supervision maintain performance while reducing training cost?
  • RQ4: Can distilled models outperform the original foundation models on specialized downstream tasks while offering substantial speedups?

Key findings

  • Specialized MLFFs distilled from foundation models achieve up to 20x faster inference than the original FMs.
  • Distilled models often match or exceed FM performance on the specialized tasks and can outperform undistilled baselines.
  • Hessian distillation improves energy/force MAE, MD stability, energy conservation, and geometry optimization compared to baselines.
  • Hessian subsampling, even with a single sampled Hessian row (s = 1), preserves accuracy while substantially reducing training cost.
  • Students distilled from the larger JMP-L FM show better energy conservation in NVE MD simulations than the undistilled FMs.
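The subsampling finding has a simple statistical reading. If the Hessian loss is a mean squared error over rows (an assumed form; the review does not state the exact loss), then averaging over s randomly chosen rows is an unbiased estimate of the full-Hessian loss, and only those s rows need VJPs. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def full_hessian_loss(h_student: np.ndarray, h_teacher: np.ndarray) -> float:
    """MSE over every entry of the (3N, 3N) Hessians."""
    return float(np.mean((h_student - h_teacher) ** 2))


def subsampled_hessian_loss(h_student: np.ndarray, h_teacher: np.ndarray,
                            s: int, rng: np.random.Generator) -> float:
    """MSE over s randomly chosen rows (sampled without replacement).

    In training, only these s rows would ever be computed, one VJP each.
    """
    rows = rng.choice(h_teacher.shape[0], size=s, replace=False)
    diff = h_student[rows] - h_teacher[rows]
    return float(np.mean(diff ** 2))


def expected_s1_loss(h_student: np.ndarray, h_teacher: np.ndarray) -> float:
    """Average of the s = 1 estimate over every possible row choice."""
    n = h_teacher.shape[0]
    per_row = [float(np.mean((h_student[i] - h_teacher[i]) ** 2)) for i in range(n)]
    return sum(per_row) / n


rng = np.random.default_rng(1)
h_teacher = rng.normal(size=(9, 9))
h_student = h_teacher + 0.1 * rng.normal(size=(9, 9))
print(full_hessian_loss(h_student, h_teacher), expected_s1_loss(h_student, h_teacher))
```

Enumerating every possible s = 1 choice recovers the full loss exactly. One way to read the finding is that a single row per step still supervises the whole Hessian across many training iterations, at the price of noisier gradients.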
Figure 2: Energy Conservation in NVE MD Simulations of Buckyball Catcher. We plot the change in the model predicted energy over the trajectory for 5 independent initial conditions. Some simulations become unstable before 100 ps (denoted by $\times$). (a) Hessian distillation improves the energy con


This review was created by AI and reviewed by human editors.