QUICK REVIEW

[Paper Review] Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

Shuxin Zheng, Jiyan He | arXiv (Cornell University) | Jun 8, 2023

Machine Learning in Materials Science | Cited by 30

One-Line Summary

DiG predicts equilibrium distributions of molecular systems with a diffusion-based Graphormer framework conditioned on molecular descriptors, enabling efficient sampling of diverse conformations.

ABSTRACT

Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.

Motivation and Objectives

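Motivates the need to predict the equilibrium distribution of a molecular system rather than a single structure.
Proposes a deep learning framework that approximates the equilibrium distribution and can sample diverse, chemically plausible structures.
Enables estimation of state densities and supports inverse design through property-guided generation.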

Proposed Method

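Introduces Distributional Graphormer (DiG), which performs diffusion-based generation that transforms a simple distribution, conditioned on a molecular descriptor, into the target equilibrium distribution.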
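Uses a forward Langevin diffusion that carries the target distribution to a simple Gaussian, and learns the reverse diffusion process with a Graphormer-based score model $s^{\theta}_{\mathcal{D},t}(\mathbf{R})$.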
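The score model is trained with step-wise, independent supervision, drawing on either data-based scores or, via physics-informed diffusion pre-training (PIDP), an energy function $E_{\mathcal{D}}$.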
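Following score matching and diffusion theory, the forward diffusion process is kept fixed, and samples $\mathbf{R}_0$ from the equilibrium distribution are obtained through discretized reverse-diffusion steps.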
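Densities along the diffusion path are estimated by tracking the process, which makes it possible to compute thermodynamic quantities such as free energy and entropy.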
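Conditioning on a target property $c$ adjusts the score via Bayes' rule, so that the process transforms toward the conditional distribution $q_{\mathcal{D},t}(\mathbf{R} \mid c)$, supporting inverse design.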
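The forward and reverse diffusion described in these bullets can be illustrated on a toy problem. The sketch below is not DiG: it assumes a 1-D two-state Gaussian mixture as a stand-in equilibrium distribution, a variance-preserving (Ornstein-Uhlenbeck) forward SDE with a constant rate `BETA`, and the exact analytic score of the noised mixture in place of the learned Graphormer score model. All names and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical toy "equilibrium distribution": a 1-D two-state Gaussian mixture
# standing in for a conformational ensemble (illustrative, not from the paper).
WEIGHTS = np.array([0.7, 0.3])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 0.5])

BETA = 1.0  # assumed constant noise rate of the forward SDE
T = 10.0    # final diffusion time; alpha(T) = exp(-5), so q_T is close to N(0, 1)


def alpha(t):
    """Signal scale of the variance-preserving (OU) forward process at time t."""
    return np.exp(-0.5 * BETA * t)


def score(x, t):
    """Exact score d/dx log q_t(x) of the noised mixture.

    In DiG this role is played by the learned score model s^theta_{D,t}(R);
    here it is computed analytically because the toy target is known.
    """
    a = alpha(t)
    m = a * MEANS                          # component means at time t
    v = a**2 * STDS**2 + (1.0 - a**2)      # component variances at time t
    x = x[:, None]
    log_p = np.log(WEIGHTS) - 0.5 * np.log(2 * np.pi * v) - 0.5 * (x - m) ** 2 / v
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)    # posterior component responsibilities
    return np.sum(resp * (-(x - m) / v), axis=1)


def forward_noise(x0, t, rng):
    """Sample R_t given R_0 under the forward diffusion (data -> Gaussian)."""
    a = alpha(t)
    return a * x0 + np.sqrt(1.0 - a**2) * rng.standard_normal(x0.shape)


def reverse_sample(n, n_steps=1000, seed=0):
    """Euler-Maruyama discretisation of the reverse-time SDE from t=T down to t=0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n)  # R_T drawn from the simple prior N(0, 1)
    for i in range(n_steps, 0, -1):
        t = i * dt
        # Reverse drift -(f - g^2 * score) with f(x) = -BETA * x / 2 and g^2 = BETA.
        drift = 0.5 * BETA * x + BETA * score(x, t)
        x = x + drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n)
    return x  # approximate samples R_0 from the target


if __name__ == "__main__":
    samples = reverse_sample(20000)
    print(f"fraction in right-hand state: {np.mean(samples > 0):.3f} (target 0.3)")
    print(f"sample mean: {samples.mean():.3f} (target {WEIGHTS @ MEANS:.3f})")
```

Running it should print a right-hand-state fraction near 0.3 and a mean near -0.8: integrating the reverse SDE from pure Gaussian noise recovers the relative weights of the two states, which is the property an equilibrium sampler needs. In DiG, `score` would be the trained network and the scalar `x` would be the atomic coordinates $\mathbf{R}$.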
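Three standard relations from score-based diffusion make the training, density-estimation and property-guidance bullets more concrete. They are a sketch under assumptions, not the paper's derivation: the equilibrium distribution is taken to be a Boltzmann distribution at temperature $T$; the forward SDE is written generically as $d\mathbf{R}_t = f(\mathbf{R}_t,t)\,dt + g(t)\,d\mathbf{W}_t$ with $f$, $g$ and the horizon $t_{\max}$ not specified in this review; and "tracking the process" is read as the usual probability-flow ODE. PIDP's actual loss is not reproduced here.

```latex
% (1) Boltzmann target: the t = 0 score follows directly from the energy E_D
q_{\mathcal{D},0}(\mathbf{R}) \propto \exp\!\left(-\frac{E_{\mathcal{D}}(\mathbf{R})}{k_{\mathrm{B}}T}\right)
\quad\Longrightarrow\quad
\nabla_{\mathbf{R}} \log q_{\mathcal{D},0}(\mathbf{R}) = -\frac{\nabla_{\mathbf{R}} E_{\mathcal{D}}(\mathbf{R})}{k_{\mathrm{B}}T}

% (2) Probability-flow ODE and the log-density along its trajectory
\frac{d\mathbf{R}_t}{dt} = \tilde{f}(\mathbf{R}_t, t)
  = f(\mathbf{R}_t, t) - \tfrac{1}{2}\, g(t)^2\, s^{\theta}_{\mathcal{D},t}(\mathbf{R}_t),
\qquad
\log q_{\mathcal{D},0}(\mathbf{R}_0) = \log q_{\mathcal{D},t_{\max}}(\mathbf{R}_{t_{\max}})
  + \int_0^{t_{\max}} \nabla_{\mathbf{R}} \cdot \tilde{f}(\mathbf{R}_t, t)\, dt

% (3) Bayes' rule for property guidance: q(c) does not depend on R, so its gradient vanishes
\nabla_{\mathbf{R}} \log q_{\mathcal{D},t}(\mathbf{R} \mid c)
  = \nabla_{\mathbf{R}} \log q_{\mathcal{D},t}(\mathbf{R})
  + \nabla_{\mathbf{R}} \log q_{\mathcal{D},t}(c \mid \mathbf{R})
```

With the learned $s^{\theta}$ in place of the true score, (2) gives the model's density rather than the exact one. Equation (3) is why guidance needs only the unconditional score plus a model of $q_{\mathcal{D},t}(c \mid \mathbf{R})$.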

Figure 1 : Predicting conformational distributions with the Distributional Graphormer (DiG) framework. (a) DiG takes the basic descriptor $\mathcal{D}$ of a target molecular system as input, e.g., amino acid sequence, to generate a probability distribution of structures which aims at approximating t

Experimental Results

Research Questions

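RQ1: Can a diffusion-based generative model approximate the equilibrium distribution of a molecular system from its descriptor?
RQ2: How well does DiG sample diverse, functionally relevant conformations compared with MD simulations and experimental structures?
RQ3: Can DiG sample ligand binding poses and catalyst-adsorbate adsorption distributions with accuracy comparable to or better than existing approaches?
RQ4: How can physics-informed diffusion pre-training (PIDP) support learning when equilibrium data are scarce?
RQ5: Can DiG support property-guided (inverse) design by conditioning on a target property?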

Key Findings

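DiG generates diverse protein structures whose distributions resemble those observed in millisecond-scale MD simulations and are consistent with known functional states.
DiG generates ligand structures within the binding pocket; their RMSD distributions indicate close agreement with the crystal structures, with the best match reaching about 2.0 Å in many cases.
DiG samples catalyst-adsorbate adsorption configurations that closely match DFT-derived baselines across adsorption sites.
PIDP makes score learning possible when equilibrium data are scarce by leveraging the energy function during pre-training.
DiG supports conditional generation for property-guided structure generation, enabling inverse design without requiring a large conditional dataset.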
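The ligand finding is stated as an RMSD to the crystal structure, with a best match near 2.0 Å. A minimal NumPy sketch of such a best-of-N evaluation follows. It assumes every sampled pose lists the same atoms in the same order as the reference, ignores molecular symmetry and hydrogens, and offers optional Kabsch superposition; the function names and the 2.0 Å default threshold are illustrative choices, not the paper's evaluation code.

```python
import numpy as np


def rmsd(pose, ref, align=False):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays.

    Assumes identical atom ordering. With align=False the poses are compared
    in place (e.g. both expressed in the pocket frame); with align=True the
    pose is first superposed on the reference with the Kabsch algorithm.
    """
    pose = np.asarray(pose, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pose.ndim != 2 or pose.shape != ref.shape or pose.shape[1] != 3:
        raise ValueError("pose and ref must both have shape (n_atoms, 3)")
    if align:
        pose = pose - pose.mean(axis=0)
        ref = ref - ref.mean(axis=0)
        u, _, vt = np.linalg.svd(pose.T @ ref)                 # SVD of the covariance matrix
        d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0    # exclude reflections
        rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        pose = pose @ rot.T
    return float(np.sqrt(np.mean(np.sum((pose - ref) ** 2, axis=1))))


def best_of_n(samples, ref, align=False, threshold=2.0):
    """Minimum RMSD over sampled poses, plus the fraction below `threshold` (Angstrom)."""
    values = np.array([rmsd(s, ref, align=align) for s in samples])
    return float(values.min()), float(np.mean(values < threshold)), values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crystal = rng.normal(scale=3.0, size=(25, 3))  # hypothetical reference ligand
    poses = [crystal + rng.normal(scale=s, size=crystal.shape) for s in (0.5, 1.5, 3.0)]
    best, frac, values = best_of_n(poses, crystal)
    print(values.round(2), best, frac)
```

Comparing in place (the default) measures how well the pose sits in the pocket; `align=True` instead measures only the internal conformation, so the two settings answer different questions.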

Figure 2 : Distribution and sampling results for protein conformations.

This review was generated by AI and checked by a human editor.