QUICK REVIEW

[Paper Review] SE(3) diffusion model with application to protein backbone generation

Jason Yim, Brian L. Trippe|arXiv (Cornell University)|Feb 5, 2023

Protein Structure and Dynamics | Cited by 71

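One-sentence summary

This paper develops FrameDiff, which generates protein backbones with an SE(3)-invariant diffusion model over multiple frames, without a pretrained structure predictor, and achieves designable monomers of up to 500 residues.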

ABSTRACT

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation preserving rigid motions in R^3, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

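Motivation and objectives

Develop a principled SE(3)-invariant diffusion framework for backbones described by multiple frames (SE(3)^N).
Derive the forward and reverse diffusion processes on SE(3) from its Riemannian group structure, so that denoising score matching (DSM) training becomes possible.
Turn the theory into a practical backbone generative model (FrameDiff) equipped with an SE(3)-equivariant score network.
Demonstrate designable protein monomers of up to 500 amino acids without relying on a pretrained structure predictor.
Show competitive in silico designability against pretrained baselines while maintaining sample diversity.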

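Proposed method

Build the forward diffusion on SE(3) by decoupling it into a rotational component (SO(3)) and a translational component (R^3), using a centered SE(3)^N process.
Derive DSM training on compact Lie groups using the explicit Brownian motion on SO(3) and its heat-kernel expression.
Achieve SE(3) invariance by centering SE(3)^N, that is, by working in the centered form SE(3)^N_0, which reduces the requirement on the network to SO(3) equivariance.
Propose FrameDiff, an SE(3)-invariant diffusion model of backbones whose neural score network uses an AlphaFold2-style structure module with IPA (Invariant Point Attention) and Transformer components.
Predict both the frame updates and the torsion angle psi, and use auxiliary backbone-coordinate and local pairwise-distance losses to improve fine-grained geometry.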
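
The decoupled forward process described above can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not FrameDiff's implementation: the function names (`noise_translations`, `noise_rotations`), the constant noise rate `beta`, the toy data, and the use of a geodesic random walk to approximate Brownian motion on SO(3) are all chosen here for illustration only.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion near the identity.
        return np.eye(3) + K + 0.5 * K @ K
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)


def center(x):
    """Remove the center of mass so translations stay in the centered subspace."""
    return x - x.mean(axis=0, keepdims=True)


def noise_translations(x0, t, rng, beta=1.0):
    """Closed-form marginal of a variance-preserving process on R^3.

    `beta` is a hypothetical constant noise rate. The Gaussian noise is
    centered as well, so the noised sample keeps zero center of mass.
    """
    decay = np.exp(-0.5 * beta * t)
    eps = center(rng.standard_normal(x0.shape))
    return decay * center(x0) + np.sqrt(1.0 - decay**2) * eps


def noise_rotations(r0, t, rng, n_steps=100):
    """Approximate Brownian motion on SO(3) with a geodesic random walk.

    Each step right-multiplies every rotation by the exponential of a small
    Gaussian tangent vector; this is an approximation, not exact sampling.
    """
    dt = t / n_steps
    r = r0.copy()
    for _ in range(n_steps):
        for i in range(r.shape[0]):
            step = np.sqrt(dt) * rng.standard_normal(3)
            r[i] = r[i] @ exp_so3(step)
    return r


rng = np.random.default_rng(0)
n_res = 8                                   # hypothetical chain length
x0 = rng.standard_normal((n_res, 3)) * 5.0  # toy translations
r0 = np.stack([np.eye(3)] * n_res)          # toy rotations (identity frames)

x_t = noise_translations(x0, t=0.5, rng=rng)
r_t = noise_rotations(r0, t=0.5, rng=rng)
```

Rotations and translations are noised independently here, which is the sense in which the process is decoupled; the only coupling across residues is the shared centering of the translations.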

Figure 1: Method overview. (A) Backbone parameterization with frames. Each residue along the protein chain shares the same structure of backbone atoms due to the fixed bonds between each atom. Performing the Gram-Schmidt operation on vectors $v_{1}, v_{2}$ results in rotation matrix $r$ that parameter
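
The Gram-Schmidt step mentioned in the Figure 1 caption can be sketched as below. This is an illustrative example only: the function name `frame_from_vectors`, the toy coordinates, and the choice of which backbone atoms define $v_{1}$ and $v_{2}$ are assumptions, not details confirmed by the caption.

```python
import numpy as np


def frame_from_vectors(v1, v2):
    """Gram-Schmidt on (v1, v2), completed with a cross product.

    Returns a 3x3 rotation matrix whose columns are the orthonormal basis.
    """
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - np.dot(e1, v2) * e1      # remove the component of v2 along e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)              # right-handed third axis, so det = +1
    return np.stack([e1, e2, e3], axis=1)


# Toy backbone coordinates in angstroms (hypothetical, not from a real structure).
n_atom = np.array([-0.53, 1.36, 0.0])
ca_atom = np.array([0.0, 0.0, 0.0])
c_atom = np.array([1.53, 0.0, 0.0])

v1 = c_atom - ca_atom   # assumed convention: from C-alpha to C
v2 = n_atom - ca_atom   # assumed convention: from C-alpha to N
r = frame_from_vectors(v1, v2)
x = ca_atom             # assumed translation part of the frame: the C-alpha position
```

Building the third axis with a cross product rather than a third Gram-Schmidt pass guarantees a proper rotation (determinant +1) instead of a reflection.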

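Experimental results

Research questions

RQ1 Can SE(3)-invariant diffusion be formulated and trained over multiple backbone frames to model protein backbones?
RQ2 Can DSM be applied to Riemannian manifolds, in particular SE(3) and SO(3), to achieve end-to-end backbone generation?
RQ3 Does centering SE(3)^N achieve true SE(3) invariance and improve training efficiency?
RQ4 Can FrameDiff generate designable protein monomers of up to 500 amino acids without a pretrained structure predictor?
RQ5 How does FrameDiff compare with pretrained diffusion baselines on in silico designability and sample diversity?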
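
For RQ2, DSM on SO(3) needs the transition density of Brownian motion, which is available as a heat-kernel series in the rotation angle. The sketch below evaluates a truncated version of that series in the form commonly used for the isotropic Gaussian on SO(3). The function names, the truncation level `l_max`, and the parameterization with variance parameter `t` are assumptions made for this example.

```python
import numpy as np


def igso3_density(omega, t, l_max=200):
    """Truncated heat-kernel series on SO(3) as a function of rotation angle.

    f(omega, t) = sum_l (2l + 1) * exp(-l (l + 1) t / 2)
                        * sin((l + 1/2) omega) / sin(omega / 2)
    The density is taken with respect to the uniform (Haar) measure.
    """
    omega = np.asarray(omega, dtype=float)[..., None]
    ls = np.arange(l_max + 1)
    weights = (2 * ls + 1) * np.exp(-ls * (ls + 1) * t / 2.0)
    characters = np.sin((ls + 0.5) * omega) / np.sin(omega / 2.0)
    return (weights * characters).sum(axis=-1)


def angle_pdf(omega, t, l_max=200):
    """Density of the rotation angle on [0, pi], including the Haar factor."""
    omega = np.asarray(omega, dtype=float)
    return igso3_density(omega, t, l_max) * (1.0 - np.cos(omega)) / np.pi


# Avoid omega = 0 exactly, where sin(omega / 2) vanishes.
omegas = np.linspace(1e-4, np.pi, 4000)
pdf = angle_pdf(omegas, t=0.5)

# Trapezoid rule: the angle density should integrate to roughly one.
mass = float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(omegas)))
```

For small `t` the series converges slowly and more terms are needed, so `l_max` should be treated as a tuning knob rather than a fixed constant.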

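Key findings

FrameDiff can generate designable, diverse, and novel protein monomers of up to 500 residues.
FrameDiff's in silico designability is competitive, second only to a pretrained model with more parameters.
The theory provides a principled SE(3)-invariant diffusion on SE(3)^N through centering and the use of an SE(3)-equivariant network.
The approach learns an SE(3)-equivariant score over multiple frames without depending on pretraining a structure prediction network.
The experiments suggest that FrameDiff generalizes beyond known protein structures while maintaining plausible backbone geometry.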
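
The role of centering in the third finding can be checked numerically: subtracting the center of mass discards any global translation and commutes with any global rotation, so only rotational equivariance is left for the network to handle. The snippet below is a toy verification with hypothetical data and helper names (`center`, `random_rotation`), not code from the paper.

```python
import numpy as np


def center(x):
    """Subtract the center of mass from an (N, 3) array of translations."""
    return x - x.mean(axis=0, keepdims=True)


def random_rotation(rng):
    """Draw a rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # flip one axis to get det = +1
    return q


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 3))     # toy frame translations (hypothetical data)
rot = random_rotation(rng)
shift = rng.standard_normal(3)

moved = x @ rot.T + shift            # apply one global rigid motion to every frame
lhs = center(moved)                  # center after moving
rhs = center(x) @ rot.T              # rotate after centering (the shift is gone)
```

The two results agree, which is the property that lets a centered process on SE(3)^N_0 stand in for an SE(3)-invariant process on SE(3)^N.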

Figure 2: Single layer of FrameDiff. Each layer takes in the current node embedding $\mathbf{h}_{\ell}$, edge embedding $\mathbf{z}_{\ell}$, frames $\mathbf{T}_{\ell}$, and initial node embedding $\mathbf{h}_{0}$. Rectangles indicate trainable neural networks. Node embeddings are fir

This review was generated by AI and checked by a human editor.