QUICK REVIEW

[Paper Review] Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Brian L. Trippe, Jason Yim | arXiv (Cornell University) | Jun 8, 2022

Protein Structure and Dynamics | Cited by 96

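One-Sentence Summary

This paper proposes ProtDiff, a 3D diffusion model of protein backbones, and SMCDiff, a conditional sampling method for motif scaffolding. Together they generate diverse, longer scaffolds (up to 80 residues) that agree with AlphaFold2 predictions.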

ABSTRACT

Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

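Motivation and Objectives

Frame motif scaffolding as a design problem that calls for scalable and diverse scaffold generation.
Develop a 3D diffusion model of protein backbones (ProtDiff) that respects geometric invariances.
Build a conditional sampling method (SMCDiff) that scaffolds a motif by inpainting with a pretrained diffusion model.
Show that scaffolds of up to 80 residues can be generated and that they agree with AlphaFold2 predictions.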

Proposed Method

ProtDiff: an E(3)-equivariant graph neural network diffusion model over 3D protein backbones.
Fully connected graph representation with sequence-ordered nodes and sinusoidal positional encodings.
Noise prediction via epsilon_theta, implemented as an EGNN with translation/rotation equivariance.
SMCDiff: a sequential Monte Carlo method for asymptotically exact conditional sampling from an unconditional diffusion model, enabling motif inpainting.
Theoretical guarantee: if the diffusion model matches the data, SMCDiff provides exact conditional samples in the large-compute limit.
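
The SMCDiff idea above can be illustrated with a small particle filter. This is only a sketch under loud assumptions, not the authors' implementation: coordinates are single scalars rather than 3D backbones, `toy_reverse_mean` is a hypothetical stand-in for the trained ProtDiff denoiser, the noised motif values and `sigmas` are made up, and the weighting rule (score each particle by how well its reverse step predicts the next noised motif value) is one common way to arrange such a filter, assumed here rather than taken from this review.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a 1D normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def normalize_log_weights(log_weights):
    """Turn unnormalized log-weights into probabilities (max subtracted for stability)."""
    top = max(log_weights)
    unnormalized = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def smc_inpaint(motif_path, reverse_mean, sigmas, num_particles, rng):
    """Particle-filter sketch of motif-conditioned sampling.

    motif_path[t] : motif value after t forward noising steps (t = 0..T)
    reverse_mean  : hypothetical denoiser stand-in; maps (motif_t, scaffold_t, t)
                    to (mean of motif_{t-1}, mean of scaffold_{t-1})
    sigmas[t]     : assumed standard deviation of the reverse step from t to t-1
    """
    num_steps = len(motif_path) - 1
    # Start every scaffold particle from a standard-normal prior at step T.
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    for t in range(num_steps, 0, -1):
        means = [reverse_mean(motif_path[t], s, t) for s in particles]
        # Weight each particle by how well it predicts the next motif value.
        log_w = [gaussian_logpdf(motif_path[t - 1], m_mean, sigmas[t]) for m_mean, _ in means]
        weights = normalize_log_weights(log_w)
        # Resample particle indices in proportion to their weights.
        chosen = rng.choices(range(num_particles), weights=weights, k=num_particles)
        # Propagate the surviving particles one reverse step.
        particles = [rng.gauss(means[i][1], sigmas[t]) for i in chosen]
    return particles


def toy_reverse_mean(motif_t, scaffold_t, t):
    """Toy denoiser: shrinks both coordinates and pulls the scaffold toward the motif."""
    return 0.9 * motif_t, 0.5 * scaffold_t + 0.4 * motif_t


rng = random.Random(0)
motif_path = [1.0, 1.1, 0.8, 1.3, 0.6]  # made-up noised motif values for t = 0..4
sigmas = [0.0, 0.3, 0.3, 0.3, 0.3]      # index 0 is unused
samples = smc_inpaint(motif_path, toy_reverse_mean, sigmas, num_particles=64, rng=rng)
```

Each entry of `samples` is one scaffold value conditioned on the motif trajectory. Raising `num_particles` is the knob that corresponds to the large-compute limit in the guarantee above; with few particles the resampling step can collapse onto a handful of ancestors and reduce diversity.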

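Experimental Results

Research Questions

RQ1: Can a diffusion model learn a distribution over realistic 3D protein backbones that extends to longer scaffolds around a given motif?
RQ2: Can SMCDiff generate diverse and accurate motif scaffolds conditioned on a motif, with provable exactness given sufficient compute?
RQ3: How well do generated backbones agree with AlphaFold2-predicted structures, and do they support designable motifs?
RQ4: What are the trade-offs among scaffold length, designability, and computational cost in conditional motif scaffolding?

Key Findings

ProtDiff can sample backbone structures of up to 80 residues around a motif.
SMCDiff enables diverse scaffold generation around a fixed motif and reaches a motif RMSD of 1 Å or less with 80-residue scaffolds, at least for the 5trv test case.
Backbone samples agree with AlphaFold2 predictions well enough to count as designable in the test cases, where scTM > 0.5 is the designability criterion.
Unconditionally sampled backbones show diversity and some designability (11.8% reach scTM > 0.5 at lengths of 50–128 residues), although left-handed helices are common (45% of samples contain one).
Given an accurate diffusion model and sufficient compute (number of particles), SMCDiff provides asymptotically exact conditional samples.
Conditional scaffolding with 64 particles takes about 2 minutes per sample, which is competitive with other inpainting methods.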
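
The two thresholds quoted in these findings (motif RMSD of at most 1 Å, scTM above 0.5) can be checked with a few lines of code. The sketch below is illustrative only: the coordinates and scTM scores are made-up numbers, the helper names are hypothetical, the RMSD function assumes the two structures are already superimposed (no alignment is performed), and scTM values are treated as precomputed inputs.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def fraction_above(scores, threshold):
    """Fraction of scores strictly greater than the threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Made-up motif coordinates in angstroms: target versus a sampled scaffold's motif.
target_motif = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
sampled_motif = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
motif_rmsd = rmsd(target_motif, sampled_motif)
motif_preserved = motif_rmsd <= 1.0

# Made-up scTM scores for eight hypothetical backbone samples.
sc_tm_scores = [0.62, 0.31, 0.48, 0.55, 0.50, 0.27, 0.71, 0.44]
designable_fraction = fraction_above(sc_tm_scores, 0.5)
```

In this toy input every motif atom is displaced by 0.5 Å, so the motif counts as preserved, and three of the eight scores exceed 0.5. A real evaluation would first superimpose the structures, which this sketch deliberately leaves out.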

This review was generated by AI and checked by a human editor.