QUICK REVIEW

[Paper Review] Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Brian L. Trippe, Jason Yim | arXiv (Cornell University) | Jun 8, 2022
Protein Structure and Dynamics | Cited by 96
One-Sentence Summary

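This paper introduces ProtDiff, a 3D diffusion model for protein backbones, and SMCDiff, a sequential Monte Carlo method for conditional sampling that scaffolds a given motif. Together they generate diverse, longer scaffolds (up to 80 residues) that are consistent with AlphaFold2 predictions.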

ABSTRACT

Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

Motivation and Goals

  • Frame motif-scaffolding as the problem of designing diverse scaffolds at realistic lengths, beyond the roughly 20-residue limit of earlier machine-learning methods.
  • Develop a 3D diffusion model (ProtDiff) for protein backbones that respects geometric invariances.
  • Create a conditional sampling method (SMCDiff) to scaffold motifs by inpainting unconditionally trained diffusion models.
  • Demonstrate that scaffolds up to 80 residues can be generated and are consistent with AlphaFold2 predictions.

Proposed Method

  • ProtDiff: an E(3)-equivariant graph neural network diffusion model over 3D protein backbones.
  • Fully connected graph representation with sequence-ordered nodes and sinusoidal positional encodings.
  • Noise prediction via epsilon_theta, implemented as an EGNN with translation/rotation equivariance.
  • SMCDiff: a sequential Monte Carlo method for asymptotically exact conditional sampling from an unconditionally trained diffusion model, enabling motif inpainting.
  • Theoretical guarantee: if the diffusion model matches the data, SMCDiff provides exact conditional samples in the large-compute limit.
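
The particle-filtering idea behind SMCDiff can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_eps_model` is a hypothetical placeholder for a trained noise predictor such as ProtDiff's epsilon_theta, the linear variance schedule and the multinomial resampling step are choices made here for simplicity, and the function and parameter names (`smc_inpaint`, `motif_idx`, `K`) are invented for the example. The sketch forward-diffuses the motif once, then runs K particles backward in time, weighting each particle by how well its reverse step explains the noised motif, resampling, and proposing the next scaffold state.

```python
import numpy as np


def make_schedule(T, beta_min=1e-4, beta_max=0.02):
    """Linear DDPM variance schedule (an assumed choice, for illustration)."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def toy_eps_model(x, t):
    """Hypothetical placeholder for a trained noise predictor eps_theta(x, t)."""
    return np.zeros_like(x)


def smc_inpaint(motif, motif_idx, n_res, eps_model=toy_eps_model, T=50, K=16, seed=0):
    """Return K backbones of shape (K, n_res, 3) whose motif rows equal `motif`."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)

    # 1) Forward-diffuse the motif once: traj[0] is clean, traj[T] is noisiest.
    traj = [np.asarray(motif, dtype=float)]
    for t in range(1, T + 1):
        noise = rng.standard_normal(traj[-1].shape)
        traj.append(np.sqrt(alphas[t - 1]) * traj[-1] + np.sqrt(betas[t - 1]) * noise)

    # 2) Initialise K particles from the prior and pin the noised motif.
    x = rng.standard_normal((K, n_res, 3))
    x[:, motif_idx] = traj[T]

    for t in range(T, 0, -1):
        b, a, ab = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        # Mean of the reverse step p_theta(x^{t-1} | x^t) for every particle.
        mean = (x - b / np.sqrt(1.0 - ab) * eps_model(x, t)) / np.sqrt(a)

        # 3) Weight: Gaussian log-likelihood of the noised motif at step t-1.
        resid = traj[t - 1] - mean[:, motif_idx]
        log_w = -0.5 * np.sum(resid ** 2, axis=(1, 2)) / b
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()

        # 4) Resample particles in proportion to their weights (multinomial).
        mean = mean[rng.choice(K, size=K, p=w)]

        # 5) Propose the state at step t-1; no noise is added on the final step.
        noise = rng.standard_normal(mean.shape) if t > 1 else 0.0
        x = mean + np.sqrt(b) * noise
        x[:, motif_idx] = traj[t - 1]
    return x
```

With the placeholder model the returned scaffolds are not meaningful structures; the point is the control flow. Increasing `K` corresponds to the "large-compute limit" in which the conditional samples become exact, provided the diffusion model matches the data.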

Experimental Results

Research Questions

  • RQ1: Can a diffusion model learn a distribution over realistic 3D protein backbones extendable to longer scaffolds around a given motif?
  • RQ2: Can SMCDiff produce diverse and accurate scaffolds conditioned on a motif, with provable correctness in the large-compute limit?
  • RQ3: How well do generated backbones align with AlphaFold2-predicted structures and support designable motifs?
  • RQ4: What are the trade-offs between scaffold length, designability, and computational cost for conditional motif scaffolding?

Key Findings

  • Together, ProtDiff and SMCDiff can sample scaffolds of up to 80 residues around a motif.
  • SMCDiff enables diverse scaffold generation around a fixed motif and attains motif RMSD below 1 Å for 80-residue scaffolds in at least the 5trv case.
  • Generated backbones agree with AlphaFold2-predicted structures in the tested cases, where scTM > 0.5 is taken as the threshold for designability.
  • Unconditionally sampled backbones show diversity and some designability (11.8% with scTM > 0.5 across 50–128 residues), but left-handed helices are common (45% have a left-handed helix).
  • SMCDiff provides asymptotically exact conditional samples given an accurate diffusion model and sufficient compute (number of particles).
  • Conditional scaffolding with 64 particles takes about 2 minutes per sample, competitive with other inpainting approaches.
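
Two of the quantities quoted above, motif RMSD and the fraction of samples with scTM > 0.5, are simple to compute once coordinates and scores are available. The sketch below is an illustration only: it assumes RMSD is measured after an optimal rigid superposition (the standard Kabsch procedure), which the summary does not spell out, and the helper names `kabsch_rmsd` and `designable_fraction` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) point sets after optimally superimposing P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Force a proper rotation (det = +1) so mirror images are not superimposed.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def designable_fraction(sctm_scores, threshold=0.5):
    """Fraction of samples whose scTM exceeds the designability threshold."""
    return float(np.mean(np.asarray(sctm_scores, dtype=float) > threshold))
```

Because the alignment is restricted to proper rotations, a structure and its mirror image generally do not superimpose to zero RMSD. That chirality distinction is what separates a left-handed helix from a right-handed one, and an E(3)-equivariant model, which treats reflections as symmetries, has no built-in preference between the two.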

This review was generated by AI and reviewed by human editors.