QUICK REVIEW

[Paper Summary] Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Brian L. Trippe, Jason Yim|arXiv (Cornell University)|Jun 8, 2022

Protein Structure and Dynamics | Cited by 96

ONE-SENTENCE SUMMARY

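This paper proposes ProtDiff, a 3D diffusion model for protein backbones, and SMCDiff, a sequential Monte Carlo method for conditional sampling that is used to scaffold motifs. Together they generate diverse, longer scaffolds (up to 80 residues) that are consistent with AlphaFold2 predictions.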

ABSTRACT

Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.

MOTIVATION AND GOALS

Frame motif-scaffolding as a design problem that calls for scalable generation of diverse scaffolds.
Develop a 3D diffusion model (ProtDiff) for protein backbones that respects geometric invariances.
Create a conditional sampling method (SMCDiff) that scaffolds motifs by inpainting with an unconditionally trained diffusion model.
Demonstrate that scaffolds up to 80 residues can be generated and are consistent with AlphaFold2 predictions.

PROPOSED METHOD

ProtDiff: an E(3)-equivariant graph neural network diffusion model over 3D protein backbones.
Fully connected graph representation with sequence-ordered nodes and sinusoidal positional encodings.
Noise prediction via epsilon_theta, implemented as an EGNN with translation/rotation equivariance.
SMCDiff: a sequential Monte Carlo method for asymptotically exact conditional sampling from an unconditional diffusion model, enabling motif inpainting.
Theoretical guarantee: if the diffusion model matches the data, SMCDiff provides exact conditional samples in the large-compute limit.
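
The reweight-and-resample idea at the core of any sequential Monte Carlo sampler can be sketched on toy data. This is a minimal illustration, not the paper's implementation: the isotropic Gaussian weight, the multinomial resampling scheme, and every name used here (`reweight_and_resample`, `predicted_motif_means`, `noised_motif`) are assumptions made for the example.

```python
import math
import random


def log_gaussian(x, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) at the flat vector x."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * sq_dist / var - 0.5 * len(x) * math.log(2.0 * math.pi * var)


def normalize_log_weights(log_w):
    """Turn log-weights into weights that sum to 1 (log-sum-exp for stability)."""
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def reweight_and_resample(particles, predicted_motif_means, noised_motif, var, rng):
    """One reweight/resample step of a particle filter.

    Each particle is scored by how well its (hypothetical) model prediction of
    the motif coordinates explains the noised motif. Particles are then drawn
    with replacement in proportion to those weights.
    """
    log_w = [log_gaussian(noised_motif, mu, var) for mu in predicted_motif_means]
    weights = normalize_log_weights(log_w)
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return resampled, weights


# Toy data (assumed for illustration): three scaffold candidates, by name only.
rng = random.Random(0)
particles = ["scaffold_a", "scaffold_b", "scaffold_c"]
predicted_motif_means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
noised_motif = [1.0, 1.0]
resampled, weights = reweight_and_resample(
    particles, predicted_motif_means, noised_motif, var=0.1, rng=rng
)
```

In this toy setup the second candidate receives the largest weight because its predicted motif matches the target exactly, so it dominates the resampled population. Using more particles makes the weighted population a better approximation of the conditional distribution, which is the intuition behind the large-compute guarantee.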

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a diffusion model learn a distribution over realistic 3D protein backbones extendable to longer scaffolds around a given motif?
RQ2: Can SMCDiff produce diverse and accurate motif-scaffolding scaffolds conditioned on a motif, with provable correctness in the large-compute limit?
RQ3: How well do generated backbones align with AlphaFold2-predicted structures and support designable motifs?
RQ4: What are the trade-offs between scaffold length, designability, and computational cost for conditional motif scaffolding?

Key Findings

ProtDiff, combined with SMCDiff, can sample scaffolds of up to 80 residues around a motif.
SMCDiff enables diverse scaffold generation around a fixed motif and attains motif RMSD below 1 Å for 80-residue scaffolds in at least the 5trv case.
Designability is assessed by how well sampled backbones agree with AlphaFold2-predicted structures; a self-consistency TM-score (scTM) above 0.5 is treated as designable in the tested cases.
Unconditionally sampled backbones show diversity and some designability (11.8% with scTM > 0.5 across 50–128 residues), but left-handed helices are common (45% have a left-handed helix).
SMCDiff provides asymptotically exact conditional samples given an accurate diffusion model and sufficient compute (number of particles).
Conditional scaffolding with 64 particles takes about 2 minutes per sample, competitive with other inpainting approaches.
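
To make the scTM > 0.5 threshold concrete, the standard TM-score formula can be sketched as below. This is a simplified illustration under stated assumptions: the two coordinate sets are taken to be already residue-matched and superimposed, whereas a full TM-score maximizes over superpositions. The function name `tm_score_fixed_superposition` and the toy coordinates are hypothetical.

```python
import math


def tm_score_fixed_superposition(coords_a, coords_b):
    """TM-score of two equal-length, already superimposed coordinate lists.

    Uses the standard length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    A real TM-score also searches for the superposition that maximizes this sum;
    that search is omitted here.
    """
    n = len(coords_a)
    if n != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if n <= 15:
        raise ValueError("the d0 formula in this sketch needs more than 15 residues")
    d0 = 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8
    total = 0.0
    for p, q in zip(coords_a, coords_b):
        d = math.dist(p, q)  # Euclidean distance between matched residues
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / n


# Toy 80-residue "backbone" on a straight line, 3.8 apart (illustrative only).
n_res = 80
reference = [(3.8 * i, 0.0, 0.0) for i in range(n_res)]
d0_80 = 1.24 * (n_res - 15) ** (1.0 / 3.0) - 1.8
# Every residue displaced by exactly d0, so each term contributes 1 / (1 + 1).
shifted = [(x, y + d0_80, z) for (x, y, z) in reference]

score_identical = tm_score_fixed_superposition(reference, reference)
score_shifted = tm_score_fixed_superposition(reference, shifted)
```

Identical structures score 1, and a structure whose every residue sits exactly d0 away from its partner scores 0.5, which gives a rough feel for how much deviation the 0.5 cutoff tolerates at a given length.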

This summary was generated by AI and reviewed by a human editor.