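[Paper Explained] Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem
This paper presents ProtDiff, a 3D diffusion model for protein backbones, and SMCDiff, a sequential Monte Carlo method for conditional sampling that scaffolds motifs. Together they generate diverse, longer scaffolds (up to 80 residues) that are consistent with AlphaFold2 predictions.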
Construction of a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. But a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm is the first to theoretically guarantee conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.
Motivation and Objectives
- Frame motif-scaffolding as a design problem that needs longer and more diverse scaffolds than current machine-learning methods produce.
- Develop a 3D diffusion model (ProtDiff) for protein backbones that respects geometric invariances.
- Create a conditional sampling method (SMCDiff) that scaffolds motifs by inpainting with an unconditionally trained diffusion model.
- Demonstrate that scaffolds up to 80 residues can be generated and are consistent with AlphaFold2 predictions.
Proposed Method
- ProtDiff: an E(3)-equivariant graph neural network diffusion model over 3D protein backbones.
- Fully connected graph representation with sequence-ordered nodes and sinusoidal positional encodings.
- Noise prediction via epsilon_theta, implemented as an EGNN with translation/rotation equivariance.
- SMCDiff: a sequential Monte Carlo method for asymptotically exact conditional sampling from an unconditional diffusion model, enabling motif inpainting.
- Theoretical guarantee: if the diffusion model matches the data, SMCDiff provides exact conditional samples in the large-compute limit.
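To make the SMCDiff bullet concrete, the following is a toy sketch of a single reweight-and-resample step from a generic particle filter, not the paper's implementation. It assumes an isotropic Gaussian reverse transition with standard deviation `sigma`; the function name `resample_particles`, the argument names, and the array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def resample_particles(particles, pred_motif_mean, noised_motif, sigma, rng):
    """One reweight-and-resample step of a generic particle filter (illustrative).

    particles:       (K, N, 3) coordinates of K candidate backbones
    pred_motif_mean: (K, M, 3) each particle's predicted mean for the motif
                     residues at the next, less noisy step (assumed given)
    noised_motif:    (M, 3) forward-noised motif at that same step
    sigma:           std of the assumed isotropic Gaussian transition
    rng:             numpy.random.Generator
    """
    # Gaussian log-likelihood of the noised motif under each particle's prediction
    sq_err = ((pred_motif_mean - noised_motif[None]) ** 2).sum(axis=(1, 2))
    log_w = -0.5 * sq_err / sigma**2
    # Normalize in log space for numerical stability
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()
    # Multinomial resampling: particles that explain the motif well are duplicated
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], weights
```

In a full sampler this step would be repeated at every reverse-diffusion step, with each surviving particle then proposing its next state from the diffusion model. Using more particles makes the weighted set a closer approximation to the conditional distribution, which is the sense in which the guarantee holds in the large-compute limit.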
Experimental Results
Research Questions
- RQ1: Can a diffusion model learn a distribution over realistic 3D protein backbones that extends to longer scaffolds around a given motif?
- RQ2: Can SMCDiff produce diverse and accurate scaffolds conditioned on a motif, with provable correctness in the large-compute limit?
- RQ3: How well do generated backbones align with AlphaFold2-predicted structures and support designable motifs?
- RQ4: What are the trade-offs between scaffold length, designability, and computational cost for conditional motif scaffolding?
Key Findings
- ProtDiff can sample backbone structures up to 80 residues around a motif.
- SMCDiff enables diverse scaffold generation around a fixed motif and attains motif RMSD below 1 Å for 80-residue scaffolds in at least the 5trv case.
- Designability is judged by agreement with AlphaFold2-predicted structures: a backbone with scTM > 0.5 is counted as designable, and sampled backbones reach this threshold in the tested cases.
- Unconditionally sampled backbones show diversity and some designability (11.8% with scTM > 0.5 across 50–128 residues), but left-handed helices are common (45% have a left-handed helix).
- SMCDiff provides asymptotically exact conditional samples given an accurate diffusion model and sufficient compute (number of particles).
- Conditional scaffolding with 64 particles takes about 2 minutes per sample, competitive with other inpainting approaches.
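The motif RMSD figure above is an error measured after superimposing two coordinate sets. As a minimal sketch of how such a number can be computed, the block below uses the standard Kabsch algorithm with NumPy; the function name `kabsch_rmsd` and the choice of one coordinate per residue are assumptions for illustration, and the paper's exact evaluation pipeline may differ.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets at the origin
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix between the two sets
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Sign correction keeps the transform a proper rotation (no reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and measure the remaining per-point deviation
    P_aligned = Pc @ R.T
    return float(np.sqrt(((P_aligned - Qc) ** 2).sum(axis=1).mean()))
```

Excluding reflections matters for chirality: a mirror-image copy of a motif cannot be rotated onto the original, so it should not score as a match.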
This explainer was generated by AI and reviewed by a human editor.