QUICK REVIEW

[Paper Review] SE(3) diffusion model with application to protein backbone generation

Jason Yim, Brian L. Trippe|arXiv (Cornell University)|Feb 5, 2023

Protein Structure and Dynamics|Cited by 71

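One-sentence summary

This paper proposes FrameDiff, an SE(3)-invariant diffusion model over multiple frames for generating protein backbones. It does not rely on a pretrained structure predictor and can design monomers of up to 500 residues.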

ABSTRACT

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in R^3, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

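Research motivation and goals

Develop a principled SE(3)-invariant diffusion framework for backbones described by multiple frames (SE(3)^N).
Derive, from theory, the forward and reverse diffusion processes on SE(3) and its Lie group structure, enabling denoising score matching (DSM) training.
Turn the theory into a practical backbone generative model (FrameDiff) equipped with an SE(3)-equivariant score network.
Demonstrate designable protein monomers of up to 500 amino acids without relying on a pretrained structure predictor.
Show in silico designability competitive with pretrained baselines while maintaining sample diversity.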

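Proposed method

Construct the forward diffusion on SE(3) by decoupling rotations (SO(3)) and translations (R^3), and use a centered (zero center-of-mass) process on SE(3)^N.
Derive DSM training on compact Lie groups, giving explicit expressions for Brownian motion and the heat kernel on SO(3).
Achieve SE(3) invariance through the centered SE(3)^N_0 formulation, combined with an SO(3)-equivariant network.
Introduce FrameDiff, an SE(3)-invariant diffusion model for backbones whose neural score network builds on an AlphaFold2-style structure module with invariant point attention (IPA) and Transformer components.
Predict frame updates and the torsion angle psi jointly, with auxiliary backbone-position and local-distance losses to improve fine-grained geometry.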
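
The SO(3) heat kernel mentioned in the list above can be evaluated numerically with a truncated series. The sketch below assumes the commonly used IGSO(3) form for Brownian motion on SO(3), f(w, t) = sum over l of (2l+1) exp(-l(l+1)t/2) sin((l+1/2)w) / sin(w/2), where w is the rotation angle relative to the starting rotation. The function names, the truncation level, and the integration grid are illustrative choices and are not taken from the FrameDiff code.

```python
import math

def igso3_density(omega, t, num_terms=200):
    """Truncated series for the SO(3) heat kernel as a function of rotation angle.

    omega: rotation angle in (0, pi]; t: diffusion time (> 0).
    The density is with respect to the normalized Haar measure on SO(3).
    """
    total = 0.0
    for l in range(num_terms):
        total += ((2 * l + 1)
                  * math.exp(-l * (l + 1) * t / 2.0)
                  * math.sin((l + 0.5) * omega) / math.sin(omega / 2.0))
    return total

def angle_density(omega, t):
    """Density of the rotation angle: Haar weight (1 - cos w) / pi times the kernel."""
    return (1.0 - math.cos(omega)) / math.pi * igso3_density(omega, t)

def integrate_angle_density(t, n=2000):
    """Midpoint-rule integral of the angle density over (0, pi)."""
    h = math.pi / n
    return sum(angle_density((i + 0.5) * h, t) for i in range(n)) * h
```

As a sanity check, the angle density should integrate to roughly 1 for any t, and the kernel should flatten toward the uniform value 1 as t grows. Very small t needs more series terms than the default used here.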

Figure 1: Method overview. (A) Backbone parameterization with frames. Each residue along the protein chain shares the same structure of backbone atoms due to the fixed bonds between each atom. Performing the Gram-Schmidt operation on vectors $v_{1},v_{2}$ results in rotation matrix $r$ that parameter

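Experimental results

Research questions

RQ1: Can SE(3)-invariant diffusion over multi-frame backbones be formalized and trained to model protein backbones?
RQ2: How can DSM be adapted to Riemannian manifolds, in particular SE(3) and SO(3), to enable end-to-end backbone generation?
RQ3: Does centering the SE(3)^N process yield true SE(3) invariance and improve learning efficiency?
RQ4: Can FrameDiff generate designable protein monomers of up to 500 amino acids without relying on a pretrained structure predictor?
RQ5: How does FrameDiff compare with pretrained diffusion baselines in in silico designability and sample diversity?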

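Key findings

FrameDiff generates designable, diverse, and novel protein monomers of up to 500 residues.
FrameDiff's in silico designability is competitive, second only to a pretrained model with more parameters.
By centering and using an SE(3)-equivariant network, the theory provides a principled SE(3)-invariant diffusion on SE(3)^N.
The method makes it possible to learn the SE(3)-equivariant score over multiple frames without relying on a pretrained structure prediction network.
Experiments show that FrameDiff generalizes beyond known protein structures while maintaining reasonable backbone geometry.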
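
The role of centering in the findings above can be illustrated on the translation part alone. The sketch below removes the center of mass and then draws one sample from an assumed variance-preserving Ornstein-Uhlenbeck kernel, x_t = exp(-t/2) x_0 + sqrt(1 - exp(-t)) eps, with the noise also projected to zero center of mass. The kernel form, the unit rate, and the function names are assumptions for illustration; the noise schedule used in FrameDiff may differ.

```python
import math
import random

def center(xs):
    """Subtract the centroid from a list of 3D points (zero center of mass)."""
    n = len(xs)
    mean = [sum(p[k] for p in xs) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in xs]

def noise_translations(xs0, t, rng):
    """One draw from an assumed Ornstein-Uhlenbeck forward kernel on centered translations.

    The Gaussian noise is centered as well, so the noised points keep
    a zero center of mass.
    """
    x0 = center(xs0)
    eps = center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in xs0])
    a = math.exp(-t / 2.0)
    b = math.sqrt(1.0 - math.exp(-t))
    return [[a * x0[i][k] + b * eps[i][k] for k in range(3)]
            for i in range(len(xs0))]
```

With the same random seed, shifting every input point by a common vector leaves the noised output unchanged up to floating-point error, which is the translation-invariance property that centering is meant to provide.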

Figure 2: Single layer of FrameDiff. Each layer takes in the current node embedding $\mathbf{h}_{\ell}$, edge embedding $\mathbf{z}_{\ell}$, frames $\mathbf{T}_{\ell}$, and initial node embedding $\mathbf{h}_{0}$. Rectangles indicate trainable neural networks. Node embeddings are fir

This review was generated by AI and reviewed by human editors.