QUICK REVIEW

[Paper Review] DMPfold: fast de novo protein model generation from covarying sequences using predicted distances and iterative model building

Joe G. Greener, Shaun M. Kandathil|arXiv (Cornell University)|Nov 29, 2018

Machine Learning in Bioinformatics | Cited by 2

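One-Sentence Summary

DMPfold is a deep learning method that predicts inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles from covarying sequences, then uses them to build de novo protein models iteratively. It outperforms two popular methods on a test set of CASP12 domains and, in under a week on a small 200-core cluster, produced confident models for 25% of the Pfam 'dark families' without known structures; it also provides models for 16% of human proteome UniProt entries that lack structures.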

ABSTRACT

The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.

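Motivation and Objectives

Address the limitations of existing amino acid covariation methods when modelling small protein families with shallow sequence alignments.
Develop a deep learning method that predicts inter-atomic distance bounds, the hydrogen bond network, and torsion angles from sequence data.
Provide accurate de novo structure modelling for previously uncharacterized protein families, including transmembrane proteins and, in some cases, families with fewer than 100 sequences.
Offer a scalable, efficient route to structural annotation of whole proteomes, including the human proteome and Pfam domains without known structures.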

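Proposed Method

DMPfold uses deep learning to predict residue-residue distance bounds from multiple sequence alignments, even when the alignment is shallow.
It also predicts the main chain hydrogen bond network and torsion angles to guide model building.
Models are built iteratively, using the predicted distance bounds and geometric constraints to refine the structure step by step.
Combining the predicted distance bounds with secondary structure information guides conformational sampling and improves model accuracy.
The framework is designed to be computationally efficient, allowing fast model generation on a modest hardware cluster.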
Its networks are trained on covarying sequence data to optimize structure prediction performance.
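
To make the idea of building a model against predicted distance bounds concrete, here is a minimal Python sketch. It is not DMPfold's model-building code: the flat-bottomed penalty, the pairwise nudging scheme, and the names `bound_violation`, `total_violation`, and `refine` are all assumptions chosen for illustration. It scores a set of 3D coordinates against lower and upper bounds and repeatedly moves violated pairs back inside their bounds.

```python
import math


def bound_violation(distance, lower, upper):
    """Flat-bottomed penalty: 0 inside [lower, upper], else the overshoot."""
    if distance < lower:
        return lower - distance
    if distance > upper:
        return distance - upper
    return 0.0


def total_violation(coords, bounds):
    """Sum of violations over all restrained pairs.

    coords: list of (x, y, z) points, one per residue (hypothetical input).
    bounds: dict mapping (i, j) index pairs to (lower, upper) distances.
    """
    return sum(
        bound_violation(math.dist(coords[i], coords[j]), lower, upper)
        for (i, j), (lower, upper) in bounds.items()
    )


def refine(coords, bounds, max_rounds=100, tol=1e-6):
    """Toy iterative refinement: nudge each violated pair back inside its bounds.

    This is an illustrative assumption, not the procedure used in the paper,
    and it is not guaranteed to converge when restraints conflict.
    """
    coords = [list(p) for p in coords]
    for _ in range(max_rounds):
        if total_violation(coords, bounds) < tol:
            break
        for (i, j), (lower, upper) in bounds.items():
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction is undefined for coincident points
            # Positive shift pushes the pair apart, negative pulls it together.
            shift = max(lower - d, 0.0) - max(d - upper, 0.0)
            for k in range(3):
                unit = (coords[j][k] - coords[i][k]) / d
                coords[i][k] -= 0.5 * shift * unit
                coords[j][k] += 0.5 * shift * unit
    return [tuple(p) for p in coords]


# Two points that start too far apart for a made-up 3.5 to 4.0 bound.
start = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bounds = {(0, 1): (3.5, 4.0)}
model = refine(start, bounds)
print(total_violation(start, bounds), total_violation(model, bounds))
```

A flat-bottomed penalty is used here because a bound pair only says the distance should fall inside an interval; any distance within the interval is treated as equally acceptable.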

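Experimental Results

Research Questions

RQ1: Can deep learning improve the accuracy of de novo protein structure prediction for small protein families with shallow sequence alignments?
RQ2: How accurate are the predicted distance bounds and geometric constraints in iterative model building when no known template is available?
RQ3: To what extent can DMPfold generate reliable models for previously uncharacterized Pfam families (the 'dark families') and for the human proteome?
RQ4: How does DMPfold compare with existing methods on challenging targets such as transmembrane proteins?
RQ5: Can DMPfold generate confident models when fewer than 100 input sequences are available?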

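Key Findings

On a test set of CASP12 domains, DMPfold produced more accurate models than two popular methods.
It produced confident models for 25% of the Pfam 'dark families', that is, Pfam domains without known structures.
It provides models for 16% of human proteome UniProt entries that lack structures.
In some cases it generates accurate models from alignments with fewer than 100 sequences, extending covariation methods to smaller families.
The run over all Pfam domains without known structures finished in under a week on a small 200-core cluster, indicating modest computational cost.
DMPfold works just as well for transmembrane proteins, suggesting robustness across protein types.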
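
The compute claim can be turned into a rough upper bound with simple arithmetic. The sketch below assumes all 200 cores were busy for the full seven days, which overstates the real usage since the abstract says "under a week". The helper names and the target count of 1000 are placeholders for illustration, not figures from the paper.

```python
CORES = 200          # cluster size stated in the abstract
MAX_DAYS = 7         # "under a week", so this is an upper bound
HOURS_PER_DAY = 24


def core_hours_upper_bound(cores, days):
    """Upper bound on core-hours if every core ran for the whole period."""
    return cores * days * HOURS_PER_DAY


def core_hours_per_target(total_core_hours, n_targets):
    """Average core-hours available per modelled target."""
    return total_core_hours / n_targets


budget = core_hours_upper_bound(CORES, MAX_DAYS)
print(budget)  # 33600

# n_targets=1000 is a made-up placeholder, NOT the number of families in the paper.
print(core_hours_per_target(budget, 1000))
```

Substituting the actual number of modelled families for the placeholder gives an upper bound on the average cost per family.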

This review was generated by AI and reviewed by human editors.