QUICK REVIEW

[Paper Review] Deep transfer learning in the assessment of the quality of protein models

David Menéndez Hurtado, Karolis Uziela|arXiv (Cornell University)|Apr 17, 2018

Protein Structure and Dynamics | Cited by 36

One-Sentence Summary

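This paper proposes a deep transfer learning framework for protein model quality assessment that uses a minimal set of structural features derived from sequence-based predictions. It relies on pretrained convolutional neural networks and a three-headed architecture that encodes the relative ranking of models. This reduces input complexity. Despite using only coarse structural inputs, the method achieves state-of-the-art performance and outperforms existing methods in global score prediction and target ranking.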

ABSTRACT

MOTIVATION: Proteins fold into complex structures that are crucial for their biological functions. Experimental determination of protein structures is costly and therefore limited to a small fraction of all known proteins. Hence, different computational structure prediction methods are necessary for the modelling of the vast majority of all proteins. In most structure prediction pipelines, the last step is to select the best available model and to estimate its accuracy. This model quality estimation problem has been growing in importance during the last decade, and progress is believed to be important for large scale modelling of proteins. The current generation of model quality estimation programs performs well at separating incorrect and good models, but fails to consistently identify the best possible model. State-of-the-art model quality assessment methods use a combination of features that describe a model and the agreement of the model with features predicted from the protein sequence. RESULTS: We first introduce a deep neural network architecture to predict model quality using significantly fewer input features than state-of-the-art methods. Thereafter, we propose a methodology to train the deep network that leverages the comparative structure of the problem. We also show the possibility of applying transfer learning on databases of known protein structures. We demonstrate its viability by reaching state-of-the-art performance using only a reduced set of input features and a coarse description of the models. AVAILABILITY: The code will be freely available for download at github.com/ElofssonLab/ProQ4.

Motivation and Objectives

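Address the problem of selecting the best protein model from multiple predictions in large-scale structural biology pipelines.
Reduce reliance on complex structural features by using only sequence-based predicted features as input, independent of 3D coordinates.
Improve model quality assessment performance through transfer learning and a structured deep learning architecture.
Enable scalable, fast, and robust quality estimation that does not depend on side-chain packing or external tools.
Learn from internal representations rather than raw outputs, to mitigate bias introduced by commonly shared predictors.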

Proposed Method

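Design a three-headed deep neural network architecture that compares multiple models of the same protein and learns their relative quality ranking.
Pretrain on a large database of known protein structures so that the network learns general structural features from sequence-derived inputs such as secondary structure and hydrophobic surface area.
Apply transfer learning by initializing with features from models pretrained on a related but different dataset, improving generalization.
Use only coarse structural descriptors (secondary structure, solvent accessibility, and residue depth) as input, avoiding detailed 3D coordinates.
Implement comparative learning by feeding pairs of models into the network and training it to predict which of the two is better, improving ranking accuracy.
Fine-tune on CASP11 data with a loss function that optimizes global and local score prediction as well as target ranking.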
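The review does not give the exact form of the comparative loss. The sketch below shows one plausible way to combine per-model regression with a pairwise comparison. It assumes, hypothetically, that two of the three heads output a quality score for models A and B of the same target and that the third head outputs a logit for "A is better than B". The function names, the weight `alpha`, and the tie handling are illustrative assumptions, not ProQ4's published loss.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def three_head_loss(score_a: float, score_b: float, compare_logit: float,
                    true_a: float, true_b: float, alpha: float = 1.0) -> float:
    """Hypothetical loss for one pair (A, B) of models of the same target.

    score_a, score_b: outputs of the two per-model quality heads
    compare_logit:    output of the comparison head, logit of P(A better than B)
    true_a, true_b:   true global quality scores of A and B
    alpha:            weight of the ranking term (illustrative value)
    """
    # Regression term: mean squared error of the two quality heads
    mse = ((score_a - true_a) ** 2 + (score_b - true_b) ** 2) / 2.0
    # Ranking target: 1 if A is truly better, 0 if worse, 0.5 on a tie
    if true_a > true_b:
        target = 1.0
    elif true_a < true_b:
        target = 0.0
    else:
        target = 0.5
    # Binary cross-entropy on the comparison logit,
    # using log(1 - sigmoid(z)) == log_sigmoid(-z)
    bce = -(target * log_sigmoid(compare_logit)
            + (1.0 - target) * log_sigmoid(-compare_logit))
    return mse + alpha * bce


# Usage on one made-up pair: A is truly better and the comparison head agrees
print(three_head_loss(0.70, 0.50, 2.0, 0.65, 0.55))
```

In a training loop, such per-pair losses would be averaged over many pairs drawn from the same target. This would push the network toward both accurate absolute scores and correct orderings. A regression-only loss optimizes only the absolute scores.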

Figure 1 : Detail of the 3D structure of the protein 3TDU. Highlighted in yellow are the residues that smoothly transition between helix and coil. Predictions are commonly wrong about the exact position of the boundary.

Experimental Results

Research Questions

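RQ1: Can a deep learning model achieve state-of-the-art protein model quality assessment using only sequence-derived features?
RQ2: How does transfer learning from known protein structures improve model quality prediction?
RQ3: How much does a ranking-based comparative learning strategy improve prediction accuracy over standard regression?
RQ4: Does a minimal input representation limited to secondary structure and solvent accessibility still yield high performance?
RQ5: Does learning from internal representations rather than from external tools reduce bias from commonly shared predictors?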

Key Findings

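The proposed method, ProQ4, achieved state-of-the-art performance on CASP11. It outperformed existing methods in global score prediction and target ranking despite using fewer input features.
Transfer learning markedly improved performance for the convolutional neural network architecture. Pretraining gave no benefit for the multilayer perceptron and in some cases degraded its performance.
The three-headed architecture learned to rank models effectively. Its predictions agreed closely with the true scores and correlated highly with other top-performing methods.
The method proved robust to variations in side-chain packing, because it relies on coarse structural features rather than explicit 3D coordinates.
The correlation matrix shows that ProQ4's predictions are highly consistent with those of other high-performing methods, indicating reliable and stable performance.
Because the method achieved high performance even with minimal inputs, the results suggest that deep learning can extract meaningful quality signals from low-dimensional, sequence-based features.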
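The findings refer to global score prediction, target ranking, and a correlation matrix across methods. The minimal sketch below shows how such quantities are commonly computed in model quality assessment:

- the Pearson correlation between predicted and true scores;
- the first-rank loss, which is the true score of the best model minus the true score of the model the predictor ranks first.

It is an assumption that these are exactly the metrics used in the paper. The helper names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equal-length score lists.

    Undefined (raises ZeroDivisionError) if either list is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def first_rank_loss(predicted: List[float], true: List[float]) -> float:
    """True score of the best model minus true score of the predictor's top pick."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


def correlation_matrix(scores: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between methods scored on the same models."""
    return {a: {b: pearson(scores[a], scores[b]) for b in scores} for a in scores}


# Usage with made-up scores for four models of one target
true = [0.62, 0.48, 0.71, 0.55]
pred = [0.60, 0.50, 0.66, 0.58]
print(pearson(pred, true), first_rank_loss(pred, true))
```

A first-rank loss of zero means the predictor picked the truly best model for that target. Averaging it over targets gives a single ranking score.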

Figure 2 : The 1D ResNet module, the main building block of our convolutional nets
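The caption names a 1D ResNet module as the building block of the convolutional nets, but the figure itself is not reproduced here. The pure-Python sketch below assumes a common residual layout: convolution, ReLU, convolution, identity skip connection, then a final ReLU. It operates over per-residue feature channels. The actual module may differ, for example by adding normalization or dropout, and all names here are illustrative.

```python
from typing import List

Signal = List[List[float]]        # [channels][sequence length]
Kernel = List[List[List[float]]]  # [out_channels][in_channels][kernel_size]


def conv1d_same(x: Signal, w: Kernel, b: List[float]) -> Signal:
    """1D cross-correlation (as in deep-learning libraries) with zero 'same' padding."""
    length = len(x[0])
    k = len(w[0][0])
    if k % 2 != 1:
        raise ValueError("odd kernel size keeps 'same' padding symmetric")
    pad = k // 2
    out = []
    for o in range(len(w)):
        row = []
        for t in range(length):
            acc = b[o]
            for c in range(len(x)):
                for j in range(k):
                    src = t + j - pad
                    if 0 <= src < length:  # positions outside the sequence count as zero
                        acc += w[o][c][j] * x[c][src]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in ch] for ch in x]


def resnet1d_block(x: Signal, w1: Kernel, b1: List[float],
                   w2: Kernel, b2: List[float]) -> Signal:
    """Assumed residual block: relu(x + conv2(relu(conv1(x))))."""
    h = relu(conv1d_same(x, w1, b1))
    h = conv1d_same(h, w2, b2)
    if len(h) != len(x):
        raise ValueError("identity skip needs matching channel counts")
    summed = [[xi + hi for xi, hi in zip(xc, hc)] for xc, hc in zip(x, h)]
    return relu(summed)


# Usage: one feature channel over three residues, identity kernels
identity = [[[0.0, 1.0, 0.0]]]
print(resnet1d_block([[1.0, -2.0, 3.0]], identity, [0.0], identity, [0.0]))
```

For per-residue inputs, `x` would hold one channel per feature (for example, a one-hot secondary-structure state or relative solvent accessibility) and one position per residue. This feature choice is illustrative. Because the padding preserves length, blocks can be stacked for proteins of any size.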


This review was generated by AI and checked by a human editor.